Extract/Loading Considerations

The Oracle loader should be used to load data, not to verify data quality. (The Oracle DBMS enforces stricter data standards than the mainframe environment does, so, taken as a whole, the process of getting data into Oracle involves a significant amount of data scrubbing that must be done before the data is presented to Oracle for loading.) The extract program should check for and assure data quality as defined in these guidelines.

The control of data integrity will place more importance on a complete set of rows than on a complete set of data values within a given row. Extracts and loads will therefore be designed so that rows are never dropped, even when they contain data that, for some unforeseen reason, is not acceptable to Oracle or to the extract/load process. To achieve this, an individual data value will be sacrificed if necessary; when that happens it presents a fairly difficult documentation problem. If the value can be fixed so that it is acceptable to Oracle, the program that writes the .CSV file should do so (e.g., dates will be edited as discussed above, or the enclosing character, if found in a character string, will be translated to a blank).
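The "sacrifice the value, never the row" rule can be sketched as follows. This is an illustrative Python sketch, not part of the guidelines; the function names and the log format are hypothetical, and the only fix shown is the enclosing-character translation described above.

```python
# Illustrative sketch: keep every row, blank out only the offending value.
ENCLOSING_CHAR = "{"  # current enclosing character per these guidelines

def sanitize_value(value, log):
    """Return a loadable version of value, translating the enclosing
    character to a blank and recording the change for documentation."""
    if ENCLOSING_CHAR in value:
        log.append(f"enclosing character replaced in value: {value!r}")
        value = value.replace(ENCLOSING_CHAR, " ")
    return value

def extract_row(raw_values, log):
    """Every row is emitted; only individual values are sacrificed."""
    return [sanitize_value(v, log) for v in raw_values]
```

A row such as `["ABC", "bad{value"]` would be written as `["ABC", "bad value"]`, with the substitution noted in the log rather than the row being dropped.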

Both the extract and the load processes will provide adequate reports for error checking so that counts of the number of rows on both sides can be used as a quality assurance check.

.CSV files. The final step of the extract process is to write a comma separated values (.CSV) file on the mainframe that will be translated to ASCII using FTP's character translation conventions. Since FTP truncates* trailing blanks, when the last column in a table has the datatype CHAR, care must be taken to distinguish all blanks from null values (unless you're FTPing a .CSV file and using the conventions described below). The .CSV file will be named according to UMS data-set-naming conventions: source system, destination system, additional qualifiers, and '.CSV' for file type. For example:

FRS.WHSE.CURMONTH.TRANS.CSV.

The format for dates in the .CSV files is MMDDYYYY. Logic for handling the century should be in the extract program, not the load program. Character data, whether fixed length or variable length (CHAR and VARCHAR2 data types), must be enclosed by an enclosing character, which is currently a left curly brace ({). Quotes (double or single) do not work, since they are sometimes part of the data that needs to be loaded. Character data should never contain the enclosing character ({); illegal characters will be translated to blanks. NULL data is represented in the load file in one of two ways: comma comma (,,), meaning the column value between the commas is missing, or comma enclosing-character enclosing-character comma (,{{,), meaning that no value is being enclosed.
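The field conventions above can be sketched in code. This is an illustrative Python sketch, not part of the guidelines; the function names are hypothetical, and it assumes dates arrive as Python `date` objects with century logic already applied upstream in the extract.

```python
from datetime import date

ENCLOSING_CHAR = "{"  # current enclosing character per these guidelines

def format_date(d: date) -> str:
    """Dates in the .CSV file use MMDDYYYY."""
    return d.strftime("%m%d%Y")

def format_field(value) -> str:
    """Render one column value using the .CSV conventions above."""
    if value is None:
        return ""                      # ,, form: column value missing
    if isinstance(value, date):
        return format_date(value)
    if isinstance(value, str):
        if value == "":
            return ENCLOSING_CHAR * 2  # ,{{, form: no value being enclosed
        # Character data must never contain the enclosing character,
        # so translate it to a blank before enclosing.
        value = value.replace(ENCLOSING_CHAR, " ")
        return ENCLOSING_CHAR + value + ENCLOSING_CHAR
    return str(value)

def format_row(values) -> str:
    return ",".join(format_field(v) for v in values)
```

For example, `format_row(["AB", None, ""])` yields `{AB{,,{{`, showing the enclosed string, the missing-value form, and the empty-enclosure form side by side.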

The loader program will always assume that columns not found in the .CSV file are null (i.e., the TRAILING NULLCOLS option will be turned on).
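A SQL*Loader control file following these conventions might look like the sketch below. The table and column names are hypothetical, shown only to illustrate the enclosing character, the MMDDYYYY date mask, and the TRAILING NULLCOLS option; an actual control file would use the real table definition.

```sql
LOAD DATA
INFILE 'frs_whse_curmonth_trans.csv'
INTO TABLE whse_trans
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '{'
TRAILING NULLCOLS
(
  trans_date  DATE "MMDDYYYY",
  trans_desc  CHAR,
  trans_amt   DECIMAL EXTERNAL
)
```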

*When FTPing a file from the mainframe, use the following (mainframe) FTP option to prevent truncation of trailing blanks at the end of a record: 'QUOTE SITE PAD(Z)'.