Column design

In general, there is a one-to-one correspondence between Oracle columns in the data warehouse and fields or data elements on the mainframe. Usually the fields in a mainframe file are transformed as they are moved to the data warehouse: most are re-named, some are re-ordered, some are dropped, and some are added. Examples of each case are provided below.

Column selection. Selecting which fields in a mainframe file to propagate to the data warehouse should be systematic and explicit; the metadata for each subject area should describe the logic and list those elements that are not transported to the data warehouse. The logic for this selection may be quite different in different areas. In general, unused or abandoned fields (i.e., in the case where the software vendor provided a field that is not used by the university) should not be transported to the data warehouse. In addition, mainframe files have numerous data elements that are used by mainframe applications to control processing but are not likely to be of any use or interest to data warehouse users. The process of transporting data from the mainframe to the data warehouse should be designed with the possibility in mind that data elements that are not used at one point in time may need to be put back in service on short notice. The need to support multiple warehouses suggests that in the extract and transporting process column selection, like table selection, should be dynamic.

Column names. Column names in the data warehouse are different from field names on the mainframe for several reasons. Each are designed with a different audience in mind: mainframe data element names are intended for use by programmers who are involved in development or maintenance of an application program, while column names in the data warehouse are designed for people who are using the data to analyze a business problem. Field names on the mainframe are for the most part given by the application package vendor; column names are determined as part of the data warehouse design process so that they make sense with other names in the data warehouse. The details are discussed in the "Warehouse Naming Guidelines."

An essential characteristic of column names in the data warehouse is that they all contain a classword as the last word in the column name (when ever possible), as discussed in the "Warehouse Naming Guidelines." That paper contains a list of approved classwords, which are chosen to describe the kind of data contained in the column. Using class words forces names to distinguish between a data element's meaning and its semantic form: for example, OBJECT_CODE and OBJECT_CODE_NAME both categorize the purpose of a financial transaction, but the class word CODE indicates that the column contains a symbolic representation of the purpose while the class word NAME indicates that the column contains a description in English.

Column order within a table. (This discussion talks about data elements as columns that are arranged in a sequence from left to right, rather than a list of data elements that is arranged vertically.) In a relational environment, the physical location of columns in a table matters much less than it does in the context of fixed-field files or printed reports. Nevertheless, as the user is selecting columns for a query, column order in a table or view affects what the user sees on the client side (although users can re-sequence columns by alphabetizing column names and reorder columns as they develop their own reports).

The physical order of fields on the mainframe side is not necessarily a good guide to the physical order of columns on the warehouse side.

Deriving warehouse columns from mainframe fields. The following examples serve to illustrate the different situations that occur in the derivation of warehouse columns from mainframe fields.