CIW Data Design Guidelines

Central Information Warehouse

26-Apr-2001 - IRM's 1990's design guidelines are discussed in the series of links at the bottom of this page. Here's where we've now progressed from those guidelines.

Guideline Revisions For All Subject Areas

  1. What we used to just call "the extract" we now consider to be a three phase process: Extract, Transform, and Load. The Load part is the Oracle loader.
  2. For CIW character data, missing data is represented by a single blank, rather than a null value. So far we have not encountered a situation requiring a distinction between missing and blank.
  3. A one byte character element should be defined as Oracle char(1). All other character elements should be defined as varchar2(nn) where nn is the max number of bytes expected in a value. nn may be set higher than the max expected, to allow for possible growth.
  4. Generally, CIW character data is in uppercase. This facilitates user searching and sorting - there is no question about the effects of mixed case when everthing is in uppercase. A few exceptions have been made for elements containing narrative text.
  5. The date value 01jan1901 may be used to indicate "the first entry we have on record."
  6. The date value 09sep3999 may be used to indicate, in end dates or expiration dates, "the last entry we have on record."
  7. A CIW date may be set to null if it is to be considered truely missing, rather than one of the two common special date values discussed in the previous two points.
  8. A CIW numeric amount may be set to null if it is truely missing, but may also be set to zero if this does not distort the meaning of the data.
  9. Code number elements such as social security numbers are stored as character with leading zeros. Exceptions may be made for code number elements where users are not accustomed to leading zeros and it is known that the source system enforces that the element must be numeric.
  10. Elements suffixed with EXPIRATION_DATE indicate the date upon which the row of data is no longer effective, rather than the last date upon which the row of data is still considered effective.
  11. In addition to the MMDDYYYY format, dates in the .csv load files may optionally be DDMONYYYY or DDMONYYYY:HH:MM:SS where HH is hours in 24. These correspond to sas date9. and datetime19. formats.

Guideline Revisions For GL and HRMS

  1. These subject areas are now designed using Dimensional Modeling methodolgy (a.k.a. Star Schema) rather than Entity Relationship methodology. Dimensional Modeling is advocated by Ralph Kimball Associates. See Is ER Modeling Hazardous to DSS?
  2. Although Dimensionl Modeling calls for surrogate keys assigned in the Transformation, we use production keys. This compromise was necessary because one of the major user tools, MSAccess, is not good producing Oracle star join SQL.
  3. If the time is a significant analysis dimension in a fact table, we may assign it to a separate element in character HH:MM:SS form, and store the date with the time part set to midnight. While this seems counter intuitive since Oracle naturally stores dates as datetimes, it facilitates independant dimensions on date and time. The CIW does not actually have any date or time dimensions right now, but we may want them in the future.

Guidelines Version 3.1 (The 1990's Guidelines)

Information Resource Management and University Management Systems