CIW Data Design Guidelines
Central Information Warehouse
26-Apr-2001 - IRM's 1990's design guidelines are discussed in the
series of links at the bottom of this page. Here's where we've
now progressed from those guidelines.
Guideline Revisions For All Subject Areas
- What we used to just call "the extract" we now consider to be
a three phase process: Extract, Transform, and Load. The Load
part is the Oracle loader.
- For CIW character data, missing data is represented by a single
blank, rather than a null value. So far we have not encountered a situation
requiring a distinction between missing and blank.
- A one byte character element should be defined as Oracle char(1). All
other character elements should be defined as varchar2(nn) where nn is the max
number of bytes expected in a value. nn may be set higher than the max
expected, to allow for possible growth.
- Generally, CIW character data is in uppercase. This facilitates
user searching and sorting - there is no question about the effects of
mixed case when everthing is in uppercase. A few exceptions have been made
for elements containing narrative text.
- The date value 01jan1901 may be used to indicate "the first entry we have
- The date value 09sep3999 may be used to indicate, in end dates or
expiration dates, "the last entry we have on record."
- A CIW date may be set to null if it is to be considered truely
missing, rather than one of the two common special date values discussed in
the previous two points.
- A CIW numeric amount may be set to null if it is truely missing, but may
also be set to zero if this does not distort the meaning of the data.
- Code number elements such as social security numbers are stored as
character with leading zeros. Exceptions may be made for code number
elements where users are not accustomed to leading zeros and it is known
that the source system enforces that the element must be numeric.
- Elements suffixed with EXPIRATION_DATE indicate the date upon which
the row of data is no longer effective, rather than the last date upon
which the row of data is still considered effective.
- In addition to the MMDDYYYY format, dates in the .csv load files may
optionally be DDMONYYYY or DDMONYYYY:HH:MM:SS where HH is hours in 24.
These correspond to sas date9. and datetime19. formats.
Guideline Revisions For GL and HRMS
- These subject areas are now designed using Dimensional Modeling methodolgy
(a.k.a. Star Schema) rather than Entity Relationship methodology.
Dimensional Modeling is advocated by
Ralph Kimball Associates. See
Is ER Modeling Hazardous
- Although Dimensionl Modeling calls for surrogate keys assigned in the
Transformation, we use production keys. This compromise was necessary because
one of the major user tools, MSAccess, is not good producing Oracle star
- If the time is a significant analysis dimension in a fact table,
we may assign it to a separate element in character HH:MM:SS form, and store
the date with the time part set to midnight. While this seems counter
intuitive since Oracle naturally stores dates as datetimes, it facilitates
independant dimensions on date and time. The CIW does not actually have
any date or time dimensions right now, but we may want them in the future.
Guidelines Version 3.1 (The 1990's Guidelines)
Information Resource Management
and University Management Systems