Preparing documents for publication on the Web
Preparing documents for publication on the Web may involve transforming
a word-processing document into HTML, generating HTML with a computer
program, and many other steps. (This discussion does not cover
the transformations that take place as the Web server obtains
a file and sends it to the Web browser, which then displays it on
the user's desktop.) The transformation process that precedes
publication on the Web raises the following issues:
- Authorship: IRM delivers documents even when we are not the author. The
steps we follow to prepare a document for publication must cover
the situation where IRM is the author as well as the situation
where IRM is not the author. This can be tricky: when IRM is not
the author we have the same responsibility for quality but less
freedom to "correct" the document on the fly.
- Transporting: As documents are moved from one location (platform, form,
directory, etc.) to another they may be renamed and transformed
in several ways. It is important to make sure that the links and
labels which identify the logical content of the moved and transformed
documents change in an orderly and replicable way. Moving documents
may mean that additional information must be recorded.
- The information that is delivered on our Web server may at
one point or another exist on one or more platforms, including:
- external sources (e.g., CCHE, Information Associates, etc.)
- UMS mainframe
- our LAN
- an individual workstation
- the Web server
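
One way to keep links changing "in an orderly and replicable way" when
documents are renamed is to record each rename in a map and apply that map
to every moved file. The sketch below is only an illustration under our own
assumptions: the function name, the rename map, and the restriction to
double-quoted href values are hypothetical and not part of any existing
IRM tool.

```python
import re

# Matches href="..." with a double-quoted target (an assumption of this sketch).
HREF = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)

def rewrite_links(html_text, renames):
    """Replace each double-quoted href target that appears in the rename map."""
    def swap(match):
        target = match.group(2)
        # Targets that were not renamed are left exactly as they were.
        return match.group(1) + renames.get(target, target) + match.group(3)
    return HREF.sub(swap, html_text)

# Hypothetical rename recorded when a file moved to the Web server.
renames = {"old.htm": "new.html"}
page = '<A HREF="old.htm">x</A> <a href="keep.html">y</a>'
updated = rewrite_links(page, renames)
```

Because the rename map is itself a small data file, it can serve as the
"additional information" that has to be recorded when documents move.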
- Different sources: The information may be maintained in, and extracted
from, sources that have several different forms on any one of the platforms
that we use. Translation from one form to another is not necessarily
trivial. How to go about the translation process depends on whether
it will be done once or be done repeatedly. The forms and platforms
that we use include:
- Proprietary word processing formats (such as WordPerfect or
- Mainframe reports that contain data, labels, titles, and printer
- Data files that contain "just data" -- with no internal
documentation (e.g., that would give names or lengths or formats
for data elements);
- Proprietary data files (e.g., Oracle tables or SAS datasets);
- HTML files.
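
For "just data" files, the names, positions, and lengths of the data elements
have to be supplied from outside the file. A minimal sketch of that idea
follows, assuming fixed-width records; the layout and field names shown are
hypothetical and would in practice come from whatever documentation
accompanies the file.

```python
from typing import NamedTuple

class Field(NamedTuple):
    name: str
    start: int   # 0-based offset of the field's first character
    length: int  # number of characters in the field

# Hypothetical layout; the data file itself carries none of this information.
LAYOUT = [
    Field("dept", 0, 4),
    Field("title", 4, 20),
    Field("count", 24, 5),
]

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    return {f.name: line[f.start:f.start + f.length].strip() for f in layout}

# Build a sample record that matches the layout above.
sample = "0042" + "Staff Personnel".ljust(20) + "00017"
record = parse_record(sample)
```

Keeping the layout in one place makes a repeated translation replicable: the
same layout can be reused each time the file is re-extracted.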
- Directories: During the process of moving and transforming the information
from one platform to another and one form to another, the information
may be stored in various different directories depending on the
- volume or degree of automation (handmade HTMLs are not intermixed
with machine-generated files);
- characteristics related to the information content,
including subject, author or responsible person, or degree of
- stage in the delivery process (all partly edited files are
kept separate from "ready to go" files);
- See Directory structure guidelines below
- Information size: During the transformation process, information is chopped
into smaller pieces or combined into larger ones. The size of
the final information chunks is a logical issue. The title of
an HTML page should describe the logical contents; an ungainly
title suggests that the chunk is too big or too small.
The physical size of a Web page is also a consideration: one chunk
should not take up too many screens. The size of the original
information source has an impact on all of the transformation
- Tables: Organizing information into tables of rows and columns is
often helpful and adds value from an information user's perspective
because tables are inherently compact and afford comparison between
adjoining values. However, just because the information that is
to be delivered on the Web is already in a table of some sort
doesn't mean that the table shouldn't be re-cast or re-thought.
Tables are not the only useful form for data delivery: de-normalizing
a table so that each row becomes a page of text may be a value-adding
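
De-normalizing a table in this sense can be sketched as turning each row
into a small page that lists every column name next to its value. The
function name, the choice of a definition list, and the sample columns are
our assumptions for illustration only.

```python
import html

def row_to_page(columns, row, title_column):
    """Render one table row as a small HTML page, one labeled entry per column."""
    record = dict(zip(columns, row))
    title = html.escape(str(record[title_column]))
    items = "\n".join(
        f"<dt>{html.escape(str(name))}</dt><dd>{html.escape(str(value))}</dd>"
        for name, value in record.items()
    )
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<dl>\n{items}\n</dl></body></html>\n"
    )

# Hypothetical two-column table with a single row.
page = row_to_page(["Name", "Dept"], ["A & B", "IRM"], "Name")
```

Using one column as the page title ties back to the point about chunk size:
if no column yields a sensible title, a row may not be the right chunk.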
- Links: The anchors or links between HTML pages add value to the text.
Except for hand-made documents, these links are inserted during
the document transformation process.
- The "addlinks" utility automates this process.
- It finds occurrences of document titles in other documents
and inserts links as appropriate.
- It implements a philosophy of "link as much as possible"
(within a given directory or group of related directories).
- The current version of the program is slow, so we do
not routinely run it on text that is frequently re-created.
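
The general idea of title-based linking can be sketched as follows. This is
not the "addlinks" program itself, whose internals are not described here; it
is a simplified illustration that assumes exact, case-sensitive title matches
and a hypothetical title-to-URL map, and that leaves existing tags and
existing anchors untouched.

```python
import re

def add_links(text, titles_to_urls, own_title=None):
    """Link each occurrence of a known document title found outside HTML tags."""
    titles = [t for t in titles_to_urls if t != own_title]
    if not titles:
        return text
    # Longest titles first, so a long title wins over a shorter one it contains.
    titles.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in titles))

    def link(match):
        title = match.group(0)
        return f'<a href="{titles_to_urls[title]}">{title}</a>'

    out = []
    inside_anchor = False
    # re.split with a capturing group alternates: text, tag, text, tag, ...
    for i, piece in enumerate(re.split(r"(<[^>]+>)", text)):
        if i % 2 == 1:
            lowered = piece.lower()
            if lowered.startswith("<a ") or lowered == "<a>":
                inside_anchor = True
            elif lowered.startswith("</a"):
                inside_anchor = False
            out.append(piece)
        elif inside_anchor:
            out.append(piece)  # never nest a new link inside an existing one
        else:
            out.append(pattern.sub(link, piece))
    return "".join(out)

# Hypothetical titles and file names.
urls = {"Staff Personnel": "staff.html", "Staff Personnel Policy": "policy.html"}
linked = add_links(
    'See Staff Personnel Policy and <a href="x.html">Staff Personnel</a>.', urls
)
```

Compiling all titles into a single pattern means each document is scanned
once, which is one possible way to reduce the cost of re-running the linker.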
- Linking to other Web sites (e.g., Boulder Campus Staff Personnel)
introduces a level of coordination and possible fragility that
should be avoided unless there are significant gains from the
- This approach is only used for HTMLs that are machine-generated
on our side.
- The index file that links local to remote HTMLs should be
as simple as possible. (Something like: "LOCAL_KEY:URL:ANYTHING_ELSE"
- Maintenance / synchronization issues need to be clearly established.
- Technical and logical "parity" issues go here.
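
A reader for an index file of the "LOCAL_KEY:URL:ANYTHING_ELSE" kind might
look like the sketch below. Because a URL itself contains a colon after the
scheme, the line cannot simply be split on ":". This sketch assumes that the
URL has the form scheme://host/path with no port number and no other colons;
that restriction, the function names, and the sample line are our assumptions,
not an established format.

```python
import re

# key : scheme://rest-of-url-without-colons [: anything else]
LINE = re.compile(r"^([^:]+):([A-Za-z][A-Za-z0-9+.-]*://[^:\s]*)(?::(.*))?$")

def parse_index_line(line):
    """Return (local_key, url, rest) for one index line, or None if it does not fit."""
    m = LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    key, url, rest = m.groups()
    return key, url, rest or ""

def load_index(lines):
    """Map each local key to its (url, rest) pair, skipping blank lines."""
    index = {}
    for line in lines:
        if not line.strip():
            continue
        parsed = parse_index_line(line)
        if parsed is None:
            raise ValueError(f"malformed index line: {line!r}")
        key, url, rest = parsed
        index[key] = (url, rest)
    return index

# Hypothetical index content.
index = load_index([
    "STAFFPERS:http://www.example.edu/staff.html:maintained by hand\n",
    "\n",
    "HOME:http://www.example.edu/\n",
])
```

Rejecting malformed lines loudly, rather than skipping them, is one way to
notice early when the local and remote sides have drifted out of step.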
- repeatedly re-created in such a way as to
- maintain the concept of "the same" document
- retain the useful internal documentation
- avoid vestigial pages
- Who is responsible for these various steps? How many hand-offs