American Memory

The National Digital Library Program: Archived Documentation

The Library of Congress / Ameritech National Digital Library Competition (1996-1999)

Lessons Learned: Intellectual Access (and other types of metadata)

Several of the projects have taken advantage of existing machine-readable (often MARC) bibliographic descriptions. For other projects, descriptive records are being prepared in conjunction with the digitization. In most cases, relational databases (such as FileMaker Pro and Microsoft Access) are in use for data preparation and maintenance; metadata beyond that needed to provide intellectual access is usually recorded in the same database. Information to support the ongoing management of the resources (original and digital) is often included. Some institutions discovered a need to record unanticipated information relating to the structure of a converted item in order to support convenient access and navigation for users. In addition to exporting the data to LC for indexing within American Memory, several projects export the descriptive records for integration into other delivery systems at the host institution.

Decisions to change the level and sources of descriptive terminology were discussed in some reports.

Ohio Historical Society

Online Collection: The African-American Experience in Ohio, 1850-1920

At OHS, a database in Microsoft Access is being used both for data preparation and to support dynamic web-based presentation of items. Copies of the descriptive records will be sent to LC for integration into American Memory, but OHS is also providing a local search interface. In the first interim report, George Parkinson commented that the OHS pilot project led them to

  1. extend the initial set of fifty Library of Congress Subject Headings selected (and built into the database for authority control) to provide more specific subject access.

  2. record additional information in the database, including the number of pages in a scanned item, and dates for individual items scanned from newspapers and serials.

In the second interim report, Parkinson identified two issues encountered during scanning. The first concerns inconsistencies in the volume and issue numbering of newspaper titles. While this does not necessarily cause problems for searching or retrieving articles, it is problematic because the

  1. directory structure is based in part on volume and issue information; thus, duplicates cause confusion for project staff and create difficulties for storage.

  2. the numbering problems...confuse patrons [who would] encounter difficulties in looking for an article in volume 16, issue 20, but not find it because it was actually in the next issue, which was also volume 16, issue 20! However, there were also concerns about changing or altering the original citation.

Since the date is generally considered much more reliable than the volume and issue information, LC staff and OHS decided:

  1. to keep the issues with duplicated numbers in a separate directory structure [while ensuring user access would be seamless].

  2. add a browse by date option, which should allow more accurate retrieval of issues. Patrons will be able to view a list of newspaper articles in date order for each newspaper. Every entry, which will hyperlink to the item, will give the original date and volume/issue information. This presentation should make it clear to the patron that there are duplicate issues. We will also indicate what the "correct" issue number should be.
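
The browse-by-date decision can be illustrated with a short sketch. This is not the OHS or LC implementation; it assumes a hypothetical record layout (an `Issue` with `title`, `issue_date`, `volume`, and `number`) and a hypothetical helper `browse_by_date` that orders each title's issues by date and flags any volume/issue citation that occurs more than once, while leaving the printed citation unchanged. The title "Sample Gazette" is invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    # Hypothetical record layout for one scanned newspaper issue.
    title: str
    issue_date: date
    volume: int
    number: int


def browse_by_date(issues):
    """List issues in date order per title, flagging duplicated volume/issue citations."""
    # Count how many issues of each title share the same printed volume/issue number.
    citation_counts = Counter((i.title, i.volume, i.number) for i in issues)
    rows = []
    # The date, not the printed citation, drives the ordering, since it is more reliable.
    for i in sorted(issues, key=lambda i: (i.title, i.issue_date)):
        is_duplicate = citation_counts[(i.title, i.volume, i.number)] > 1
        # Keep the original volume/issue as printed; the citation itself is not altered.
        rows.append((i.title, i.issue_date.isoformat(), i.volume, i.number, is_duplicate))
    return rows


if __name__ == "__main__":
    sample = [
        Issue("Sample Gazette", date(1890, 5, 24), 16, 20),
        Issue("Sample Gazette", date(1890, 5, 17), 16, 20),
        Issue("Sample Gazette", date(1890, 5, 10), 16, 19),
    ]
    for title, day, vol, num, dup in browse_by_date(sample):
        note = "  (duplicate volume/issue number)" if dup else ""
        print(f"{day}  {title}  v.{vol} no.{num}{note}")
```

Sorting on the date rather than on volume/issue mirrors the reasoning above: two issues printed as volume 16, issue 20 still appear in their true sequence, and the duplicate flag lets the listing tell patrons why the same citation appears twice.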

The second issue relates to the existence of blank pages in manuscript material, such as notes and minutes in ledger books. The question is whether it would be acceptable to scan only those pages with pertinent material, then indicate that other blank pages were not scanned. There are established standards and procedures for other types of reformatting projects. For example, when microfilming books, every page is microfilmed regardless of content or condition. There are no such standards for digital images yet. OHS and LC staff decided that there is no need to present (or scan) all the blank pages, but that some indication as to their existence is recommended.

University of Chicago

Online Collection: American Environmental Photographs, 1897-1931

Item-level descriptions are being prepared for 5,800 photographic images. Sources of information about the images include the card file that came to the library with the collection and a variety of notebooks, hand-written notes, and archival records.

In the first interim report, Alice Schreyer emphasized:

  1. In addition to the sources of subject authorities mentioned specifically in the proposal (LC Subject Headings, LC Thesaurus for Graphic Materials, and the Art and Architecture Thesaurus), the team decided to add two categories of botanical name (scientific and common), and terminology from the 1890-1920 period as found in the published work of Henry Chandler Cowles and John Merle Coulter. For terms describing physical geography, two published reference works were used.

  2. Up to 37 data fields are used to record descriptive and administrative data for items, including notes on damage revealed during the inspection and inventory process. Programs have been written to extract data from the database using ODBC (Open Database Connectivity) and SQL (Structured Query Language) and to tag it in SGML according to a simple document type definition (DTD). Export in another format is planned for use with software developed locally for the ARTFL project (American and French Research on the Treasury of the French Language).
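
The extract-and-tag step can be sketched as follows. This is an illustration under stated assumptions, not the Chicago team's programs: Python's built-in `sqlite3` module stands in for an ODBC connection, and the table names (`items`, `subjects`), the function `export_sgml`, the element names, and the small DTD are all hypothetical, since the project's actual 37-field schema and DTD are not given here.

```python
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical minimal DTD in SGML syntax ("- -" means start and end tags are required).
DTD = """<!DOCTYPE collection [
<!ELEMENT collection - - (record+)>
<!ELEMENT record - - (id, title, subject*)>
<!ELEMENT (id | title | subject) - - (#PCDATA)>
]>"""


def export_sgml(conn: sqlite3.Connection) -> str:
    """Extract item records with SQL and wrap each field in SGML tags."""
    parts = [DTD, "<collection>"]
    # Fetch all items first so the per-item subject query can reuse the connection.
    items = conn.execute("SELECT id, title FROM items ORDER BY id").fetchall()
    for item_id, title in items:
        parts.append("<record>")
        parts.append(f"<id>{item_id}</id>")
        # escape() turns &, <, > into &amp;, &lt;, &gt;; this assumes the
        # receiving DTD or parser recognizes those entity references.
        parts.append(f"<title>{escape(title)}</title>")
        subjects = conn.execute(
            "SELECT term FROM subjects WHERE item_id = ? ORDER BY term", (item_id,)
        ).fetchall()
        for (term,) in subjects:
            parts.append(f"<subject>{escape(term)}</subject>")
        parts.append("</record>")
    parts.append("</collection>")
    return "\n".join(parts)


if __name__ == "__main__":
    # In-memory stand-in for the cataloging database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);"
        "CREATE TABLE subjects (item_id INTEGER, term TEXT);"
    )
    conn.execute("INSERT INTO items VALUES (1, 'Dunes & marsh')")
    conn.executemany(
        "INSERT INTO subjects VALUES (?, ?)", [(1, "Sand dunes"), (1, "Marshes")]
    )
    print(export_sgml(conn))
```

Separating the SQL query from the tagging step means the same extraction could feed a second output format, much as the report anticipates a further export for the ARTFL software.
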
University of Chicago

Online Collection: First American West

There were two major differences between the two collections for which the University of Chicago won awards. The first was a homogeneous collection of photographs. The second was a heterogeneous set of materials in different physical formats, selected from the collections of two institutions.

At the end of the second project, Alice Schreyer recommended:

  1. Examples of all possible types of materials to be digitized should be examined early on and an assessment made of the challenges and needs of each format. For the FAW project there were many formats (pictures, books, manuscripts, maps, objects, etc.). Some, such as multivolume works, required special cataloging attention for their digital versions. A clear understanding up front of the existing descriptive metadata needed for selections, and of the level of description desired for Internet access to digital versions, would help in planning the workflow for this part of the project.