American Memory
The National Digital Library Program:
Archived Documentation

The Library of Congress / Ameritech National Digital Library Competition (1996-1999)

Turning pages within a digital reproduction [May 4, 1998]

For items that have been digitized as a sequence of bit-mapped pages, a program has been developed at the Library of Congress to present a "page-turning" interface compatible with most graphical web-browsers without the need for a special viewer. Many other institutions have developed similar functionality. In all cases, the "structural metadata" that represents the sequencing of the page-image files must be captured and held in a data structure. A program or script generates the web-pages dynamically from this data structure.

The Library's ampage program presents screen-sized GIF images within HTML pages that allow the user to jump to the next page, the previous page, or any page in the sequence. For an example, see a playbill for a performance by the Rentz Santley Novelty and Burlesque Company in the Variety Stage collection of playbills.

For each such item, a small "page-turning dataset" is created as a simple file with one line per page. The example below might be used for an 8-page piece of sheet music from the collection being contributed by Brown University as an LC/Ameritech awardee. It is assumed that the pieces of sheet music are identified sequentially (say, 1 to 1500).


"title", "aggregate", "item", "ref-image", "prntno", "ctrlno", "feature", "full-sized"
"Jemima's wedding day","rpbaasm","1234","123401.gif","1","1","Front Cover","123401.tif"
"Jemima's wedding day","rpbaasm","1234","123402.gif","2","2","","123402.tif"
"Jemima's wedding day","rpbaasm","1234","123403.gif","3","3","","123403.tif"
"Jemima's wedding day","rpbaasm","1234","123404.gif","4","4","","123404.tif"
"Jemima's wedding day","rpbaasm","1234","123405.gif","5","5","Advertisement","123405.tif"
"Jemima's wedding day","rpbaasm","1234","123406.gif","6","6","","123406.tif"
"Jemima's wedding day","rpbaasm","1234","123407.gif","7","7","","123407.tif"
"Jemima's wedding day","rpbaasm","1234","123408.gif","8","8","","123408.tif"

The top line is optional, but is included here because it helps illustrate the example. The dataset may seem to hold much redundant information, but this information is typically easy to generate automatically (or largely automatically) from a directory listing of files. One advantage of the repetitive form is that the model is easy to extend to more complex cases when the additional effort is justified.
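
As an illustration of how such a dataset might be generated from a directory listing, here is a minimal Python sketch. It is not the Library's actual tooling. The function names build_rows and write_dataset are hypothetical, and the sketch assumes that the GIF and TIFF files share a stem (as in the sample above), that zero-padded file names sort into page order, and that prntno is simply set equal to ctrlno. Feature labels still have to be supplied by hand.

```python
import csv
import io

FIELDS = ["title", "aggregate", "item", "ref-image",
          "prntno", "ctrlno", "feature", "full-sized"]


def build_rows(title, aggregate, item, gif_names, features=None):
    """Build one dataset row per page from a list of GIF file names.

    `features` is an optional, manually prepared mapping from page
    sequence number to a label such as "Front Cover".
    """
    features = features or {}
    rows = []
    # Assumption: zero-padded names sort into the correct page order.
    for ctrlno, gif in enumerate(sorted(gif_names), start=1):
        stem = gif.rsplit(".", 1)[0]
        rows.append([
            title, aggregate, item, gif,
            str(ctrlno),               # prntno, assumed equal to ctrlno
            str(ctrlno),               # ctrlno
            features.get(ctrlno, ""),  # feature (blank when absent)
            stem + ".tif",             # full-sized image, same stem assumed
        ])
    return rows


def write_dataset(rows, header=True):
    """Serialize rows in the quoted, comma-separated form shown above."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    if header:
        writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()
```

In practice gif_names would come from something like os.listdir on the item's directory, filtered to the .gif files.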

The elements in the dataset are:

title
Title for the item and page. The title displayed at the top of the page-turning display is formed by taking the title for the collection (or aggregate) and adding the contents of this field. In this case, the same title is used for every page. In a longer document, the title might be specific to a chapter or section. Use of a special title is optional.
aggregate
At LC, digital reproductions are currently stored in "aggregates." Typically, an aggregate represents material of the same original format that belongs to the same American Memory collection. Many collections consist of a single aggregate. In this example, the rpbaasm aggregate will hold the entire collection of page-images of sheet music.
item
Within its aggregate, each item described in a bibliographic record (MARC or non-MARC) or in a finding aid must have a unique identifier. For items stored at LC, the combination of aggregate and item names forms a unique identifier and provides the American Memory system with enough clues to retrieve the files listed in other columns in the table. [In the prototype repository under development for the Library, these identifiers will be implemented as URNs using the Handle Server System.]

At present the files are stored in the UNIX file system following a strict rule. For example, if the pieces of sheet music from Brown are numbered in sequence, all the images for item 1234 would be stored in the same directory, rpbaasm/1200/1234/. The directories for items 1200-1299 would all be in rpbaasm/1200/. Nesting directories in this fashion keeps the number of files in any one directory manageable. LC's experience indicates that 300 is a reasonable upper limit for the number of files in one directory.

ref-image
The name of a screen-sized (not more than 600 pixels wide) GIF file for convenient display on the web. To improve performance, these files should have the tonal resolution reduced below the GIF default of 8 bits per pixel whenever possible.
prntno
The page number printed on the corresponding page. This must be captured manually if used (perhaps at scanning time). Otherwise, prntno can be equal to ctrlno. This field appears in the "Turn to" box.
ctrlno
The number in the page sequence for this item, assumed to run from 1 to the number of pages. Used to control the "NEXT" and "PREV" links.
feature
Can be used to describe any special characteristics of a page, such as the cover, table of contents, or an index. On the page-turning display, the feature is displayed, when present, in place of "page n of N." This information must be entered manually.
full-sized
The name of a higher-resolution version of the image. This file might be a TIFF or JPEG image.
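
The way these fields could drive a page-turning display can be sketched in Python as follows. This is an illustrative sketch, not the ampage program itself. The names load_pages and page_view are hypothetical, the dataset is assumed to be supplied without the optional top line and with rows already in ctrlno order, and the "n" in "page n of N" is assumed to be the ctrlno value.

```python
import csv
import io

FIELDS = ["title", "aggregate", "item", "ref-image",
          "prntno", "ctrlno", "feature", "full-sized"]


def load_pages(text):
    """Parse dataset text (without the optional top line) into dicts."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(FIELDS, row)) for row in reader if row]


def page_view(pages, ctrlno):
    """Return what a page-turning display needs for one page."""
    total = len(pages)
    page = pages[ctrlno - 1]  # ctrlno is assumed to run from 1 to total
    return {
        "image": page["ref-image"],
        # The feature, when present, replaces "page n of N".
        "label": page["feature"] or f"page {ctrlno} of {total}",
        # NEXT and PREV are controlled by ctrlno; None means no link.
        "prev": ctrlno - 1 if ctrlno > 1 else None,
        "next": ctrlno + 1 if ctrlno < total else None,
        # Printed page numbers offered in the "Turn to" box.
        "turn_to": [p["prntno"] for p in pages],
    }
```

A script generating the HTML page would call page_view once per request and render the returned values into the page template.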

The file shown above would be called 1234.db and would be stored in the same directory as all the other files relating to item 1234. An automated process would be used to convert this dataset into 1234.data for more efficient processing. The URL http://memory.loc.gov/cgi-bin/ampage?collId=rpbaasm&fileName=1234/1234.data would be used to invoke the page-turning script for this item.
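
The directory rule described under "item" and the URL pattern above can be expressed as a short Python sketch. The function names item_directory and ampage_url are hypothetical, and the bucket size of 100 is inferred from the 1200-1299 example. How items numbered below 100 are stored is not stated in this document, so the "0" directory the sketch would produce for them is only an assumption.

```python
from urllib.parse import urlencode


def item_directory(aggregate, item_number, bucket_size=100):
    """Return the nested directory for an item, e.g. rpbaasm/1200/1234/."""
    # Round the item number down to the start of its block of 100.
    bucket = (item_number // bucket_size) * bucket_size
    return f"{aggregate}/{bucket}/{item_number}/"


def ampage_url(aggregate, item_number,
               base="http://memory.loc.gov/cgi-bin/ampage"):
    """Build the URL that invokes the page-turning script for an item."""
    query = urlencode(
        {"collId": aggregate,
         "fileName": f"{item_number}/{item_number}.data"},
        safe="/",  # keep the slash in fileName unescaped, as in the example
    )
    return f"{base}?{query}"
```

For item 1234 in the rpbaasm aggregate, these functions reproduce the directory and URL given in the text.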

A link to the parent American Memory collection is based on information stored elsewhere for each aggregate. A link to related bibliographic information can be generated from the aggregate/item combination.

The dataset mechanism is less labor-intensive to create than hard-coded HTML pages and also more flexible. For example, it could be adjusted to allow for another column with thumbnail images, and the same file could generate an alternate view of the document as a grid of thumbnails with links to the larger images. In the prototype repository under development for NDLP, the information in the page-turning dataset will be held as structural metadata associated with the digital object representing the piece of sheet music.
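
A possible shape for the thumbnail-grid view is sketched below in Python. The "thumbnail" column and the function name thumbnail_grid are hypothetical, since no such column exists in the dataset described above, and the HTML layout is only one of many that would serve.

```python
from html import escape


def thumbnail_grid(pages, columns=4):
    """Render pages as an HTML table of thumbnails linked to larger images.

    Each page is a dict with "ref-image" and "ctrlno" keys plus a
    hypothetical "thumbnail" key naming a small image file.
    """
    cells = []
    for page in pages:
        href = escape(page["ref-image"])
        src = escape(page["thumbnail"])
        alt = escape("page " + page["ctrlno"])
        cells.append(
            f'<td><a href="{href}"><img src="{src}" alt="{alt}"></a></td>'
        )
    # Split the cells into table rows of `columns` cells each.
    rows = [
        "<tr>" + "".join(cells[i:i + columns]) + "</tr>"
        for i in range(0, len(cells), columns)
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Because the grid is generated from the same rows as the page-turning display, adding the extra column would not require maintaining a second data file.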
