American Memory

The National Digital Library Program: Archived Documentation

The Library of Congress / Ameritech National Digital Library Competition (1996-1999)

Lessons Learned: Formats and Specifications for Digital Reproductions

Several of the awardee institutions discovered that, once they started to work with their particular content, they wished to change their initial plans for formats, resolutions, and conversion processes.

Ohio Historical Society

Online Collection: The African-American Experience in Ohio, 1850-1920

This project involves scanning text and images from a variety of sources, including newspapers, manuscript materials, printed materials and photographs. OHS is scanning the material in-house. In the first interim report, George Parkinson noted that:

  1. The software bundled with the scanner was inadequate for our needs. Consequently, Adobe Photoshop was selected since it provides the necessary filters to blur and sharpen images and works in conjunction with the DeBabelizer batch processor OHS uses. "These programs have helped with refining and cleaning up the quality of the images. They have enabled us to smooth and sand images, and fill in broken characters on printed and manuscript material. Another benefit is that DeBabelizer allows for the inclusion of metadata (information about author, title, date scanned, project title and item origin) for each item."

  2. "The original grant indicated that we would provide a thumbnail GIF and high quality TIFF for each image. In scanning photographs, we discovered that photographs are much cleaner when delivered as JPEG images. Therefore when patrons select a photograph, they can choose to download the TIFF or a JPEG."

  3. "Another problem arose when scanning tables from serial publications. Information, particularly numbers, in tabular format scans very poorly and the Optical Character Recognition (OCR) program cannot recognize the characters. We determined that we would not OCR the pages with tables. We will provide a thumbnail image of each table and an option of downloading larger images. Instead of the text page, there will be text indicating that the table was not transcribed and that the patron should look at the image of the page. We will also include any title information found on the original table, to provide an idea of whether the item is of sufficient interest to the patron to search out the original item."

Duke University

Online Collection: Historic American Sheet Music

3,000 pieces of sheet music will be scanned from originals in-house on UMAX flatbed scanners. In the first interim report, Steven Hensen reported:

  1. The team has determined that for this material, "150dpi color scans fully represent all significant data and are more than adequate for current monitor and printer standards." They argue that, based on experience from previous projects, the added costs associated with higher resolution versions (for processing and storage) are not justified by the likely level of usage. Users may request higher resolution scans of individual items.

  2. "Images are saved as 150 dpi JPEGs with a high level of quality." 72dpi access copies and thumbnails are generated automatically using a locally developed combination of "the PERL language and freely available UNIX graphics software. This was a rewarding process which expanded the skills of the project manager and the Digital Scriptorium staff."

  3. "The biggest problem encountered has been the prevalence of moiré patterns caused by halftone dots in the page being scanned. Most frequently these appear on the illustrated title pages, but are frequently found elsewhere in pieces as well. A variety of techniques from the imaging software were devised to deal with the issue."
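
The derivative sizes described in item 2 follow from simple arithmetic: resampling from 150 dpi to 72 dpi at the same physical size scales each pixel dimension by 72/150. The following is a minimal sketch in Python, not the project's PERL scripts. The function name `resample_dimensions` and the 9 x 12 inch sheet size are assumptions chosen for illustration only.

```python
def resample_dimensions(width_px, height_px, source_dpi, target_dpi):
    """Return pixel dimensions after resampling to a new dpi
    while keeping the same physical (printed) size."""
    new_width = round(width_px * target_dpi / source_dpi)
    new_height = round(height_px * target_dpi / source_dpi)
    return new_width, new_height


# Hypothetical 9 x 12 inch sheet scanned at 150 dpi.
master = (9 * 150, 12 * 150)  # 1350 x 1800 pixels

# Dimensions of a 72 dpi access copy derived from that master.
access = resample_dimensions(*master, source_dpi=150, target_dpi=72)
print(access)  # (648, 864)
```

Under these assumptions the access copy carries a little under a quarter of the master's pixels, which is where most of the saving in download time would come from.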

Brown University

Online Collection: African-American Sheet Music

Roughly 1,300 pieces of sheet music will be shipped to a contractor and scanned. The first report indicated that:

  1. The final choice of digital formats and resolutions was based on experimentation with several alternatives and discussions with technology service providers in the library and on campus as well as the scanning contractor and the Library of Congress. The choice made was to create four versions of each page:
    • Archival master, to be retained offline at Brown (300 dpi LZW-compressed TIFF);
    • Page-turner, to provide sufficient detail to be identifiable without requiring excessive load time or file size (400-pixel-tall JPEG);
    • Printing version, to satisfy most of the needs of performers, musicologists, and cultural historians, as well as the general public (860-pixel-tall JPEG);
    • High-resolution download version, to permit close-up inspection of an item (1650-pixel-tall JPEG).
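
Because the three JPEG versions are specified by height alone, the width of each derivative depends on the proportions of the page. The sketch below shows that calculation in Python. The three heights come from the list above. The function name `scale_to_height` and the 10 x 13 inch page (3000 x 3900 pixels at 300 dpi) are assumptions for illustration, not details from Brown's report.

```python
# Pixel heights of the three JPEG derivatives named in the list above.
DERIVATIVE_HEIGHTS = {
    "page-turner": 400,
    "printing": 860,
    "high-resolution": 1650,
}


def scale_to_height(width_px, height_px, target_height):
    """Scale dimensions to a fixed height, preserving the aspect ratio."""
    new_width = round(width_px * target_height / height_px)
    return new_width, target_height


# Hypothetical archival master: a 10 x 13 inch page scanned at 300 dpi.
master = (3000, 3900)

for name, height in DERIVATIVE_HEIGHTS.items():
    print(name, scale_to_height(*master, height))
```

For this hypothetical page the derivatives come out at 308 x 400, 662 x 860, and 1269 x 1650 pixels.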

New York Public Library

Online Collection: Small Town America: Stereoscopic Views from the Dennis Collection, 1850-1910

Over 12,000 stereoscopic views will be shipped to a contractor and scanned. The first report indicated that:

  1. A test of downloading files from workstations not on the NYPL network indicated that load times were too long and that file sizes needed to be decreased. Greyscale images were more efficient to load than full color images and, "as those familiar with photography may have anticipated, are actually easier to view for image content at the thumbnail and examination size. The 'life-size' images that are in stereo and include the card mount will be in full color."
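
One reason greyscale files load faster is that they store one sample per pixel rather than three. The Python sketch below compares uncompressed sizes only. It assumes 8 bits per sample and a hypothetical 800 x 400 pixel image. The sizes of the compressed files NYPL actually delivered are not given in the report and would depend on the format and compression settings.

```python
def uncompressed_bytes(width_px, height_px, channels, bits_per_sample=8):
    """Size in bytes of raw pixel data, before any compression."""
    return width_px * height_px * channels * bits_per_sample // 8


# Hypothetical examination-size image; dimensions are illustrative only.
width, height = 800, 400

color = uncompressed_bytes(width, height, channels=3)  # 24-bit RGB
grey = uncompressed_bytes(width, height, channels=1)   # 8-bit greyscale

print(color, grey, color // grey)  # 960000 320000 3
```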

Lee Library at Brigham Young University with the Utah Academic Library Consortium and the Utah State Historical Society

Online Collection: Pioneer Trails: Overland to Utah and the Pacific, 1847-1869

This collection consists of 6,040 images of original handwritten diaries, printed guides, maps, photographs, and illustrative prints. In the interim report, Susan Fales notes:

  1. "After a period of scanning journals to a standard dpi resolution for the original archival TIFF images (400 dpi having been determined to be adequate), and ending up with a variety of file sizes depending on the size of the original journal, along with commensurately varied scan times, we decided that a better system would be to decide on a fixed number of pixels across the vertical side of a page (usually the long side) that would capture all the significant information in each journal. . . . After experimentation, 2500 pixels across the vertical dimension of the page was chosen as the standard. This was deemed to be sufficient to capture all the relevant textual information important to the user. . . . artifactual information not captured at this level was deemed to be accessible only by viewing the original."

  2. ". . . after some scanning had already taken place, it was felt that it would be preferable to have some manuscript identification built into each image, as a footer below the image. A system was developed in Photoshop to lay the image into a black background with the institutional identification information (University, Library, Department, Manuscript Call Number) in white lettering below each diary page image."
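
The trade-off described in item 1 can be made concrete with a short calculation. At a fixed 400 dpi, the pixel count grows with the size of the journal. At a fixed 2500 pixels on the long side, the pixel count is constant and the effective resolution varies instead. The Python sketch below is illustrative only: the function names and the journal heights are assumptions, not figures from the report.

```python
def pixels_at_dpi(long_side_inches, dpi=400):
    """Pixels along the long side when scanning at a fixed dpi."""
    return round(long_side_inches * dpi)


def effective_dpi(long_side_inches, target_pixels=2500):
    """Resolution implied by a fixed pixel count on the long side."""
    return target_pixels / long_side_inches


# Hypothetical journal heights in inches.
for inches in (5.0, 6.25, 10.0):
    print(inches, pixels_at_dpi(inches), effective_dpi(inches))
# 5.0 2000 500.0
# 6.25 2500 400.0
# 10.0 4000 250.0
```

Under these assumptions a small pocket diary is captured at a higher effective resolution than 400 dpi and a large ledger at a lower one, while every image has the same height in pixels and a more predictable file size and scan time.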