Integrating the Collection into American Memory
The original materials in this collection are heterogeneous, and the corresponding digital reproductions fall into several of the categories that the Library of Congress uses to represent converted materials online:
Example: Oil painting
Example: Ship's plan
Example: Set of plans for a ship
Example: Letters written on a journey west aboard ship
Example: Collection of sailing cards
The G. W. Blunt Library at Mystic Seaport delivered image files in various formats (GIF, JPEG, MrSID), transcribed text as ASCII files, and catalog records in the MARC format to the Library of Congress. Each MARC record included a local field that indicated the item's unique identifier within the collection and its category. For use in American Memory, this information was transferred to subfields $f and $q of the 856 field. This enables a link from bibliographic displays to the appropriate presentation for each digital reproduction. Local fields and subfields were added for consistency with other American Memory collections and to support cross-collection searching.
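The transfer from the local field to the 856 field might be sketched as follows. This is only an illustration under stated assumptions, not the Library's actual conversion routine: the record is modeled as a plain Python dictionary, and the local field tag (985), its subfield codes ($a for the identifier, $b for the category), and the sample values are all hypothetical.

```python
def transfer_local_to_856(record, local_tag="985"):
    """Copy identifier and category from a local field into 856 $f and $q.

    `record` maps a field tag to a list of fields; each field is a list of
    (subfield code, value) pairs. The local tag and its subfield codes are
    hypothetical and chosen only for this example.
    """
    for local in record.get(local_tag, []):
        values = dict(local)
        identifier = values.get("a")
        category = values.get("b")
        if identifier is None or category is None:
            continue  # skip incomplete local fields rather than guess
        record.setdefault("856", []).append(
            [("f", identifier), ("q", category)]
        )
    return record


# Made-up sample record for illustration.
sample = {
    "245": [[("a", "Ship's plan")]],
    "985": [[("a", "item0001"), ("b", "plan")]],
}
result = transfer_local_to_856(sample)
```

With the identifier and category carried in the 856 field, a display program can choose the presentation for an item without consulting the local field.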
Transcribed text was delivered as ASCII files, one file per item as cataloged. Within each transcription, page breaks had been indicated and the corresponding image file identified. For some works, a second file was delivered, representing a table of contents with each entry identifying the starting page image. The Library of Congress took the transcriptions and tables of contents and marked them up automatically in SGML, according to the American Memory DTD. Documents for which tables of contents were available were divided into chapters for more effective searching. Special characters (such as the degree sign) were converted to SGML character entities. Positions (defined by latitude and longitude) and dates in diaries and logbooks had been encoded by Mystic Seaport in normalized form in addition to the form used in the original document. Positions were transformed for readability and dates were transformed to ISO 8601. As text strings, both can be searched for within the full text. The embedded references from the transcription files to page images were used to generate the datasets that support the Library of Congress's page-turning interface. The full text and the MARC records were indexed using Aurora (formerly InQuery), the search engine used for the American Memory service.
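Three of the steps described above (entity conversion, date normalization, and collecting page-image references) can be sketched in a few lines. This is a rough illustration rather than the markup program actually used: the page-break marker syntax, the MM/DD/YYYY input date format, and the sample text are assumptions, since the delivered files' conventions are not described here. Only the degree sign is taken from the text as an example of a special character.

```python
import re
from datetime import datetime

# Hypothetical marker syntax: "[page 0001.gif]" on a line of its own.
PAGE_BREAK = re.compile(r"^\[page (\S+)\]$", re.MULTILINE)

# Only the degree sign is named in the text; other entries would be guesses.
ENTITIES = {"\u00b0": "&deg;"}


def to_iso_date(value, source_format="%m/%d/%Y"):
    """Convert a normalized date string to ISO 8601 (YYYY-MM-DD).

    The input format is an assumption made for this example.
    """
    return datetime.strptime(value, source_format).date().isoformat()


def replace_special_characters(text):
    """Replace special characters with SGML character entities."""
    for char, entity in ENTITIES.items():
        text = text.replace(char, entity)
    return text


def page_sequence(transcription):
    """List (sequence number, image file) pairs in reading order."""
    images = PAGE_BREAK.findall(transcription)
    return list(enumerate(images, start=1))


# Made-up sample transcription for illustration.
sample = (
    "[page 0001.gif]\n"
    "Lat. 41\u00b0 N, fair wind.\n"
    "[page 0002.gif]\n"
    "All well aboard.\n"
)
pages = page_sequence(sample)
```

A list like `pages` is the kind of ordered data a page-turning display needs: each entry ties a position in the reading sequence to one image file.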