American Memory
The National Digital Library Program: Archived Documentation

The Library of Congress / Ameritech National Digital Library Competition (1996-1999)


Also serves as appendix to Access Aids and Interoperability

Examples of mechanisms for keeping URLs relatively persistent. [August 18, 1997]

NOTE: Links to resources outside the Library of Congress are to URLs that were active when this set of archived documentation was actively maintained. Some links may no longer be active because resources have been removed. If a link is active, the resource may have changed substantially since the documentation was created. No attempt will be made to trace the linked resources or to suppress bad links. The URLs are being retained for their value as historical evidence.


The library community relies on a shared approach to cataloging. MARC records are distributed through bibliographic utilities and copied for use in Online Public Access Catalog (OPAC) systems in individual libraries or library consortia. Rather than catalog each item from scratch, libraries can take advantage of the intellectual effort of other catalogers. The MARC standard was extended a few years ago to allow URLs to be recorded in subfield $u of the 856 (Electronic Location and Access) field. As of 1997, most library automation vendors have developed web-based interfaces to their catalog systems that can use these URLs to provide "hot" links to the actual resources. Third parties have developed Z39.50 client software with the same capabilities.

However, because a URL usually points to a particular file on a particular computer, a link made through a URL is liable to "break" when the corresponding file is moved or the system it is mounted on is reorganized. The 856 fields in copies of a MARC record that have been incorporated into catalogs across the world will no longer provide access to the resource they describe if the URL has changed. In the long run, systems that support persistent, location-independent identifiers are needed, but the development and deployment of such systems are in the early stages. Today, we need to be able to put URLs in distributed catalog records and in other portable access aids, such as finding aids marked up according to the Encoded Archival Description (EAD) document type definition.

Within the limitations of today's URLs, there are several mechanisms that can be used to create identifiers that are relatively persistent. These mechanisms all rely on introducing layers of "indirection" between the URLs "published" in catalog records and the physical files that hold the content those records describe. The mechanisms are not alternatives; they can be used in combination. The list below should not be seen as exhaustive.

Take advantage of alias capabilities on your web-server

Consider using the alias capability of the Internet's domain name system to give a short and friendly name to the web-server that acts as the gateway to your repository. If you switch your "front door" to another computer, you simply change one entry in the Domain Name System (DNS). The capabilities of DNS ensure that this change will be recognized everywhere on the Internet.

The operating system of the computer that your web-server runs on (perhaps Unix or Windows NT) probably supports aliases for directories. This feature can be used to provide short names to important directories even if they are not located at the top of the actual file hierarchy.

To find out whether you can take advantage of these capabilities, contact your local systems administrator.
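
As a rough illustration of a directory alias, the sketch below uses a symbolic link on a Unix-like system. The directory names and the helper publish_alias are hypothetical, and a scratch directory stands in for a real web-server tree; a given server or operating system may offer a different alias mechanism.

```python
import os
import tempfile

def publish_alias(real_dir, alias_path):
    """Point alias_path at real_dir, replacing any earlier alias."""
    if os.path.islink(alias_path):
        os.remove(alias_path)
    os.symlink(real_dir, alias_path, target_is_directory=True)

def read_text(path):
    with open(path) as f:
        return f.read()

# Scratch area standing in for the server's file hierarchy (hypothetical names).
root = tempfile.mkdtemp()
old_home = os.path.join(root, "disk1", "projects", "1997", "daguerreotypes")
new_home = os.path.join(root, "disk2", "collections", "daguerreotypes")
for directory in (old_home, new_home):
    os.makedirs(directory)
    with open(os.path.join(directory, "index.html"), "w") as f:
        f.write("served from " + directory)

# The short alias is the name that would appear in published URLs.
alias = os.path.join(root, "dag")
publish_alias(old_home, alias)
before = read_text(os.path.join(alias, "index.html"))

# The content moves; only the alias is changed, not the published path.
publish_alias(new_home, alias)
after = read_text(os.path.join(alias, "index.html"))
```

The path through the alias stays the same before and after the move, which is the property a published URL needs.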

Use a "cgi-bin" script that acts as a query for a known item

Design your repository or database system so that you can retrieve a known item by its internal identifier. Prepare a script or program that accepts the internal identifier as a parameter, retrieves the item, and generates an appropriate display for web access. Install this script or program on your web-server so that it can be invoked through the Common Gateway Interface (CGI) mechanism.
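
A minimal sketch of such a script follows, written in Python purely for illustration. The ITEMS table, the identifiers, the file paths, and the parameter name "id" are all assumptions and do not describe any actual Library of Congress script.

```python
from urllib.parse import parse_qs

# Hypothetical repository table: internal identifier -> current file location.
ITEMS = {
    "dag-0001": "/data/vol3/dag/0001.gif",
    "vs-0042": "/data/vol7/varstg/0042.gif",
}

def handle_request(query_string):
    """Build a CGI-style response (headers, blank line, body) for a known item."""
    params = parse_qs(query_string)
    item_id = params.get("id", [""])[0]
    location = ITEMS.get(item_id)
    if location is None:
        return ("Status: 404 Not Found\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "Unknown item\n")
    # item_id is a key of ITEMS at this point, so it is safe to echo back.
    body = ('<html><body><h1>%s</h1><img src="%s"></body></html>\n'
            % (item_id, location))
    return "Status: 200 OK\r\nContent-Type: text/html\r\n\r\n" + body
```

Under CGI, the web-server passes the part of the URL after "?" in the QUERY_STRING environment variable, so a real script would print the result of handle_request(os.environ.get("QUERY_STRING", "")). The published URL carries only the internal identifier; the file location can change without affecting it.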

Examples:

Devise a logical naming scheme that guarantees uniqueness and is extensible

When devising a naming scheme, plan for the future. Is your scheme extensible to handle growth and additional projects or collections? Does it provide a mechanism for ensuring uniqueness? One-up numbers are certainly unique, but require that numbers be allocated either through a single system that assigns each deposited item a new number or by pre-allocating ranges of numbers. Hierarchical structures for logical names can help ensure uniqueness by supporting independent identification for separate projects or collections.

For example, the Library of Congress uses a 2-level hierarchy to assign unique logical names to its digital reproductions. Each item is assigned to an aggregate, a grouping by custodial responsibility that often corresponds to an American Memory collection (e.g. Daguerreotypes), but may be a component of a larger collection (e.g. Programs and Playbills within Variety Stage). A central register of aggregate names is maintained. This allows each project to use an independent naming scheme that guarantees uniqueness within its aggregates. It also provides flexibility in when names are assigned. Sometimes, existing identifiers for the original items can be used as identifiers for the digital reproductions. In other cases, numbers are assigned as materials are prepared for digitization.
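
The two-level idea can be sketched as a small registry. The class NameRegistry, the aggregate names "dag" and "varstg", and the "/" separator are assumptions made for illustration, not the Library's actual names or software.

```python
class NameRegistry:
    """Central register of aggregates; each aggregate tracks its own item names."""

    def __init__(self):
        self._aggregates = {}  # aggregate name -> set of item identifiers

    def register_aggregate(self, aggregate):
        # Aggregate names are unique across the whole institution.
        if aggregate in self._aggregates:
            raise ValueError("aggregate already registered: " + aggregate)
        self._aggregates[aggregate] = set()

    def assign(self, aggregate, item_id):
        # Item identifiers only need to be unique within their aggregate.
        items = self._aggregates[aggregate]  # KeyError if never registered
        if item_id in items:
            raise ValueError("duplicate item in %s: %s" % (aggregate, item_id))
        items.add(item_id)
        return aggregate + "/" + item_id

registry = NameRegistry()
registry.register_aggregate("dag")
registry.register_aggregate("varstg")
name_a = registry.assign("dag", "0001")
name_b = registry.assign("varstg", "0001")  # same item number, different aggregate
```

Because only aggregate names are coordinated centrally, two projects can both use the item number 0001 and the combined logical names remain distinct.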

Run an independent URL resolver or directory server

The PURL system, developed by OCLC, creates Persistent URLs, "logical" addresses which are translated through a PURL Resolver into the URL of the current physical location. A PURL resolver is a separately managed lookup system for mapping logical URLs into current physical URLs. It is independent of any particular repository or database and one resolver could be used to manage persistent identifiers for digital resources for an entire university or a library consortium. The PURL resolver at http://purl.oclc.org/ incorporates a database that maps each PURL to its corresponding physical URL and provides a method for changing the physical URL when the file is moved. OCLC makes its resolver software available freely. For an institution committed to maintaining persistent identifiers for a variety of resources that it owns and manages, running a PURL resolver may be appropriate.

Example:
When a MARC record was created at the Library of Congress for the online collection of Early Motion Pictures, OCLC copied the record into its InterCat system and assigned the PURL http://purl.oclc.org/OCLC/OLUC/32821870/2 to map to the existing URL in subfield $u of the 856 field. Currently, this PURL maps to http://lcweb2.loc.gov/ammem/papr/mpixhome.html.
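
The lookup a resolver performs can be sketched as a table plus an HTTP redirect. This is an illustration only and not OCLC's software: the class Resolver, its method names, the choice of a 302 status, and the example.org address are assumptions. The logical path and the first physical URL are the ones from the example above.

```python
class Resolver:
    """Maps logical (persistent) paths to current physical URLs."""

    def __init__(self):
        self._table = {}

    def register(self, logical_path, physical_url):
        self._table[logical_path] = physical_url

    def move(self, logical_path, new_physical_url):
        # Only existing entries can be moved.
        if logical_path not in self._table:
            raise KeyError(logical_path)
        self._table[logical_path] = new_physical_url

    def resolve(self, logical_path):
        """Return (status, headers): a redirect if known, otherwise a 404."""
        url = self._table.get(logical_path)
        if url is None:
            return 404, {}
        return 302, {"Location": url}

resolver = Resolver()
resolver.register("/OCLC/OLUC/32821870/2",
                  "http://lcweb2.loc.gov/ammem/papr/mpixhome.html")
first = resolver.resolve("/OCLC/OLUC/32821870/2")

# Hypothetical relocation: the catalog record keeps the same logical address.
resolver.move("/OCLC/OLUC/32821870/2",
              "http://www.example.org/papr/mpixhome.html")
second = resolver.resolve("/OCLC/OLUC/32821870/2")
```

The address recorded in the 856 field never changes; only the resolver's table entry is updated when the file moves.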

Use a proxy handle resolver at a known Internet address as a URL resolver

World Wide Web browsers do not currently recognize handles or other URN schemes. Today, for users with regular browsers, handles must be used through a gateway between the HTTP protocol used by all web browsers and the handle resolution protocol. CNRI runs such a gateway (known as a proxy handle server) at http://hdl.handle.net/ and the Library of Congress runs one at http://hdl.loc.gov/.
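
In practice, using a handle through a proxy amounts to appending the handle to the proxy's address. The sketch below shows that construction; the function name handle_to_url and the handle "hypothetical.prefix/item-001" are made up for illustration and do not refer to a registered handle.

```python
from urllib.parse import quote

def handle_to_url(handle, proxy="http://hdl.handle.net/"):
    """Build a browser-usable URL for a handle via a proxy handle server."""
    # quote() leaves "/" and unreserved characters alone and escapes the rest.
    return proxy.rstrip("/") + "/" + quote(handle)

via_cnri = handle_to_url("hypothetical.prefix/item-001")
via_loc = handle_to_url("hypothetical.prefix/item-001", proxy="http://hdl.loc.gov/")
```

The resulting URL is what would be recorded in a catalog record until browsers can resolve handles directly.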

Examples:

The indirection provided by handles (or PURLs) can be combined with the script that uses an internal identifier as a parameter (variable). This will reduce the need to modify the entries in the handle-server or PURL resolver databases.
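
The effect of combining the two layers can be sketched with two small tables. All names here are hypothetical: the handle, the repository.example.org script address, the "id" parameter, and the file paths.

```python
from urllib.parse import parse_qs, urlsplit

# Layer 1: handle or PURL resolver. Each entry points at a script URL
# that carries the internal identifier as a parameter.
RESOLVER = {
    "hypothetical.prefix/dag-0001":
        "http://repository.example.org/cgi-bin/item?id=dag-0001",
}

# Layer 2: the repository's own table, consulted by the script.
REPOSITORY = {"dag-0001": "/data/vol3/dag/0001.gif"}

def locate(handle):
    """Follow both layers of indirection from a handle to a file location."""
    script_url = RESOLVER[handle]
    item_id = parse_qs(urlsplit(script_url).query)["id"][0]
    return REPOSITORY[item_id]

resolver_before = dict(RESOLVER)
location_before = locate("hypothetical.prefix/dag-0001")

# The file is moved: only the repository table changes.
REPOSITORY["dag-0001"] = "/data/vol9/dag/0001.gif"
location_after = locate("hypothetical.prefix/dag-0001")
```

The resolver entry is untouched by the move, which is why this combination reduces maintenance of the handle-server or PURL resolver database.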

Examples:

