The Library of Congress / Ameritech National Digital Library Competition (1996-1999)
Competition Home |
Technical Information for 1998/99 Competition > Guidelines and Resources Prepared for Applicants > Examples of Mechanisms for Keeping URLs Relatively Persistent |
NOTE: Links to resources outside the Library of Congress are to URLs that were active when this set of archived documentation was actively maintained. Some links may no longer be active because resources have been removed. If a link is active, the resource may have changed substantially since the documentation was created. No attempt will be made to trace the linked resources or to suppress bad links. The URLs are being retained for their value as historical evidence.
The library community relies on a shared approach to cataloging. MARC records are distributed through bibliographic utilities and copied for use in Online Public Access Catalog (OPAC) systems in individual libraries or library consortia. Rather than catalog each item from scratch, libraries can take advantage of the intellectual effort of other catalogers. The MARC standard was extended a few years ago to allow URLs to be recorded in subfield $u of the 856 (Electronic Location and Access) field. As of 1997, most library automation vendors have developed web-based interfaces to their catalog systems that can use these URLs to provide "hot" links to the actual resources. Third parties have developed Z39.50 client software with the same capabilities.
However, the fact that a URL usually points to a particular file on a particular computer means the link through a URL is liable to "break" when the corresponding file is moved or the system it is mounted on is re-organized. The 856 fields in copies of a MARC record that have been incorporated into catalogs across the world will no longer provide access to the resource they describe if the URL has changed. In the long run, systems that support persistent, location-independent identifiers are needed, but the development and deployment of such systems is in the early stages. Today, we need to be able to put URLs in distributed catalog records, and in other portable aids to access such as finding aids marked up according to the Encoded Archival Description (EAD) document type definition.
Within the limitations of today's URLs, there are several mechanisms that can be used to create identifiers that are relatively persistent. These mechanisms all rely on introducing layers of "indirection" between the URLs "published" in catalog records and the physical files that hold the content those records describe. The mechanisms are not alternatives; they can be used in combination. The list below should not be seen as exhaustive.
Consider using the alias capability of the Internet's domain name system to give a short and friendly name to the web-server that acts as the gateway to your repository. If you switch your "front door" to another computer, you simply change one entry in the Domain Name System (DNS). The capabilities of DNS ensure that this change will be recognized everywhere on the Internet.
The operating system of the computer that your web-server runs on (perhaps Unix or Windows NT) probably supports aliases for directories. This feature can be used to provide short names to important directories even if they are not located at the top of the actual file hierarchy.
To find out whether you can take advantage of these capabilities, contact your local systems administrator.
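The directory alias idea can be illustrated with a minimal sketch, assuming a Unix-like system where symbolic links serve as directory aliases. All directory and file names here are hypothetical, and the sketch works in a temporary directory rather than a real web-server tree.

```python
import os
import tempfile

# Hypothetical layout: the real directory sits deep in the file hierarchy.
root = tempfile.mkdtemp()
deep_dir = os.path.join(root, "vol3", "projects", "1997", "collection")
os.makedirs(deep_dir)
with open(os.path.join(deep_dir, "home.html"), "w") as f:
    f.write("<html>collection home</html>")

# The alias is a short name near the top of the hierarchy.
alias = os.path.join(root, "coll")
os.symlink(deep_dir, alias)

# Files are reachable through the short name.
with open(os.path.join(alias, "home.html")) as f:
    via_alias = f.read()

# If the directory moves, only the alias is re-pointed;
# the short path that was published stays the same.
new_dir = os.path.join(root, "vol7", "collection")
os.makedirs(os.path.dirname(new_dir))
os.rename(deep_dir, new_dir)
os.unlink(alias)            # removes the link itself, not the directory
os.symlink(new_dir, alias)
with open(os.path.join(alias, "home.html")) as f:
    after_move = f.read()
```

A URL built on the short name keeps working after the move, because only the alias changed.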
Design your repository or database system so that you can retrieve a known item by its internal identifier. Prepare a script or program that accepts the internal identifier as a parameter, retrieves the item, and generates an appropriate display for web access. Install this script or program on your web-server so that it can be invoked through the Common Gateway Interface (CGI) mechanism.
Examples:
The University of Michigan has provided a mechanism for users of their Making of America collection to retrieve one of their digital reproductions directly by publishing a "bookmarkable URL" as part of the bibliographic display. The bookmarkable URL for a book could be used in a MARC record for the item.
http://www.umdl.umich.edu/cgi-bin/moa/sgml/moa-idx?notisid=AEK2825 retrieves the book with internal identifier AEK2825.
The International Digital Electronic Academic Library (IDEAL) service from Academic Press supports persistent links to tables of contents and individual articles. The syntax for the cgi-bin URLs is described in Creating Links to IDEAL which includes information about a more technical document IDEAL 2.1 Symbolic References: A User's Guide.
Among other options, IDEAL provides scripts that retrieve articles by ISSN/Vol/Initial Page, e.g.
http://www.idealibrary.com/cgi-bin/links/citation/0890-5401/128/48
or by SICI, e.g.
http://www.idealibrary.com/cgi-bin/links/sici/0890-5401(19960710)128:1<48:TDUOSN>2.0.CO;2-?
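A script of the kind described above might look like the following minimal sketch. It is not the code used by either service in the examples: the parameter name `id`, the identifier `item-0001`, and the `ITEMS` table standing in for a repository are all hypothetical. Under CGI, the web-server passes the part of the URL after `?` to the script in the `QUERY_STRING` environment variable.

```python
import os
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for the repository: internal identifier -> item record.
ITEMS = {
    "item-0001": {"title": "Placeholder title"},
}

def handle_request(query_string):
    """Return (status, html) for a query string such as 'id=item-0001'."""
    params = parse_qs(query_string)
    ids = params.get("id", [])
    if len(ids) != 1:
        return "400 Bad Request", "<p>Exactly one id parameter is required.</p>"
    item = ITEMS.get(ids[0])
    if item is None:
        return "404 Not Found", "<p>No item with identifier %s.</p>" % escape(ids[0])
    return "200 OK", "<h1>%s</h1>" % escape(item["title"])

if __name__ == "__main__":
    # CGI convention: headers, a blank line, then the document body.
    status, body = handle_request(os.environ.get("QUERY_STRING", ""))
    print("Status: " + status)
    print("Content-Type: text/html")
    print()
    print(body)
```

Because the published URL carries only the identifier, the files behind the script can be reorganized without changing any catalog record.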
When devising a naming scheme, plan for the future. Is your scheme extensible to handle growth and additional projects or collections? Does it provide a mechanism for ensuring uniqueness? One-up numbers are certainly unique, but require that allocation of numbers is either managed through a single system that assigns each deposited item a new number or by pre-allocating ranges of numbers. Hierarchical structures for logical names can help ensure uniqueness by supporting independent identification for separate projects or collections.
For example, the Library of Congress uses a 2-level hierarchy to assign unique logical names to its digital reproductions. Each item is assigned to an aggregate, a grouping by custodial responsibility that often corresponds to an American Memory collection (e.g. Daguerreotypes), but may be a component of a larger collection (e.g. Programs and Playbills within Variety Stage). A central register of aggregate names is maintained. This allows each project to use an independent naming scheme that guarantees uniqueness within its aggregates. It also provides flexibility in when names are assigned. Sometimes, existing identifiers for the original items can be used as identifiers for the digital reproductions. In other cases, numbers are assigned as materials are prepared for digitization.
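The two-level approach can be sketched as follows. This is an illustration of the idea only, not the Library's actual register: the class, the aggregate names, and the `/` separator in the combined name are assumptions made for the example.

```python
class NameRegister:
    """Central register of aggregate names; each aggregate names its own items."""

    def __init__(self):
        self._aggregates = {}  # aggregate name -> set of item names already used

    def register_aggregate(self, aggregate):
        """Reserve an aggregate name; the central register keeps these unique."""
        if aggregate in self._aggregates:
            raise ValueError("aggregate already registered: " + aggregate)
        self._aggregates[aggregate] = set()

    def assign(self, aggregate, item):
        """Record an item name within an aggregate and return the full logical name."""
        items = self._aggregates[aggregate]  # KeyError if the aggregate is unknown
        if item in items:
            raise ValueError("name already used in %s: %s" % (aggregate, item))
        items.add(item)
        return aggregate + "/" + item


register = NameRegister()
register.register_aggregate("projA")   # hypothetical aggregate names
register.register_aggregate("projB")
# Two projects can use the same item name without colliding.
name_a = register.assign("projA", "0001")
name_b = register.assign("projB", "0001")
```

Only aggregate names need central coordination; each project is free to reuse existing identifiers or assign numbers on its own schedule.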
The PURL system, developed by OCLC, creates Persistent URLs, "logical" addresses which are translated through a PURL Resolver into the URL of the current physical location. A PURL resolver is a separately managed lookup system for mapping logical URLs into current physical URLs. It is independent of any particular repository or database and one resolver could be used to manage persistent identifiers for digital resources for an entire university or a library consortium. The PURL resolver at http://purl.oclc.org/ incorporates a database that maps each PURL to its corresponding physical URL and provides a method for changing the physical URL when the file is moved. OCLC makes its resolver software available freely. For an institution committed to maintaining persistent identifiers for a variety of resources that it owns and manages, running a PURL resolver may be appropriate.
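The lookup a resolver performs can be sketched in a few lines. This is not OCLC's software; the class, method names, and example.org URLs are hypothetical, and a real resolver would also need access control and a persistent database.

```python
class Resolver:
    """Maps logical (persistent) paths to current physical URLs."""

    def __init__(self):
        self._table = {}  # logical path -> current physical URL

    def register(self, logical_path, physical_url):
        if logical_path in self._table:
            raise ValueError("already registered: " + logical_path)
        self._table[logical_path] = physical_url

    def update(self, logical_path, new_physical_url):
        """Called when the file moves; the logical path itself never changes."""
        if logical_path not in self._table:
            raise KeyError(logical_path)
        self._table[logical_path] = new_physical_url

    def resolve(self, logical_path):
        """Return the current physical URL, or None if the path is unknown."""
        return self._table.get(logical_path)


resolver = Resolver()
resolver.register("/univ/reports/42", "http://server1.example.org/files/r42.html")
before = resolver.resolve("/univ/reports/42")
# The file is moved to another machine: one table entry changes.
resolver.update("/univ/reports/42", "http://server2.example.org/archive/r42.html")
after = resolver.resolve("/univ/reports/42")
```

Catalog records cite only the logical path, so the move requires no change to any record that has already been distributed.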
World Wide Web browsers do not currently recognize handles or other URN schemes. Today, for users with regular browsers, handles must be used through a gateway between the HTTP protocol used by all web browsers and the handle resolution protocol. CNRI runs such a gateway (known as a proxy handle server) at http://hdl.handle.net/ and the Library of Congress runs one at http://hdl.loc.gov/.
Examples:
For the handle cnri.dlib/april96-c.arms,
http://hdl.handle.net/cnri.dlib/april96-c.arms
is resolved through the handle system to
http://www.dlib.org/dlib/April96/loc/04c-arms.html
For the handle loc.test/daghome,
http://hdl.loc.gov/loc.test/daghome
is resolved through the handle system to
http://lcweb2.loc.gov/ammem/daghtml/daghome.html
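As the examples show, the URL to publish is the proxy address followed by the handle. A minimal sketch of that construction follows; the function names are hypothetical, and splitting the handle at its first slash into a "prefix" and a "local name" is just the labeling used in this sketch.

```python
from urllib.parse import quote

def proxy_url(handle, proxy="http://hdl.handle.net/"):
    """Build the URL that a regular browser can use to resolve a handle."""
    # quote() leaves letters, digits, '.', '-', '_' and '/' unchanged,
    # and escapes characters that are not safe in a URL path.
    return proxy + quote(handle, safe="/")

def split_handle(handle):
    """Split a handle at its first slash into (prefix, local name)."""
    prefix, _, local = handle.partition("/")
    return prefix, local


url_cnri = proxy_url("cnri.dlib/april96-c.arms")
url_loc = proxy_url("loc.test/daghome", proxy="http://hdl.loc.gov/")
parts = split_handle("loc.test/daghome")
```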
The indirection provided by handles (or PURLs) can be combined with the script that uses an internal identifier as a parameter (variable). This will reduce the need to modify the entries in the handle-server or PURL resolver databases.
Examples:
For the handle loc.music/musdi.101
http://hdl.loc.gov/loc.music/musdi.101
is resolved through the handle system to
http://memory.loc.gov/cgi-bin/query/r?ammem/musdibib:@field(NUMBER+@band(musdi+101))
For the handle loc.test/bss.105.HR00672
http://hdl.handle.net/loc.test/bss.105.HR00672
is resolved through the handle system to
http://thomas.loc.gov/cgi-bin/bdquery/z?d105:HR00672:|TOM:bss/d105query.html
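The effect of combining the two layers can be shown with a small sketch. Every name here is hypothetical: `HANDLES` stands in for the handle-server (or PURL resolver) database, `LOCATIONS` for the repository's own table, and the URL and file paths are invented for the example.

```python
from urllib.parse import parse_qs, urlsplit

# Layer 1: the handle table points at a script URL, not at a file.
HANDLES = {
    "example.coll/item.101": "http://repository.example.org/cgi-bin/get?id=item.101",
}

# Layer 2: the repository maps its internal identifier to the current file.
LOCATIONS = {
    "item.101": "/disk1/coll/item101.html",
}

def resolve(handle):
    """Handle -> script URL (the step performed by the handle system)."""
    return HANDLES[handle]

def locate(script_url):
    """Script URL -> current file (the step performed by the script)."""
    identifier = parse_qs(urlsplit(script_url).query)["id"][0]
    return LOCATIONS[identifier]


handles_before = dict(HANDLES)
path_before = locate(resolve("example.coll/item.101"))
# The file is moved: only the repository's own table changes.
LOCATIONS["item.101"] = "/disk2/archive/item101.html"
path_after = locate(resolve("example.coll/item.101"))
```

The handle entry would need to change only if the script itself moved, which should be far rarer than reorganizing files.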
Selected Topics from NDLP Internal Documentation, particularly the documents about Identifiers for Internet resources
CNRI's web pages for Handles and the Handle System. [At http://www.handle.net/].
OCLC's web pages for the PURL system. [At http://www.purl.org/].
The Library of Congress
>> American Memory
Content updated: 1997-08-18