American Memory
The National Digital Library Program: Archived Documentation

The Library of Congress / Ameritech National Digital Library Competition (1996-1999)


The Relationship between URNs, Handles, and PURLs.
[August 12, 1997]

NOTE: Links to resources outside the Library of Congress are to URLs that were active when this set of archived documentation was actively maintained. Some links may no longer be active because resources have been removed. If a link is active, the resource may have changed substantially since the documentation was created. No attempt will be made to trace the linked resources or to suppress bad links. The URLs are being retained for their value as historical evidence.


What are URNs, handles, and PURLs?

Naming resources on the Internet poses a challenge. Uniform Resource Locators (URLs) are very powerful, but each is tied to a particular file on a particular computer. A URL must be changed whenever a file system or a web site is reorganized (which is often needed for performance reasons or when computer hardware must be changed) or when ownership of a resource transfers to a new organization. What is needed is a system of logical identifiers that are persistent, location-independent, and globally unique. The Internet/WWW community describes such an identifier as a Uniform Resource Name (URN). URNs have not been widely deployed, not because such a system is hard to conceive, but because it is hard to retrofit the Internet. The general idea is a system that resolves any URN to the currently valid URL (or a future form of locator). A working group of the Internet Engineering Task Force has been working for some time to establish a framework for implementing URNs in a way that allows interoperation between different naming schemes and resolution mechanisms.

The handle system is one implementation for URNs, developed by the Corporation for National Research Initiatives (CNRI). The handle system is a distributed system designed to support global interoperability. Names are registered through a registry of "naming authorities" in an approach comparable to the allocation of ISBNs, for which publishers are given numeric identifiers to which they add unique suffixes for each publication. A naming authority can use ANY handle-server to register its names, and can move its registered names to another server without users being affected. For example, an institution could use CNRI's server initially, and then decide to run its own handle-server; the only change needed would be an entry in the naming authority register.
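The two levels of indirection described above (a register that maps each naming authority to a handle-server, and a server that maps each handle to a URL) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the handle system's actual data structures or protocol: the authority name "example.press", the server names, and the URL are all hypothetical.

```python
# Toy model of naming-authority indirection. Everything named here
# (authority, servers, URL) is a hypothetical placeholder.

# Naming authority register: which handle-server holds each authority's names.
authority_register = {"example.press": "cnri-server"}

# Handle-servers: each maps full handles to current URLs.
handle_servers = {
    "cnri-server": {"example.press/book-42": "http://www.example.org/books/42.html"},
    "press-server": {},
}


def resolve(handle):
    """Find the responsible server via the register, then look up the handle."""
    authority = handle.partition("/")[0]
    server = authority_register[authority]
    return handle_servers[server][handle]


def move_authority(authority, new_server):
    """Move an authority's handles to another server and repoint the register."""
    old_server = authority_register[authority]
    prefix = authority + "/"
    moved = {h: u for h, u in handle_servers[old_server].items()
             if h.startswith(prefix)}
    handle_servers[new_server].update(moved)
    for h in moved:
        del handle_servers[old_server][h]
    authority_register[authority] = new_server


before = resolve("example.press/book-42")
move_authority("example.press", "press-server")
after = resolve("example.press/book-42")
print(before == after)
```

In this sketch a user who holds the handle sees the same result before and after the move; only the register entry and the server-side records change.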

The design incorporates an expansible set of communicating handle-servers and a system of intermediate caches that reduce the load on the network by saving data about recently used (including frequently used) handles. The problem (for handles or any other comprehensive URN system) is that for full realization of the concept, the browsers on users' machines must be aware of URNs in general or handles in particular. The ideal way of effecting this Internet retrofit would be for the Internet Engineering Task Force or the World Wide Web Consortium formally to recommend that browser developers support URNs. This is not likely in the near future, because other issues have higher priority for the standards bodies.

The PURL system, developed by OCLC, creates Persistent URLs: "logical" addresses that a PURL Resolver translates into the URL of the current physical location. As explained in PURLs: Persistent Uniform Resource Locators by Stuart Weibel of OCLC, the PURL system is a short-term, partial solution to the naming problem, but one that can be implemented now. The PURL system can be used with today's browsers but does not satisfy all the requirements for a URN system. In particular, a PURL is not fully location-independent: PURL Resolvers are independent systems, and the resolver's location-dependent Internet address is part of the PURL. Information-providers registering resources on a particular PURL Resolver rely both on the continued existence of that service and on its continuing to provide adequate response time to all the "customers" for their resources. OCLC runs a PURL Resolver and encourages information-providers to use it. OCLC also makes its resolver software available to institutions that wish to run independent resolvers. For an institution committed to maintaining persistent identifiers for the resources it owns and manages, running a PURL resolver may be a convenient solution. However, the owner of a resource cannot switch resolvers without either changing its PURL or maintaining entries on both old and new servers.

Whatever form of URN or PURL is employed, the owners of resources, often publishers, will have to take responsibility for maintaining their URNs. If a file must be moved or its URL changed, perhaps when a computer is replaced or a company changes its Internet domain name, the relevant record in the PURL resolver or handle server must be changed. However, this is much less of a burden than tracking down and modifying every online link to a resource or every copy of a distributed catalog record. Both the Handle system and the PURL system incorporate tools for updating records.
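The maintenance argument can be made concrete with a small sketch. It assumes a resolver is, at its simplest, a table from persistent names to current URLs; the class, the name "example.authority/report-1", and both URLs are hypothetical and do not reflect the actual update tools of either system.

```python
# Minimal sketch of a resolver table. Names and URLs are hypothetical.
class Resolver:
    def __init__(self):
        self._records = {}  # persistent name -> current URL

    def register(self, name, url):
        self._records[name] = url

    def update(self, name, new_url):
        # Only the owner's single record changes when a resource moves.
        if name not in self._records:
            raise KeyError(f"unknown name: {name}")
        self._records[name] = new_url

    def resolve(self, name):
        return self._records[name]


resolver = Resolver()
resolver.register("example.authority/report-1",
                  "http://old-host.example.org/docs/report1.html")

# Three documents (or catalog records) cite the persistent name, not the URL.
citing_documents = ["example.authority/report-1"] * 3

# The file moves: one record is updated, and no citing document is touched.
resolver.update("example.authority/report-1",
                "http://new-host.example.org/archive/report1.html")

resolved = [resolver.resolve(name) for name in citing_documents]
print(resolved[0])
```

Every citation resolves to the new location after a single update, which is the saving the paragraph above describes.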

What do PURLs and handles look like?

Syntax and components of a PURL:

             
             http://purl.oclc.org/abcdefghijklm
             ----   ------------- -------------
              /           |             \
       protocol   resolver address     name  

The PURL incorporates the Internet address of a particular resolver, in this case the resolver run by OCLC. A PURL is a URL, recognizable by today's browsers, and any other software that interprets URLs.
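Because a PURL is an ordinary URL, its three components can be separated with any standard URL parser. The following sketch uses Python's urllib.parse on the placeholder PURL from the diagram above; the function name split_purl and the dictionary keys are illustrative choices, not part of the PURL system.

```python
from urllib.parse import urlsplit


def split_purl(purl):
    """Split a PURL into the three parts labeled in the diagram."""
    parts = urlsplit(purl)
    return {
        "protocol": parts.scheme,
        "resolver_address": parts.netloc,
        "name": parts.path.lstrip("/"),
    }


components = split_purl("http://purl.oclc.org/abcdefghijklm")
print(components)
```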

PURL examples:

  1. http://purl.oclc.org/keith/home
    Today, this points to the URL http://www.oclc.org:5046/~shafer/
  2. http://purl.oclc.org/OCLC/OLUC/31817955/1
    Today, this points to the URL http://lcweb.loc.gov/catdir/semdigdocs/seminar.html which is the online proceedings of a 1994 seminar on cataloging digital documents. This LC document was given a PURL through the InterCAT project.
  3. http://purl.oclc.org/OCLC/OLUC/32821870/2
    Today, this points to the URL http://lcweb2.loc.gov/papr/mpixhome.html, the home page for the Early Motion Pictures Collection.

Try the PURLs out directly: copy and paste them into your web browser, or retype them.

[Some people may wish to check out OCLC's InterCAT, which takes advantage of PURLs to avoid the need to change catalog records when cataloged resources are moved. Click on the "Logon" button from the main InterCAT page to begin your session. No password is needed. Search for "digital documents" or "early motion pictures" to find PURL examples 2 and 3.]

Syntax and components of a handle:

        URN:hdl:ABCDEFGHIJKLMNOP/abcdefghijklm
                ---------------- -------------
                        |              |
                naming authority      name  

The naming authority identifies an organizational entity that is entitled to allocate names because it is registered through a hierarchy of naming authorities. Any handle-server or handle-aware browser in the world has access to all naming authorities, and hence to all handles, in a way that is invisible to the user. The naming authority is not tied to a particular Internet address.
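A handle can be separated into its two components in a similar way. The sketch below assumes that the naming authority ends at the first "/" and that an optional "URN:hdl:" or "hdl:" prefix may be present and is matched case-insensitively; both are simplifying assumptions made for illustration, and split_handle is a hypothetical helper, not part of the handle client software.

```python
def split_handle(handle):
    """Return (naming_authority, name) for a handle string."""
    text = handle
    # Strip an optional scheme prefix (assumed forms, matched case-insensitively).
    for prefix in ("urn:hdl:", "hdl:"):
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    # Assumption: the naming authority ends at the first slash.
    authority, separator, name = text.partition("/")
    if not separator or not authority or not name:
        raise ValueError(f"not a handle: {handle!r}")
    return authority, name


authority, name = split_handle("URN:hdl:ABCDEFGHIJKLMNOP/abcdefghijklm")
print(authority, name)
```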

The long-term goal is to be able to use "URN:hdl:ABCDEFGHIJKLMNOP/abcdefghijklm" anywhere a URL might be used today. However, the hdl: or URN: schemes are not currently recognized by WWW browsers unless the browser is modified by installing a handle-resolver extension. For users with regular browsers, handles must currently be used through some sort of "proxy" client rather than directly by a browser. To support the use of handles as identifiers within the American Memory interface, the handle-client functions can be integrated into the routines that build displays. Proxy handle resolvers can also be run at known Internet addresses.

Proxy handle resolvers

To facilitate use of handles as persistent identifiers without widespread adoption of URNs, CNRI has developed proxy servers that act as a gateway between the HTTP protocol used by all web browsers and the handle resolution protocol. CNRI runs a proxy server at http://hdl.handle.net/ and the Library of Congress will run one at http://hdl.loc.gov/. Each proxy server can resolve any handle that corresponds to a URL.

Handle examples

  1. cnri.dlib/april96-c.arms
    which identifies an article in the April 1996 issue of D-Lib Magazine, and currently points to the URL http://www.dlib.org/dlib/April96/loc/04c-arms.html.
    This handle is registered in the handle service, on the central global handle server at CNRI.

  2. loc.ndlp.amrlp/3a16116.1
    which identifies a photograph in one of the American Memory collections. The NDLP has decided that it will establish a hierarchy of naming authorities, delegating authority down to the level of a collection or similar aggregate under the custodial responsibility of an LC division or individual curator. This handle is also registered in the handle service, in a "local" handle server running at the Library of Congress for handles for American Memory and other LC projects.

In general, by prefixing any handle with "http://" and the Internet domain name of any proxy handle server, handles corresponding to URLs can be resolved using today's browsers. The proxy handle server thus allows such handles to be embedded in URLs that are persistent but not fully location independent. These "proxy URNs" are roughly equivalent to PURLs.

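Thus, http://hdl.handle.net/cnri.dlib/april96-c.arms retrieves the D-Lib article identified by the first handle example.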
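The prefixing rule is simple enough to express in a few lines of Python. The sketch below assumes a bare handle (without a "URN:hdl:" prefix) and joins it to the proxy's domain name with a single slash; proxy_url is a hypothetical helper, and the two host names are the proxy servers mentioned earlier in this document.

```python
def proxy_url(handle, proxy_host="hdl.handle.net"):
    """Embed a bare handle in a URL served by a proxy handle resolver."""
    return f"http://{proxy_host}/{handle}"


article_url = proxy_url("cnri.dlib/april96-c.arms")
photo_url = proxy_url("loc.ndlp.amrlp/3a16116.1", proxy_host="hdl.loc.gov")
print(article_url)
print(photo_url)
```

The same handle yields a different URL for each proxy, which is why such URLs are persistent but not fully location independent.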

How do handles and PURLs interoperate?

The handle client software has been incorporated into OCLC's PURL resolver software, so that PURL resolvers act as proxy handle servers. All handles corresponding to URLs can be resolved via any PURL resolver. This means that http://purl.oclc.org/hdl/cnri.dlib/april96-c.arms also retrieves the article.

In general, PURL resolvers will recognize names that begin with "/hdl/" as handles and will pass them on to the handle system for resolution.
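The dispatch rule can be sketched as a check on the path of the incoming request. This is an illustration of the rule as stated, not OCLC's resolver code; the function classify_purl and its return values are hypothetical.

```python
from urllib.parse import urlsplit

HANDLE_PREFIX = "/hdl/"


def classify_purl(purl):
    """Decide whether a resolver should treat a request as a handle or a PURL.

    Returns ("handle", handle) when the path begins with "/hdl/",
    otherwise ("purl", name).
    """
    path = urlsplit(purl).path
    if path.startswith(HANDLE_PREFIX):
        return ("handle", path[len(HANDLE_PREFIX):])
    return ("purl", path.lstrip("/"))


via_handle = classify_purl("http://purl.oclc.org/hdl/cnri.dlib/april96-c.arms")
via_purl = classify_purl("http://purl.oclc.org/keith/home")
print(via_handle)
print(via_purl)
```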


Related reading from outside the Library of Congress: