3. Committee Discussion

Preservation-quality and access-quality images. At the first meeting of the Evaluation Committee, the Picture Elements consultants and the Library of Congress project leaders (Fleischhauer and Swora) sketched the project's key premises and solicited the responses of committee members. A distinction was made between preservation copies and preservation-quality images. The distinction hinges on the fact that the longevity of a digital image depends upon organizational, procedural, and financial commitments; to achieve full status as a preservation copy, a digital image must be kept alive for the long term.4 This project is dedicated to the development of specifications for images that, if longevity can be promised, will serve the goals of preservation, i.e., will serve as reasonable substitutes in the event that the original item is lost or deteriorates.

Foreseeing that high-quality digital images appropriate for preservation might be large and unwieldy over a computer network, the project's organizers also sought to develop specifications for access-quality images. Such images would be lower in spatial resolution ("dots per inch"), in tonal resolution ("bits per pixel"), or in both, and would be derived, if possible, from the preservation-quality images. Lower-resolution images, whose files are smaller (fewer bytes), can be more easily handled in computer systems. The project sought to identify images that, although less faithful to the original than preservation-quality images, offer high legibility and good service to researchers.
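The relationship between spatial resolution, tonal resolution, and file size can be made concrete with a back-of-the-envelope calculation. The page dimensions and resolutions below are illustrative assumptions, not figures from the project:

```python
# Illustrative, uncompressed file-size estimate for a scanned page.
# The page size (8.5 x 11 in) and the resolutions are hypothetical.

def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scanned page."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel // 8

preservation = raw_image_bytes(8.5, 11, 300, 8)   # 300 dpi grayscale
access       = raw_image_bytes(8.5, 11, 100, 8)   # 100 dpi grayscale
bitonal      = raw_image_bytes(8.5, 11, 300, 1)   # 300 dpi binary

print(f"preservation: {preservation / 1e6:.1f} MB")   # 8.4 MB
print(f"access:       {access / 1e6:.1f} MB")         # 0.9 MB
print(f"bitonal:      {bitonal / 1e6:.1f} MB")        # 1.1 MB
```

The sketch shows why access images were expected to be far easier to move over a network, and why compression (not modeled here) matters for all three types.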

Nuances of the term preservation. The committee's response to the project outline included discussion of the word preservation. When original documents are retained and conserved in the manner of most Library of Congress manuscripts, some members asked, when are reformatted copies (whether microfilm or digital) made for preservation rather than for access? The discussion did not produce a concrete answer but raised two additional, related questions: how high are the stakes for the quality of a copy when the original is retained, and should such a copy be judged by the access service it provides?

Although there was no formal poll of the committee, the discussion of these two questions appeared to reach a certain consensus: if an original document is retained, and especially if it is not a "treasure," then (1) the stakes are lower regarding the quality of the copy and (2) the copy must be judged (at least in part) in terms of the access service it provides, e.g., legibility. Some committee members distinguished between copying a retained manuscript document (lower stakes) and reformatting a brittle book destined to be discarded (higher stakes).

In response, other committee members argued that the group need not be so shy of the adjective preservation in this context. The typical microfilm of a manuscript collection does not offer a perfect facsimile of the original documents but is nevertheless called a preservation copy.

The committee pointed out that no discussion of a preservation reformatting could omit due consideration of conservation. Especially if the original item is to be retained, it must not be damaged in any way by the reformatting process. Conservation treatment may precede or follow scanning but ought to be part of any reformatting plan. Some members said that it would be better to suffer an inferior image than to injure the original.

The committee also noted that if a digital reformatting project provides good access-quality images (for general access) and extremely faithful preservation-quality images (for scholarly use and as a source for future access-quality images), it reduces the need for continued physical handling of the original. Thus one handling (at conversion time) can obviate a hundred handlings by patrons in ensuing years.

Document look and feel. The committee discussed the degree to which an image need replicate the look and feel of an original manuscript (or other archival) artifact. Although most members agreed that Library of Congress treasures, e.g., drafts of the Gettysburg Address, warranted a kind of museum-quality facsimile, there was less consensus that such treatment was warranted for routine documents, especially the kind of twentieth-century typescripts that form the greatest portion of the Federal Theatre Project collection.

One manuscript curator pointed to decades of successful use of microfilm by researchers, stating that "most historians seek the information in the document and are not passionate about the look and feel of the paper." Since most Library of Congress manuscript collections are conserved, researchers who need to see such elements as paper watermarks or the direction of fold lines can arrange to examine the originals. Prints and Photographs Division staff reported that their division had been very satisfied with the use of access-quality electronic images; their preservation copy is typically a large-format negative or color transparency.

The committee also noted that the Library does occasionally microfilm and discard a manuscript collection, e.g., a current project to reformat a large accumulation of unpublished copyright-deposit playscripts dating from the first half of the twentieth century. No special effort is being made to increase the quality of the microfilm in this instance. Further, some members said, the existence of one million or more pages of routine typescript would weigh heavily against making an extraordinary effort to produce a museum-quality digital facsimile of each page.

Binary and tonal digital images. Playing the role of devil's advocate, the Picture Elements consultant pointed out that one might interpret the argument against the need to produce near-perfect facsimiles as an indication that binary digital images would be satisfactory for preservation. In a binary image, only one bit per pixel is retained, representing either black or white. Such images are frequently used in office-automation imaging and tend to resemble the familiar (and well accepted) appearance of document photocopies. Did the committee, the consultant asked, view a binary image as an idealized form of the original paper document? There is some justification for this view: the thresholding operation that creates a binary image from a grayscale image attempts to sort important foreground markings (turning them full black) from extraneous background information (turning it white).
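The thresholding operation described above can be sketched in a few lines. This is a minimal illustration of global thresholding with an arbitrary cutoff, not the project's actual process:

```python
def binarize(gray, threshold=128):
    """Convert an 8-bit grayscale image (a list of rows of 0-255 values)
    to a binary image: pixels darker than the threshold become black
    (1, foreground); everything else becomes white (0, background)."""
    return [[1 if pixel < threshold else 0 for pixel in row]
            for row in gray]

# Dark ink (value 30) survives the cut, but a faint pencil note
# (value 140) is turned white and lost -- the silent failure mode
# the committee worried about.
page = [[250, 30, 140],
        [245, 25, 250]]
print(binarize(page))   # [[0, 1, 0], [0, 1, 0]]
```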

High-quality tonal (grayscale or color) images, however, were favored in remarks made by several committee members and the consultants. Some committee members pointed out that it is not always clear what constitutes information in the case of a manuscript document. Penciled marginalia, stricken-out first drafts, and coffee stains are regular features of manuscript documents and, in some cases, are part of what invests them with historical value. Those features contain tonal information, or at least require that a copy image provide tonal distinctions in order to keep them perceptible in the image. In an elaboration of the argument in favor of tonal images, the consultant pointed out that it was not always the case that binary images could be more efficiently produced than high-quality tonal images.

Thresholding: the crucial aspect of image binarization. The consultant reported that, despite decades of development, the most difficult aspect of binary imaging remains thresholding: the choice of the level that determines whether a given "stroke" or mark on a document is rendered black or is instead rendered white and thus becomes indistinguishable from the background. Even the most advanced thresholding techniques, which use the edge information inherent in an image, still face a fundamental difficulty: in order to render the information as black and all other features of the document as white, a judgment must be made as to what constitutes information for a given document. This is a difficult but soluble problem when a clear set of objective characteristics can be defined, e.g., all the red ink is information. But when a subjective element is introduced--when different researchers may be looking for different information--a general thresholding solution cannot exist.
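As a concrete illustration of automatic threshold selection, here is a sketch of Otsu's method, a standard technique from the image-processing literature (not one named in the report), which picks a single global cutoff by maximizing the between-class variance of the two resulting pixel groups. Even a method like this only optimizes a statistical criterion; it cannot know which faint markings a particular researcher will consider information:

```python
def otsu_threshold(pixels):
    """Choose a global threshold for a flat list of 0-255 pixel values
    by Otsu's method: try every cut and keep the one that maximizes
    the between-class variance of the background/foreground split."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0       # running sum of values at or below the cut
    weight_bg = 0      # running count of pixels at or below the cut
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Dark ink, a little faint pencil, and light paper (values are
# hypothetical). Whether the pencil (180) lands with the ink or with
# the paper is decided by the histogram statistics, not by anyone's
# judgment of what counts as information.
page = [30] * 400 + [180] * 20 + [240] * 600
print(otsu_threshold(page))
```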

The consultant cited an example from work carried out by Picture Elements in the field of bank check imaging. Personal checks often contain colorful pictorial scenes across the face of the document. To aid the legibility of the information--the payee, the amount, the signature--one might wish for the scenic background to drop out, i.e., to be turned white. Yet if a processing system does this, the customer may state that this image is not of one of their checks, for it has few of the familiar features they use to make that judgment. Thus the pictorial feature may be seen at different times and to different users as either information or as noise.

If the scanning equipment operator must make judgments at scanning time about which features of a manuscript are information, production will proceed slowly and results will be uneven, varying with the individual operator, who would have to be more skilled and thus more highly paid. The approach would be more prone to mistakes, which would result in costly re-scanning.

Another onerous and costly labor burden ensues from the use of binary images: the need for something approaching 100 percent inspection after scanning. If a low-contrast but significant piece of information (such as a marginal note) is missing, the image may be useless for some purposes. This form of image "failure" is the more worrisome in that the user of such an image may have no warning that information is missing. For the same reason, to do an adequate job, the inspector would ideally compare each screen or printout image with the paper original, an ungainly and very expensive process.

Argument for tonal images. Regarding the efficient production of tonal images, Picture Elements argued that this image type preserves subtle shadings without requiring irreversible, skilled judgments at scanning time. Thus production can proceed in a cost-effective manner and with more uniform quality. Although grayscale and color imaging may require more of the hardware and software that processes the larger images, the reduced reliance on the application of operator judgment can reduce the cost of producing high-quality images.

There was some discussion of file size as a consideration, especially given that tonal images will be much larger than bitonal images. The consultants argued that the predominant cost in a conversion project is the labor, which runs in some multiple of $0.10 per page. Storage costs currently run in the range of multiples of $0.01 or $0.001 per page. Even if a compressed grayscale or color image were to be four times the size of a compressed binary image, its storage cost would be dwarfed by the cost of labor to scan and inspect it. The consultant also pointed out that, for the last twenty years, storage costs have been halved approximately every three years, and there is every reason to expect this pattern to continue.
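The consultants' comparison amounts to simple arithmetic, sketched here with the order-of-magnitude figures cited in the discussion (the exact dollar values are illustrative):

```python
# Back-of-the-envelope comparison of labor vs storage cost per page,
# using the order-of-magnitude figures cited in the discussion.
labor_per_page   = 0.10    # dollars: scanning and inspection labor
storage_per_page = 0.01    # dollars: tonal-image storage

ratio = labor_per_page / storage_per_page
print(f"labor is {ratio:.0f}x the storage cost")   # labor is 10x the storage cost

# The trend the consultant cites: storage cost halves every three years.
years = 12
storage_future = storage_per_page * 0.5 ** (years / 3)
print(f"in {years} years: ${storage_future:.4f} per page")   # $0.0006 per page
```

Even at the high end of the cited storage range, labor dominates by an order of magnitude, and the gap widens as storage prices fall.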

Some members of the committee, however, counter-argued that this analysis did not take into account the complexity and high cost of managing a server-based storage system through, say, two or three cycles of obsolescence, data migration, and backup. A fully realized storage system, they said, would be very costly.

The consultant noted that the production of tonal images would support a transition to future digital formats and standards. The arrival of new image types may warrant the migration of images captured today into new formats that are enhanced, compressed, or stored in different ways. If new images are to be produced from existing digital files (rather than by rescanning the originals), this will be more successfully carried out from a high-quality image than from a limited-tone or low-resolution image. Rescanning the originals is a very costly alternative that places documents at additional risk and may not always be an option.

Scanning microfilm. The consultant reviewed the argument advanced by some librarians that capture on microfilm will also permit scanning to meet future digital developments. In this model, the film represents the preservation copy and derived digital images offer online access. This approach carries an increased cost, the consultant said, due to the need to scan the film and--if the master negative is to be protected--the need to produce a copy of the master for scanning. These intervening analog generations will result in a digital image inferior to one produced by scanning the paper original.

Perfect reproduction not a sine qua non. This general discussion of preservation and preservation-quality images yielded some helpful principles for the deliberations that followed.

Topics not covered. The committee noted that at least two aspects of digital production were not included in this demonstration project. The curator of the Federal Theatre Project collection pointed out that the project's focus on separate-sheet items excluded the many bound items in the collection. Others on the committee expressed regret at the exclusion of bound materials, stating that the handling of such items presented special problems for custodians. The project planners responded that the handling of bound materials represented a complex problem of its own and this topic had been excluded in an effort to make the current project more manageable.

Additional discussion noted another exclusion: printed halftones. Such pictorial elements--typically the reproductions of photographs in books and periodicals--are ubiquitous in printed matter but less frequently encountered in manuscript collections. Printed halftones present a thorny set of scanning complications significant enough to require a separate project. In contrast, hand-drawn sketches are more common in manuscript holdings, but since they typically involve the same marking devices (pens, pencils, charcoal, and the like) used for writing, these pictorial elements need not be separately discussed or studied as imaged elements.
