Appendix C
Image Formats Survey Questionnaire


Library of Congress
Preservation Office

Digital Imaging for Manuscript Preservation
A Survey of the Field

14 December 1994

Editor's note (1998): The completed survey responses are available in the offices of the National Digital Library Program.

NOTE: Please return completed questionnaires directly to the contact person listed at the end of this message, not as a reply to the mailing list from which you received it. Replies are requested by 31 January 1995. Please note the offer of useful utilities (carrots!) down in the Instructions section.


The Library of Congress wishes to learn about existing practice among archives, libraries and the wider commercial marketplace for digital imaging of documents, especially the types of documents found in manuscript collections. This information, together with technical studies undertaken by the Library, will be used to develop approaches for future digital conversion efforts.

General Background

As the Library of Congress continues to develop its capabilities for providing computerized access to its collections, it must address a wide array of issues and identify and test a broad range of tools and techniques, especially those that will ensure that digital imaging can be used successfully within the institution's preservation programs.

In order to address some of those preservation issues in the particular case of manuscript and document collections, the Library has engaged Picture Elements, Inc., to carry out one or more surveys in order to determine and/or identify:

This survey is a part of that effort. A later demonstration conversion project is also planned.

Detailed Background

The focus of this demonstration project will be on documents consisting of unbound, separate handwritten or typed sheets of 8.5 inch by 14 inch or smaller paper -- what might be considered to be typical manuscript documents.

A key issue for the Library is finding the most judicious balance between conserving precious original documents--protecting them from damage--and achieving a reasonably rapid rate of conversion. The outcomes of this project are expected to assist the Library in designing models for further conversion applications for the Library's collections.

The Library foresees the need for at least two types of images that reproduce typical manuscript-collection documents. One image, proposed for consideration as a potential digital preservation-quality image, will have high quality and offer a faithful copy of the original.

The Library also seeks to create smaller-sized images in addition to the preservation-quality image. These will be used in end-user retrieval systems, especially those accessed via computer networks, including the Internet. Smaller-sized or access-quality images can be more easily handled in such systems. The Library would like to identify a practical level of quality that, although less faithful than the preservation-quality image, offers high legibility and good service to researchers.


Please complete this questionnaire if your organization has or is planning a project involving preservation of manuscript or other primarily textual documents using digital imaging.

It may be that you have no such project, but have opinions or policies on this topic. Or you may find a questionnaire format confining, or lack the time to address the entire set of questions. In these cases, please feel free to provide any information in whatever form and content you feel appropriate, using the questionnaire as a guide to our issues of interest. Comments may be inserted in-line into the questionnaire or attached. When presented with a list, multiple answers will often be appropriate.

Please complete the General Questions section below. Then proceed to special questions for Archivists and special questions for Technologists. You need not answer every question; feel free to offer some replies in both sections.

In exchange for your returned questionnaire, we would like to offer you two useful public domain utilities for checking the format of image files. TIFFLOOK dumps TIFF files and JPEGINFO dumps JPEG Interchange Format (JPG) or JPEG File Interchange Format (JFIF) files. Please indicate your desire for further information on these utilities on your returned questionnaire or obtain them from the Picture Elements web site at:
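
For readers unfamiliar with what a TIFF dump reports, the following Python sketch lists the tag entries in the first image file directory (IFD) of a TIFF file. It is an illustration only and is not TIFFLOOK itself; the function name dump_tiff_tags and the SAMPLE bytes are hypothetical, and only SHORT and LONG values stored inline are decoded.

```python
import struct

def dump_tiff_tags(data: bytes):
    """Return (tag, field_type, count, value) for each entry in the first IFD."""
    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file: bad byte-order mark")
    # Bytes 2-3 hold the number 42; bytes 4-7 hold the offset of the first IFD.
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    entries = []
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i          # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack(order + "HHI", data[start:start + 8])
        raw = data[start + 8:start + 12]
        if ftype == 3 and count == 1:
            # A single SHORT is stored inline, left-justified in the field.
            value = struct.unpack(order + "H", raw[:2])[0]
        else:
            # Otherwise report the raw 32-bit field (a LONG value or an offset).
            value = struct.unpack(order + "I", raw)[0]
        entries.append((tag, ftype, count, value))
    return entries

# Hypothetical two-entry file: ImageWidth (tag 256) and Compression (tag 259).
SAMPLE = (
    b"II" + struct.pack("<HI", 42, 8)            # header, first IFD at offset 8
    + struct.pack("<H", 2)                       # two IFD entries follow
    + struct.pack("<HHIHH", 256, 3, 1, 2550, 0)  # ImageWidth = 2550
    + struct.pack("<HHIHH", 259, 3, 1, 4, 0)     # Compression = 4
    + struct.pack("<I", 0)                       # no further IFDs
)

for entry in dump_tiff_tags(SAMPLE):
    print(entry)
```

In this sample, field type 3 is SHORT, and a Compression value of 4 denotes CCITT T.6/Group 4 encoding. A listing of this kind is the sort of text dump requested in question 21.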

General Questions

1. Contact Information

Name _____________________________________
Organization _____________________________________
Address _____________________________________
Email address _____________________________________
Phone _____________________________________

2. Name of Project or Department


3. Nature of Your Organization

archive ___
library ___
commercial company ___
government agency ___
other (please specify) _____________________________________

4. Materials Being Digitized

4a. physical form
loose pages ___
bound volumes ___
other (please specify) _____________________________________

4b. content types
typewritten ___
handwritten ___
engravings ___
lithographs ___
other (please specify) _____________________________________

Archivist or Curatorial Questions

5. Do you create digital images of

manuscript papers ___
printed matter ___
handwritten items ___
other types of documents ___
(please specify)? _____________________________________

6. Do you consider these copies to be for

access ___
preservation surrogate ___
document delivery ___
republication ___
transcriptions ___
optical character recognition ___
a mix of the above ___
other (please specify)? _____________________________________

7. What is the total number of images scanned in your project to date?

8. If you create digital images with preservation as a goal, do you discard or retain the original paper item?

discard ___
retain ___
other (please specify)? _____________________________________

9. Regarding microfilming of the items being digitized, do you

microfilm in parallel ___
scan from microfilm ___
output digital image to an
electron beam film recorder ___
other (please specify)? _____________________________________

10. What are your approaches to retrieval?

catalog ___
non-bibliographic database ___
directory ___
register ___
SGML-tagged register ___
searchable full texts ___
other (please specify) _____________________________________

11. Do you use the image file header content for searching and retrieval?

12. Do you use any special approaches to protect or authenticate images?

encryption ___
authentication ___
watermarks ___
hidden watermarks ___
other (please specify) _____________________________________

13. Regarding the rapid and efficient capture of images:

Do you use a sheet-feed or other device? ___

Do you use a book-edge or other special scanner? ___

What is the approximate number of images that your
conversion facility can capture per hour or day? ______

How many staff and scanners provide this total throughput? ______

14. Have any of your documents been damaged during the capture process? Please give details, if possible.

15. Do you capture any items while they are sleeved in Mylar?

16. Have you been forced into workarounds by special problems, for example:

thin paper bleedthrough requiring special image processing,
image quality problems forcing transcription?

Technical Imaging Questions

17. Do you create more than one type of digital image, e.g., a preservation image and an access image? Why? How do they differ in terms of the three sections below (image characteristics, compression techniques, file formats)?

Image characteristics used

18. Please provide technical information on the image types you create, including:

spatial resolution as delivered (dots per inch or millimeter)

actual optical resolution (dots per inch or millimeter)

tonal-depth resolution (number of shades or colors or bits per pixel)

In this regard, are you aware of whether the scanning subsystem converts from the actual optical resolution to the delivered resolution? Do you know what technique is used for this process (for example pixel replication/deletion or linear interpolation)?
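
To make the distinction concrete, the two conversion techniques named above can be sketched for a single scan line of gray values. This is a minimal Python illustration under simplifying assumptions (one dimension, no filtering); the function names are hypothetical and do not describe any particular scanner.

```python
def resample_replicate(row, new_len):
    """Pixel replication/deletion: each output pixel copies the nearest earlier source pixel."""
    old_len = len(row)
    return [row[min(old_len - 1, i * old_len // new_len)] for i in range(new_len)]

def resample_linear(row, new_len):
    """Linear interpolation: each output pixel blends its two neighboring source pixels."""
    old_len = len(row)
    if new_len == 1 or old_len == 1:
        return [row[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # position in source coordinates
        left = int(pos)
        right = min(left + 1, old_len - 1)
        frac = pos - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out

edge = [0, 100]                      # a sharp dark-to-light edge
print(resample_replicate(edge, 4))   # pixels are duplicated; the edge stays sharp
print(resample_linear(edge, 3))      # an intermediate gray value appears
```

Replication preserves the original pixel values but can produce blocky or uneven strokes, while interpolation introduces new intermediate values. This is why the delivered resolution alone does not describe the detail actually captured.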

Compression techniques used

19. Please indicate the compression techniques used.

CCITT T.6/Group 4 ___
CCITT T.4/Group 3 ___
JPEG ___
JBIG ___
LZW ___
other (please specify) _____________________________________

File formats used

20. Please indicate the image file formats used.

TIFF version 6.0 ___
TIFF version 5.0 ___
ODA or ANSI/AIIM MS-53 ___
JPEG Interchange Format ___
JFIF ___
other (please specify) _____________________________________

File header or trailer information fields

21. For the named file formats, what header fields or tags are used? Please provide a list, giving the tag or element number. Can you provide a text dump of one of your files? Do your files conform to any identifiable subsets of file formats (e.g., TIFF Class B or RFC 1314)?

22. Do you place identifying information in the header, e.g., a code number for the image, the name of your organization, or a title or subject term?

Scanners used

23. What primary scanner was used for capture (manufacturer and model)? Was it modified or customized in any fashion?

24. Was more than one scanner type used? Are any documents routed to a specialized scanner having different capture characteristics?

Special image processing used

25. What image processing or image enhancement approaches have you found helpful?

26. Do you apply de-skewing or border cropping techniques to your images?

27. Does your approach result in bitonal (one-bit-per-pixel) images?

28. Does your system employ special forms of thresholding, density control, or contrast and brightness management?

Database issues

29. How do you link images to the retrieval tool? Do you link directly by pathname/filename, use a look-up table, use an identifier in the header, or some other approach?

30. What file or directory naming conventions do you use? Are these techniques used to link images to other records?

31. What indexing method is used to link documents and image files to bibliographic records or other search tools?

32. Do you do more or less indexing work for materials being scanned as compared to traditional materials?

Access issues

33. What are the intended uses of your images? Are they for preservation only, for screen access over local area networks, for wide area network access, or for local printing?


34. Does your institution have an approach for the preservation of the digital data represented by the images? Please provide a brief statement.

35. Is there a policy on migration of data to newer media as time progresses?

36. Is there a policy on the monitoring of error correction rates or for random sampling of seldom used collections?

37. What media are used?

38. What is the average image size? If both preservation and access images are stored, please indicate the average size for each type.
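
As a point of comparison when reporting sizes, the uncompressed size of an image follows directly from page dimensions, resolution, and tonal depth. The Python sketch below works this out for the largest sheet size named in the Detailed Background (8.5 by 14 inches). The 300 dots per inch figure is an assumed example value and not a Library requirement, and the function name is hypothetical.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Size in bytes of an uncompressed raster, ignoring headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed example: an 8.5 x 14 inch sheet scanned at 300 dots per inch.
bitonal = uncompressed_bytes(8.5, 14, 300, 1)    # one bit per pixel
gray = uncompressed_bytes(8.5, 14, 300, 8)       # 256 shades of gray
print(bitonal, gray)
```

Under these assumptions the sheet is 2550 by 4200 pixels, or 1,338,750 bytes bitonal and 10,710,000 bytes in 8-bit gray before compression. Stored files will normally be smaller, depending on the compression technique reported in question 19.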


39. To what extent are standards issues key to your approach to digital imaging? Do you believe de facto or de jure standards should be used?

Quality assurance

40. What level of quality assurance is used?

41. Is visual inspection used? On what percentage of scans?

42. Is automatic quality assurance used?

Document preparation

43. How do you prepare documents for scanning?

44. Do you separate different material types or keep them together in the workflow?

45. Are any special steps taken in the physical preparation for scanning, such as

disbinding ___
guillotining ___
fastener removal ___
other (please specify)? _____________________________________

46. How much time does each of these steps consume?


47. What breakthroughs in imaging technology would help you most?

good microfilm scanner ___
high-end book scanner ___
face-up book cradle ___
page turning device ___
high-speed input subsystem ___
automatic quality assurance ___
preservation file format ___
other (please specify) _____________________________________

