8. Compression

Advantage of compressing images. One of the most interesting discussions pertained to image compression. This topic, of course, takes on great importance once a decision to create tonal images has been reached. An 8.5x11-inch document captured at 300 dpi and 8 bits per pixel creates a file of over 8 million bytes before it is compressed. An uncompressed color image of that document comprises about 25 million bytes.5
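The arithmetic behind these figures can be sketched in a few lines of Python (an illustrative calculation only, using the page size, resolution, and bit depths quoted above):

```python
# Uncompressed file size for one scanned 8.5 x 11-inch page at 300 dpi.
width_px = int(8.5 * 300)    # 2550 pixels across
height_px = int(11 * 300)    # 3300 pixels down
pixels = width_px * height_px

grayscale_bytes = pixels * 1   # 8 bits (1 byte) per pixel
color_bytes = pixels * 3       # 24-bit color, 3 bytes per pixel

print(grayscale_bytes)   # 8,415,000 -- "over 8 million bytes"
print(color_bytes)       # 25,245,000 -- "about 25 million bytes"
```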

Lossy compression and JPEG. The most widely used compression algorithms for tonal images are lossy. This means that in addition to removing redundancy from the image in a reversible way, simplifications are introduced into the image's representation that exploit the weaknesses of the human visual system. The goal is to introduce losses that are visually imperceptible under standard viewing conditions.

The JPEG (Joint Photographic Experts Group) standard is a joint international standard of the ISO and ITU-T (formerly CCITT) standards organizations. JPEG compression includes a wide variety of techniques, but of particular interest is the widely implemented baseline DCT (discrete cosine transform) algorithm, which permits a wide range of trade-offs between image quality and compressed file size.

JPEG quality setting. The amount of JPEG compression is variable and can be set by the user at a desired level. Most compression software (or software-hardware packages) asks users to set the "quality" to a certain numerical value; the amount of compression actually delivered will vary from image to image, depending upon the image's characteristics. In addition, the chrominance components of a color image (which are also subsampled at half the spatial resolution during the JPEG compression process) are compressed even more strongly than the grayscale component. Thus, at the same quality setting, a compressed 24-bit file will be reduced in size by a greater degree than a comparable 8-bit grayscale file.

Depending on their intended use, images compressed with the JPEG algorithm by factors of as much as 25 to 1 or 30 to 1 can still be very useful, although artifacts created by the process may be visible. These include blockiness in the image, especially visible in "flat" areas of even tonality, and "echoing" or "ringing"--a visible shadow that echoes the sharp edge between dark and light areas, e.g., on a typed or written mark. When compression ratios are lowered to the order of 10:1, the introduction of artifacts is minimal.

The consultants recommended using a quality setting that provides, on average, 10:1 to 20:1 compression for grayscale images and higher ratios for color. Compression of 10:1 was produced by "quality level" 20 in the system used for the preliminary samples; other systems may require different numerical settings. This setting, the consultants said, would reduce both 8-megabyte grayscale images and 25-megabyte color images to 1 megabyte or less.
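The consultants' arithmetic can be checked directly (a sketch using the uncompressed sizes implied by the text, about 8.4 million bytes for grayscale and 25.2 million for color; the 25:1 color ratio is an assumption consistent with "higher for color"):

```python
# Approximate compressed sizes at the recommended compression ratios.
grayscale_bytes = 8_415_000    # uncompressed 8.5 x 11 page, 8-bit gray
color_bytes = 25_245_000       # same page, 24-bit color

def compressed_size(uncompressed, ratio):
    """Approximate file size after compression at the given ratio."""
    return uncompressed / ratio

print(compressed_size(grayscale_bytes, 10))  # 841,500 bytes at 10:1
print(compressed_size(color_bytes, 25))      # ~1,009,800 bytes at 25:1
# Both land at roughly 1 megabyte or less, as the consultants stated.
```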

Lossy and lossless compression. Some committee members expressed reservations about the use of lossy compression algorithms. As had been the case when considering image enhancement, lossy compression suggested distortion or degradation of the image. If an archivist were considering an image for preservation, ought the archived form not perfectly represent the captured bitstream, i.e., be stored uncompressed or with lossless compression? Other members and the consultants referred back to the general principle that perfect facsimiles need not be a sine qua non for routine manuscript documents, especially when reckoned against the storage savings afforded by modest levels of compression. The committee's consideration of this trade-off recognized the need of archives and libraries to produce very large numbers of images of documents that have only moderate artifactual value. For the Federal Theatre Project's hundreds of thousands of typescript pages, it was not necessary to have museum-quality reproduction.

The discussion then turned to lossless compression. Some committee members pointed out that algorithms for binary images were lossless and asked whether this might not be a reason to reconsider the provisional decision to name tonal images as the preservation-quality choice. Others asked about lossless algorithms for multitonal images, e.g., lossless JPEG or LZW (Lempel-Ziv-Welch).

The consultants pointed out that scanners that produce binary images capture grayscale information and then apply thresholding that discards seven-eighths of the information. By definition, this is a very lossy process, and the "lossless" compression algorithm is applied only after this lossy thresholding has occurred. JPEG compression introduces loss at an earlier stage in the process and succeeds in preserving most of the tonal content of the original. The consultants argued that introduction of loss, whether through thresholding in a binary image or through the compression algorithm applied to a grayscale or color image, was appropriate. Much of the wealth of data in an image is either redundant (hence derivable for a given pixel from its neighboring pixels), imperceptible to people (because the finest of details are not seen well by the human eye), or simply noise. Many scanners produce data wherein noise dominates the bottom bit or two of the data--oddly enough, the eye often finds this pleasing--but it would be difficult to argue that this noise is an inherent part of the image that cannot be lost.

The consultants reported that lossless JPEG--which is not widely implemented--and LZW have varying performance but often produce around a 2:1 compression ratio. These algorithms could be used, but one might ask whether the slight improvement in quality over lossy JPEG merits five to ten times the file size. For certain items, like pictorial works and top treasures, this might be warranted. For other items, like routine documents, it probably is not.
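The "five to ten times" figure follows directly from the ratios already discussed (a worked check, using the roughly 8.4-million-byte grayscale page from earlier in the section):

```python
# Lossless at ~2:1 versus lossy JPEG at the recommended 10:1 to 20:1.
grayscale_bytes = 8_415_000

lossless = grayscale_bytes / 2     # ~2:1 (lossless JPEG, LZW)
lossy_10 = grayscale_bytes / 10    # lossy JPEG at 10:1
lossy_20 = grayscale_bytes / 20    # lossy JPEG at 20:1

print(lossless / lossy_10)  # 5.0  -- lossless file is 5x larger
print(lossless / lossy_20)  # 10.0 -- lossless file is 10x larger
```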

It is worth noting that LZW is patented and requires a license. The family of compression algorithms known as "Zip" is another alternative, with somewhat better performance and no licensing difficulties.
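Lossless "Zip"-style compression can be demonstrated with Python's standard zlib module, which implements the Deflate algorithm used by the Zip family (the gradient data below is a synthetic stand-in for image data; the roughly 2:1 figure quoted above applies to real scans, not this toy example):

```python
import zlib

# Hypothetical scanline data: a repeating smooth gradient, loosely
# mimicking the redundancy found in a scanned document image.
data = bytes((x // 4) % 256 for x in range(100_000))

compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

# Lossless: decompression reproduces the input exactly.
assert restored == data
print(len(data), len(compressed))   # compressed is much smaller here
```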

It is also worth noting that the JPEG committee is now actively evaluating several possible replacement algorithms for the current lossless approach, owing to universal disappointment with its performance.

At the end of the discussion, the committee authorized Picture Elements to compress the test-bed images with the JPEG algorithm, applying a quality level that would produce an average compression of 10:1 for the grayscale examples.
