JPEG
(Sections of this taken from JPEG FAQ and written by Tom Lane .)JPEG (pronounced "jay-peg") is a standardized image compression mechanism. JPEG stands for Joint Photographic Experts Group, the original name of the committee that wrote the standard. JPEG is designed for compressing either full-color or gray-scale digital images of "natural", real-world scenes. It does not work very well on non-realistic images, such as cartoons or line drawings. JPEG does not handle black-and-white (1-bit-per-pixel) images, nor does it handle motion picture compression. Related standards for compressing those types of images exist, and are called JBIG and MPEG respectively. Regular JPEG is "lossy", meaning that the image you get out of decompression isn't quite identical to what you originally put in. The algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small color details aren't perceived as well as small details of light-and-dark. Thus, JPEG is intended for compressing images that will be looked at by humans. If you plan to machine-analyze your images, the small errors introduced by JPEG may be a problem for you, even if they are invisible to the eye. The JPEG standard includes a separate lossless mode, but it is rarely used and does not give nearly as much compression as the lossy mode. There are a lot of parameters to the JPEG compression process. By adjusting the parameters, you can trade off compressed image size against reconstructed image quality over a *very* wide range. You can get image quality ranging from op-art (at 100x smaller than the original 24-bit image) to quite indistinguishable from the source (at about 3x smaller). Usually the threshold of visible difference from the source image is somewhere around 10x to 20x smaller than the original, ie, 1 to 2 bits per pixel for color images. Grayscale images do not compress as much. In fact, for comparable visual quality, a grayscale image needs perhaps 25% less space than a color image; certainly not the 66% less that you might naively expect. JPEG defines a "baseline" lossy algorithm, plus optional extensions for progressive and hierarchical coding. There is also a separate lossless compression mode; this typically gives about 2:1 compression, ie, about 12 bits per color pixel. Most currently available JPEG hardware and software handles only the baseline mode. Here's the outline of the baseline compression algorithm: 1. Transform the image into a suitable color space. This is a no-op for grayscale, but for color images you generally want to transform RGB into a luminance/chrominance color space (YCbCr, YUV, etc). The luminance component is grayscale and the other two axes are color information. The reason for doing this is that you can afford to lose a lot more information in the chrominance components than you can in the luminance component: the human eye is not as sensitive to high-frequency chroma info as it is to high-frequency luminance. (See any TV system for precedents.) You don't have to change the color space if you don't want to, since the remainder of the algorithm works on each color component independently, and doesn't care just what the data is. However, compression will be less since you will have to code all the components at luminance quality. Note that colorspace transformation is slightly lossy due to roundoff error, but the amount of error is much smaller than what we typically introduce later on. 2. (Optional) Downsample each component by averaging together groups of pixels. The luminance component is left at full resolution, while the chroma components are often reduced 2:1 horizontally and either 2:1 or 1:1 (no change) vertically. In JPEG-speak these alternatives are usually called 2h2v and 2h1v sampling, but you may also see the terms "411" and "422" sampling. This step immediately reduces the data volume by one-half or one-third. In numerical terms it is highly lossy, but for most images it has almost no impact on perceived quality, because of the eye's poorer resolution for chroma info. Note that downsampling is not applicable to grayscale data; this is one reason color images are more compressible than grayscale. 3. Group the pixel values for each component into 8x8 blocks. Transform each 8x8 block through a discrete cosine transform (DCT). The DCT is a relative of the Fourier transform and likewise gives a frequency map, with 8x8 components. Thus you now have numbers representing the average value in each block and successively higher-frequency changes within the block. The motivation for doing this is that you can now throw away high-frequency information without affecting low-frequency information. (The DCT transform itself is reversible except for roundoff error.) See question 25 for fast DCT algorithms. 4. In each block, divide each of the 64 frequency components by a separate "quantization coefficient", and round the results to integers. This is the fundamental information-losing step. The larger the quantization coefficients, the more data is discarded. Note that even the minimum possible quantization coefficient, 1, loses some info, because the exact DCT outputs are typically not integers. Higher frequencies are always quantized less accurately (given larger coefficients) than lower, since they are less visible to the eye. Also, the luminance data is typically quantized more accurately than the chroma data, by using separate 64-element quantization tables. Tuning the quantization tables for best results is something of a black art, and is an active research area. Most existing encoders use simple linear scaling of the example tables given in the JPEG standard, using a single user-specified "quality" setting to determine the scaling multiplier. This works fairly well for midrange qualities (not too far from the sample tables themselves) but is quite nonoptimal at very high or low quality settings. 5. Encode the reduced coefficients using either Huffman or arithmetic coding. (Strictly speaking, baseline JPEG only allows Huffman coding; arithmetic coding is an optional extension.) Notice that this step is lossless, so it doesn't affect image quality. The arithmetic coding option uses Q-coding; it is identical to the coder used in JBIG (see question 74). Be aware that Q-coding is patented. Most existing implementations support only the Huffman mode, so as to avoid license fees. The arithmetic mode offers maybe 5 or 10% better compression, which isn't enough to justify paying fees. 6. Tack on appropriate headers, etc, and output the result. In a normal "interchange" JPEG file, all of the compression parameters are included in the headers so that the decompressor can reverse the process. These parameters include the quantization tables and the Huffman coding tables. For specialized applications, the spec permits those tables to be omitted from the file; this saves several hundred bytes of overhead, but it means that the decompressor must know a-priori what tables the compressor used. Omitting the tables is safe only in closed systems. The decompression algorithm reverses this process. The decompressor multiplies the reduced coefficients by the quantization table entries to produce approximate DCT coefficients. Since these are only approximate, the reconstructed pixel values are also approximate, but if the design has done what it's supposed to do, the errors won't be highly visible. A high-quality decompressor will typically add some smoothing steps to reduce pixel-to-pixel discontinuities. The JPEG standard does not specify the exact behavior of compressors and decompressors, so there's some room for creative implementation. In particular, implementations can trade off speed against image quality by choosing more accurate or faster-but-less-accurate approximations to the DCT. Similar tradeoffs exist for the downsampling/upsampling and colorspace conversion steps. (The spec does include some minimum accuracy requirements for the DCT step, but these are widely ignored, and are not too meaningful anyway in the absence of accuracy requirements for the other lossy steps.) Extensions: The progressive mode is intended to support real-time transmission of images. It allows the DCT coefficients to be sent piecemeal in multiple "scans" of the image. With each scan, the decoder can produce a higher-quality rendition of the image. Thus a low-quality preview can be sent very quickly, then refined as time allows. The total space needed is roughly the same as for a baseline JPEG image of the same final quality. (In fact, it can be somewhat *less* if a custom Huffman table is used for each scan, because the Huffman codes can be optimized over a smaller, more uniform population of data than appears in a baseline image's single scan.) The decoder must do essentially a full JPEG decode cycle for each scan: inverse DCT, upsample, and color conversion must all be done again, not to mention any color quantization for 8-bit displays. So this scheme is useful only with fast decoders or slow transmission lines. Up until 1995, progressive JPEG was a rare bird, but its use is now spreading as software decoders have become fast enough to make it useful with modem-speed data transmission. The hierarchical mode represents an image at multiple resolutions. For example, one could provide 512x512, 1024x1024, and 2048x2048 versions of the image. The higher-resolution images are coded as differences from the next smaller image, and thus require many fewer bits than they would if stored independently. (However, the total number of bits will be greater than that needed to store just the highest-resolution frame in baseline form.) The individual frames in a hierarchical sequence can be coded progressively if desired. Hierarchical mode is not widely supported at present. Part 3 of the JPEG standard, approved at the end of 1995, introduces several new extensions. The one most likely to become popular is variable quantization, which allows the quantization table to be scaled to different levels in different parts of the image. In this way the "more critical" parts of the image can be coded at higher quality than the "less critical" parts. A signaling code can be inserted at any DCT block boundary to set a new scaling factor. Another Part 3 extension is selective refinement. This feature permits a scan in a progressive sequence, or a refinement frame of a hierarchical sequence, to cover only part of the total image area. This is an alternative way of solving the variable-quality problem. My (tgl's) guess is that this will not get widely implemented, with variable quantization proving a more popular approach, but I've been wrong before. The third major extension added by Part 3 is a "tiling" concept that allows an image to be built up as a composite of JPEG frames, which may have different sizes, resolutions, quality settings, even colorspaces. (For example, a color image that occupies a small part of a mostly-grayscale page could be represented as a separate frame, without having to store the whole page in color.) Again, there's some overlap in functionality with variable quantization and selective refinement. The general case of arbitrary tiles is rather complex and is unlikely to be widely implemented. In the simplest case all the tiles are the same size and use similar quality settings. This case may become popular even if the general tiling mechanism doesn't, because it surmounts the 64K-pixel-on-a-side image size limitation that was (not very foresightedly) built into the basic JPEG standard. The individual frames are still restricted to 64K for compatibility reasons, but the total size of a tiled JPEG image can be up to 2^32 pixels on a side. Lossless JPEG: The separate lossless mode does not use DCT, since roundoff errors prevent a DCT calculation from being lossless. For the same reason, one would not normally use colorspace conversion or downsampling, although these are permitted by the standard. The lossless mode simply codes the difference between each pixel and the "predicted" value for the pixel. The predicted value is a simple function of the already-transmitted pixels just above and to the left of the current one (for example, their average; 8 different predictor functions are permitted). The sequence of differences is encoded using the same back end (Huffman or arithmetic) used in the lossy mode. Lossless JPEG with the Huffman back end is certainly not a state-of-the-art lossless compression method, and wasn't even when it was introduced. The arithmetic-coding back end may make it competitive, but you're probably best off looking at other methods if you need only lossless compression. The main reason for providing a lossless option is that it makes a good adjunct to the hierarchical mode: the final scan in a hierarchical sequence can be a lossless coding of the remaining differences, to achieve overall losslessness. This isn't quite as useful as it may at first appear, because exact losslessness is not guaranteed unless the encoder and decoder have identical IDCT implementations (ie, identical roundoff errors). And you can't use downsampling or colorspace conversion either if you want true losslessness. But in some applications the combination is useful. References: For a good technical introduction to JPEG, see: Wallace, Gregory K. "The JPEG Still Picture Compression Standard", Communications of the ACM, April 1991 (vol. 34 no. 4), pp. 30-44. (Adjacent articles in that issue discuss MPEG motion picture compression, applications of JPEG, and related topics.) If you don't have the CACM issue handy, a PostScript file containing a revised version of this article is available at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz. This file (actually a preprint for a later article in IEEE Trans. Consum. Elect.) omits the sample images that appeared in CACM, but it includes corrections and some added material. Note: the Wallace article is copyright ACM and IEEE, and it may not be used for commercial purposes. An alternative, more leisurely explanation of JPEG can be found in "The Data Compression Book" by Mark Nelson ([Nel 1991], see question 7). This book provides excellent introductions to many data compression methods including JPEG, plus sample source code in C. The JPEG-related source code is far from industrial-strength, but it's a pretty good learning tool. An excellent textbook about JPEG is "JPEG Still Image Data Compression Standard" by William B. Pennebaker and Joan L. Mitchell. Published by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. 650 pages, price US$59.95. (VNR will accept credit card orders at 800/842-3636, or get your local bookstore to order it.) This book includes the complete text of the ISO JPEG standards, DIS 10918-1 and draft DIS 10918-2. Review by Tom Lane: "This is by far the most complete exposition of JPEG in existence. It's written by two people who know what they are talking about: both served on the ISO JPEG standards committee. If you want to know how JPEG works or why it works that way, this is the book to have." The official specification of JPEG is not currently available on-line, and is not likely ever to be available for free because of ISO and ITU copyright restrictions. You can order it from your national standards agency as ISO standards IS 10918-1, 10918-2, 10918-3, or as ITU-T standards T.81, T.83, T.84. See ftp://ftp.uu.net/graphics/jpeg/jpeg.documents.gz for more info. NOTE: buying the Pennebaker and Mitchell textbook is a much better deal than purchasing the standard directly: it's cheaper and includes a lot of useful explanatory material along with the full draft text of the spec. The book unfortunately doesn't include Part 3 of the spec, but if you need Part 3, buy the book and just that part and you'll still be ahead.