Filters in Image Processing: Image Compression

David Svoboda
email: svoboda@fi.muni.cz
Centre for Biomedical Image Analysis
Faculty of Informatics, Masaryk University, Brno, CZ

October 25, 2019

Outline
1 Introduction
2 Lossless Compression
   Variable-Length Coding (Huffman, arithmetic, . . . )
   LZW Coding
   Bit-plane Coding
   Lossless Predictive Coding
3 Lossy Compression
   Lossy Predictive Coding
   Transform Coding

Image Compression: Introduction

What is image compression?
   the reduction of the amount of data required to represent a digital image

Applications:
   TV conferencing
   remote sensing (satellite imagery)
   medical imaging
   facsimile transmission (fax)
   . . .

Image Compression: Definitions

Some theory:
⇒ Information – what we want to store or transmit
⇒ Data – the means by which information is conveyed
⇒ Data redundancy – two distinct sources use different amounts of data to convey the same information

There are three main types of data redundancy:
   coding redundancy
   interpixel redundancy
   psychovisual redundancy

Coding Redundancy: Average number of bits

Let h be the intensity histogram of an inspected image I. The probability of occurrence of gray level i is

   p(i) = h(i) / n,

where n is the total number of image pixels. If len(k) is the number of bits needed to represent the value k, then

   L_avg = Σ_{i=0}^{2^{bit depth} − 1} len(i) · p(i)

is the average number of bits required to represent each pixel of image I.

Notice: The total number of bits needed to code an M×N image is M·N·L_avg.

Coding Redundancy: An example

We have an 8-level image with a 3-bit binary code:

   i        p(i)   3-bit code   len(i)   new code   new len(i)
   l0 = 0   0.19   000          3        11         2
   l1 = 1   0.25   001          3        01         2
   l2 = 2   0.21   010          3        10         2
   l3 = 3   0.16   011          3        001        3
   l4 = 4   0.08   100          3        0001       4
   l5 = 5   0.06   101          3        00001      5
   l6 = 6   0.03   110          3        000001     6
   l7 = 7   0.02   111          3        000000     6

Using the new code yields a better (lower) average number of bits per pixel:

   L_avg = 2(0.19) + 2(0.25) + 2(0.21) + 3(0.16) + 4(0.08) + 5(0.06) + 6(0.03) + 6(0.02) = 2.7 bits
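To make the computation concrete, here is a minimal Python sketch (the probabilities and code words are taken from the table above) that evaluates L_avg for both codes:

p = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]
new_code = ["11", "01", "10", "001", "0001", "00001", "000001", "000000"]

l_fixed = sum(3 * pi for pi in p)                       # 3-bit fixed-length code
l_var = sum(len(c) * pi for c, pi in zip(new_code, p))  # variable-length code

print(l_fixed)          # 3.0 bits per pixel
print(l_var)            # 2.7 bits per pixel
print(l_fixed / l_var)  # compression ratio, roughly 1.11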
Coding Redundancy: The meaning

Coding redundancy is the phenomenon of using more code symbols (bits) than necessary.

The aim:
   the functions "bit length" and "appearance probability" should be inversely proportional
   assigning fewer bits to the more probable gray levels (and more bits to the less probable ones) leads to image compression

Interpixel Redundancy

   correlation of pixels within an image
   the value of a certain pixel in the image can be reasonably predicted from the values of a group of other pixels in the image
   for example, the gray levels of neighbouring pixels are roughly the same

An example: a chessboard of 5×5 tiles

   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1
   0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
   0 0 0 0 0 1 1 1 1 1 0 0 0 0 0

Interpixel Redundancy: Sample solution

Repetitious pixels may be grouped together into (run length, value) pairs, as sketched in the code below:

   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 → (5,1)(5,0)(5,1)
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 → (5,1)(5,0)(5,1)
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 → (5,1)(5,0)(5,1)
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 → (5,1)(5,0)(5,1)
   1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 → (5,1)(5,0)(5,1)
   0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 → (5,0)(5,1)(5,0)
   0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 → (5,0)(5,1)(5,0)
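The run-length grouping above takes only a few lines of code. A minimal Python encoder/decoder sketch (the function names are illustrative):

from itertools import groupby

def rle_encode(row):
    # group repetitious pixels into (run length, value) pairs
    return [(len(list(g)), v) for v, g in groupby(row)]

def rle_decode(pairs):
    # expand (run length, value) pairs back into the pixel row
    return [v for n, v in pairs for _ in range(n)]

row = [1] * 5 + [0] * 5 + [1] * 5
pairs = rle_encode(row)
print(pairs)                       # [(5, 1), (5, 0), (5, 1)]
assert rle_decode(pairs) == row    # lossless round trip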
Psychovisual Redundancy

⇒ The eye does not respond with equal sensitivity to all visual information.
⇒ Small intensity variations can be perceived in an area of constant intensity.
⇒ Certain information has less relative importance than other information → it is psychovisually redundant.

Some examples:
   We are more sensitive to differences between dark intensities than between bright ones.
   We are more sensitive to differences of intensity in green than in red or blue.

Psychovisual Redundancy: An example

[Figure: a 24-bit RGB colour image next to its 4-bit colour version]

Notice: Elimination of psychovisually redundant data results in a loss of (unimportant) information → quantization → lossy compression methods.

General Compression Model

   f(x,y) → source encoder → data → transmission → data → decoder → f'(x,y)

   f(x,y) . . . original input image
   f'(x,y) . . . input image after compression followed by decompression
   f'(x,y) ≠ f(x,y) in general
   encoder . . . image filter responsible for image compression
   decoder . . . image filter responsible for image decompression

Lossless Compression

Requests for error-free compression:
   archival documents
   medical imaging
   business documents
   digital radiography

Common error-free compression methods:
   variable-length coding (Huffman, arithmetic)
   LZW coding
   bit-plane coding
   lossless predictive coding
   run-length encoding (RLE)

Huffman coding: An example

   symbol:       A1    A2    A3    A4    A5    A6
   probability:  0.1   0.4   0.07  0.1   0.04  0.29

Resulting code words, read off the coding tree (0 – left, 1 – right):

   symbol:  A2   A6   A1     A4     A3     A5
   code:    1    01   0000   0001   0010   0011

This gives L_avg = 1(0.4) + 2(0.29) + 4(0.1 + 0.1 + 0.07 + 0.04) = 2.22 bits.

Huffman coding

Variable-length coding ≈ reduction of coding redundancy

Algorithm:
1 order the source symbols by their probabilities of appearance
2 reduce/merge the two least probable symbols into a new single symbol
3 repeat the reductions while possible
4 obtain the coding tree
5 mark the tree edges with a binary code (0 – left, 1 – right)
6 use the path code to encode the individual symbols in the leaves

Huffman coding

Huffman coding creates the optimal code for a set of source symbols, but it is difficult to construct the Huffman code tree for larger sets.

Alternative solutions:
   Truncated Huffman – Huffman coding is used only for the most probable source symbols; a prefix code followed by a suitable fixed-length code represents all less probable symbols.
   B2-code
   binary shift
   Huffman shift

Notice: The alternative solutions reduce the computational complexity while sacrificing some coding efficiency.

Arithmetic Coding: An example of the encoding process

Table of available symbols:

   symbol:       N     I     O     S     X     W
   probability:  0.35  0.2   0.2   0.15  0.05  0.05

Input sequence: 'ONION'

The encoder narrows the interval [0, 1) symbol by symbol:

   start   [0, 1)
   O  →    [0.55, 0.75)
   N  →    [0.55, 0.62)
   I  →    [0.5745, 0.5885)
   O  →    [0.5822, 0.585)
   N  →    [0.5822, 0.58318)

Output code word: 0.583 (any number inside the final interval)

Arithmetic Coding: An example of the decoding process

Input code word: 0.583

   0.583 ∈ [0.55, 0.75)      → O
   0.583 ∈ [0.55, 0.62)      → N
   0.583 ∈ [0.5745, 0.5885)  → I
   0.583 ∈ [0.5822, 0.585)   → O
   0.583 ∈ [0.5822, 0.58318) → N

Output sequence: 'ONION'

Arithmetic Coding

In arithmetic coding, a whole sequence of source symbols is assigned a single arithmetic code word.
   The sequence is usually kept very short; the data block is therefore split into several sequences.
   The code word itself defines an interval of real numbers between 0 and 1.
   As the number of symbols increases, the code interval gets smaller and the number of bits required to represent it becomes larger.
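A minimal sketch of the interval narrowing used in the 'ONION' example (the symbol table is the one above; floating-point suffices for a toy example, real coders use integer arithmetic):

probs = {"N": 0.35, "I": 0.20, "O": 0.20, "S": 0.15, "X": 0.05, "W": 0.05}

# cumulative sub-interval of each symbol inside [0, 1)
cum, acc = {}, 0.0
for s, p in probs.items():
    cum[s] = (acc, acc + p)
    acc += p

def arith_encode(seq):
    low, high = 0.0, 1.0
    for s in seq:
        span = high - low
        c_lo, c_hi = cum[s]
        low, high = low + span * c_lo, low + span * c_hi
    return low, high   # any number in [low, high) encodes seq

print(arith_encode("ONION"))   # ~(0.5822, 0.58318), so e.g. 0.583
# note: the decoder must know the sequence length or meet a terminator
# symbol, otherwise it cannot tell when to stop decoding one code word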
LZW Coding: An example

Given this sequence for encoding

   TOBEORNOTTOBEORTOBEORNOT#

and a simple dictionary containing only single characters:

   Dictionary (5-bit) entries:
   ---------------------------
   0: # = 00000
   1: A = 00001
   2: B = 00010
   3: C = 00011
   ...
   26: Z = 11010

The length of the sequence = 25 symbols × 5 b = 125 b

LZW Coding: An example of the encoding process

   Input:   Bit code (output):   New dictionary entry:
   ----------------------------------------------------
   T        20 = 10100
   O        15 = 01111           28: TO
   B         2 = 00010           29: OB
   E         5 = 00101           30: BE
   O        15 = 01111           31: EO   (full dictionary! → 6-bit codes)
   R        18 = 010010          32: OR
   N        14 = 001110          33: RN
   O        15 = 001111          34: NO
   T        20 = 010100          35: OT
   TO       28 = 011100          36: TT
   BE       30 = 011110          37: TOB
   OR       32 = 100000          38: BEO
   TOB      37 = 100101          39: ORT
   EO       31 = 011111          40: TOBE
   RN       33 = 100001          41: EOR
   OT       35 = 100011          42: RNO
   #         0 = 000000          43: OT#

Coded length = 5 × 5 b + 12 × 6 b = 97 b.

LZW Coding: An example of the decoding process

Before we start decoding, only the original dictionary (#, A, ..., Z) is available!

   Input (bit code):   Output:   New dictionary entry:
   ----------------------------------------------------
   20 ( 10100)         T
   15 ( 01111)         O         28: TO
    2 ( 00010)         B         29: OB
    5 ( 00101)         E         30: BE
   15 ( 01111)         O         31: EO   (full dictionary! → 6-bit codes)
   18 (010010)         R         32: OR
   14 (001110)         N         33: RN
   15 (001111)         O         34: NO
   20 (010100)         T         35: OT
   28 (011100)         TO        36: TT
   30 (011110)         BE        37: TOB
   32 (100000)         OR        38: BEO
   37 (100101)         TOB       39: ORT
   31 (011111)         EO        40: TOBE
   33 (100001)         RN        41: EOR
   35 (100011)         OT        42: RNO
    0 (000000)         #         43: OT#

Output: TOBEORNOTTOBEORTOBEORNOT#

LZW Coding

Covers the problem of coding redundancy as well as interpixel redundancy.

LZW = Lempel–Ziv–Welch:
   assigns fixed-length code words to variable-length sequences
   is a dictionary (codebook)-based coding
   its dictionary is built during the coding process
   does not require any a priori knowledge of the probabilities of occurrence of the source symbols
   is integrated in GIF, TIFF, PDF
   when the dictionary is full, we flush it and start with an empty one
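The encoding table above follows directly from a few lines of Python. A sketch (integer output codes; the 5-bit/6-bit packing is omitted, and with the 27-symbol base alphabet the new entries start at code 27, one lower than in the table above):

def lzw_encode(data, alphabet="#ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    dictionary = {ch: i for i, ch in enumerate(alphabet)}   # '#' = 0, 'A' = 1, ...
    w, out = "", []
    for ch in data:
        if w + ch in dictionary:
            w += ch                               # keep growing the current string
        else:
            out.append(dictionary[w])             # emit code of the longest match
            dictionary[w + ch] = len(dictionary)  # new entry: match + next symbol
            w = ch
    out.append(dictionary[w])
    return out

print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT#"))
# [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0]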
Bit-plane coding

Bit-plane decomposition: the gray level of an m-bit pixel in a grayscale image can be represented in the form of a base-2 polynomial:

   a_{m−1}·2^{m−1} + a_{m−2}·2^{m−2} + · · · + a_1·2^1 + a_0·2^0

Therefore, the image can simply be decomposed into m one-bit bit planes:
   zeroth-order bit plane . . . the a_0 bits of each pixel
   first-order bit plane . . . the a_1 bits of each pixel
   ...
   (m − 1)st-order bit plane . . . the a_{m−1} bits of each pixel

Question: Is this a good representation? Imagine the neighbouring gray levels 127 and 128 (binary 01111111 and 10000000): all m bit planes change at once.

Bit-plane coding

An alternative decomposition approach: the m-bit Gray code

   g_i = a_i ⊕ a_{i+1},   0 ≤ i ≤ m − 2
   g_{m−1} = a_{m−1}

where ⊕ denotes the exclusive-OR operation:
   small changes in gray level are less likely to affect all m bit planes
   the neighbours 127 and 128 in Gray code are 01000000 and 11000000, differing in a single bit plane

Bit-plane coding

Bit-plane properties:
   high-order bit planes . . . large uniform areas
   low-order bit planes . . . quite complex

Bit-plane compression:
   Constant Area Coding (CAC) . . . the binary image is split into tiles
   White Block Skipping (WBS) . . . only the black areas are stored
   Run-Length Coding (RLE) . . . codes the length and the value of each uniform run

Lossless Predictive Coding

Intended to eliminate the interpixel redundancy of closely spaced pixels.

An idea: the encoded value of a pixel is the difference between the actual and the predicted value of that pixel.

[Encoder scheme: plain input f → the prediction p(f), rounded to the nearest integer, is subtracted from f → the prediction error e goes to the symbol encoder → compressed output]

   e(n) = f(n) − p(f(n))

Lossless Predictive Coding

Some types of predictors:

   p(f(n))   = round[ Σ_{i=1}^{M} α(i) · f(n − i) ]
   p(f(m,n)) = round[ Σ_{i=1}^{M} α(i) · f(m, n − i) ]
   p(f(m,n)) = round[ α · f(m, n − 1) ]

Notice: The predictor is typically a linear combination of a few previous pixels.

Lossless Predictive Coding: Decompression

[Decoder scheme: compressed input → symbol decoder → the error e is added to the prediction p(f) → decompressed output f]

   f(n) = e(n) + p(f(n))

Lossy compression

Until now, f'(x,y) = f(x,y) held. From now on: f'(x,y) ≠ f(x,y).

Why lossy compression?
   the compression ratio is higher . . . 10:1 to 50:1

What is the main difference?
   a quantizer (see psychovisual redundancy) is present

Lossy Predictive Coding: Encoder & Decoder

[Encoder scheme: the prediction error e = f − p(f) is quantized to q(e) and passed to the symbol encoder; the predictor works on the reconstructed value q(f) = q(e) + p(f). Decoder scheme: the symbol decoder recovers q(e) and the output is f' = q(e) + p(f').]

Lossy Predictive Coding: An example

Delta modulation (DM):

   p(f(n)) = α · q(f(n − 1))
   q(e(n)) = +ξ   if e(n) > 0
             −ξ   otherwise

where:
   α . . . prediction coefficient (≤ 1)
   ξ . . . a positive constant

Lossy Predictive Coding: An example

Delta modulation (DM), α = 1, ξ = 6.5:

   n    f    p(f)   e      q(e)   q(f)
   0    14   –      –      –      14.0
   1    15   14.0    1.0    6.5   20.5
   2    14   20.5   −6.5   −6.5   14.0
   3    15   14.0    1.0    6.5   20.5
   ...
   14   29   20.5    8.5    6.5   27.0
   15   37   27.0   10.0    6.5   33.5
   ...
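The table rows can be reproduced with a few lines of Python (α = 1 and ξ = 6.5 as above):

alpha, xi = 1.0, 6.5

def delta_modulate(samples):
    qf = [float(samples[0])]            # n = 0 is transmitted as-is
    for f in samples[1:]:
        p = alpha * qf[-1]              # prediction from the last reconstruction
        e = f - p                       # prediction error
        qe = xi if e > 0 else -xi       # 1-bit quantizer
        qf.append(p + qe)               # reconstruction q(f) = p(f) + q(e)
    return qf

print(delta_modulate([14, 15, 14, 15]))   # [14.0, 20.5, 14.0, 20.5]

Even a nearly flat signal oscillates by ±ξ (granular noise), which is the price of the 1-bit quantizer.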
Transform Coding

All the previous compression techniques operate directly on the pixels of an image → they are spatial-domain methods.

Transform coding . . . compression is realized in another domain (e.g. the Fourier domain).

[Scheme: input image → construct n×n subimages → forward transform → quantizer → symbol encoder → compressed image; compressed image → symbol decoder → inverse transform → merge n×n subimages → decompressed image]

Transform Coding

Three main issues:
   subimage size
      dimensions are typically a power of 2, i.e. 8×8 or 16×16
   transform selection (Fourier, DCT, Walsh–Hadamard, KLT, wavelet, . . . )
   bit allocation
      select which part of the transformed domain is less important (redundant) and hence can be eliminated

Transform Coding: Subimage Size

The individual transforms are applied to subimages. The subimage size affects:
   the transform coding error
   the computational complexity

Transform Coding: Transform Selection

The selected transform should be:
   able to decorrelate the input signal (≈ energy compaction)
   easy to implement and fast (low computational complexity)
   orthogonal (reversible)

The most common transforms:
   optimal decorrelation . . . PCA transform
   approximations . . . discrete Fourier transform, DCT, Walsh–Hadamard

Transform Coding: Bit Allocation ≈ Quantization

An idea: remove/suppress the less important (redundant) part of the transform domain.

The process of removing the less important coefficients is called bit allocation:
   zonal coding . . . based on variance; the coefficients of maximum variance carry the most image information and should be retained
   threshold coding . . . based on magnitude; the coefficients which are large enough are retained – very simple to evaluate

Notice: In the Fourier domain, the less important coefficients correspond to the high frequencies.

Transform Coding: Bit Allocation – Example: Zonal coding

The transformed coefficients are masked (multiplied by 0/1) with a zonal mask. The coefficients of maximum variance are usually located around the low frequencies. An example of a zonal mask for an 8×8 subimage (the low-frequency corner is retained):

   1 1 1 1 1 0 0 0
   1 1 1 1 0 0 0 0
   1 1 1 0 0 0 0 0
   1 1 0 0 0 0 0 0
   1 0 0 0 0 0 0 0
   0 0 0 0 0 0 0 0
   0 0 0 0 0 0 0 0
   0 0 0 0 0 0 0 0

Transform Coding: Bit Allocation – Example: Threshold coding

The transformed coefficients are quantized by a point-wise division with a normalization array Z, so that the least important coefficients are suppressed (see the code sketch at the end of this handout):

   T'(k,l) = round( T(k,l) / Z(k,l) )

An example of a normalization array (the standard JPEG luminance quantization table):

   16  11  10  16  24  40  51  61
   12  12  14  19  26  58  60  55
   14  13  16  24  40  57  69  56
   14  17  22  29  51  87  80  62
   18  22  37  56  68 109 103  77
   24  35  55  64  81 104 113  92
   49  64  78  87 103 121 120 101
   72  92  95  98 112 100 103  99

Bibliography

   Salomon, D. Data Compression: The Complete Reference, 4th edition, Springer, London, 2007, ISBN 1-84628-602-5.
   Gonzalez, R. C., Woods, R. E. Digital Image Processing, 2nd edition, Prentice Hall, Upper Saddle River, 2002, 793 pages, ISBN 0-201-18075-8.
   Kuhn, M. Information Theory and Coding – Image, video and audio compression, lecture slides.
   Welch, T. A. A Technique for High-Performance Data Compression, Computer, vol. 17, pp. 8–19, June 1984.

You should know the answers . . .

   Explain the difference among coding redundancy, interpixel redundancy, and psychovisual redundancy.
   What does a quantizer do? Which type of data cannot be compressed using lossy compression methods?
   Show the construction of a Huffman coding tree.
   How does the arithmetic decoder know when it should stop decoding one code word?
   How does the encoder deliver the dictionary to the receiver's decoder in the LZW compression scheme?
   Explain the meaning of the predictor and of the error in the lossless predictive coding scheme.
   Why do we split 2D images into tiles of size 8×8 or 16×16 pixels?
   Which compression scheme would you use for coding a chessboard image of size 64×64 pixels? Assume that each tile is defined as a homogeneous square area.
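Appendix: the threshold-coding sketch referenced above. A small numpy example (the normalization array Z is the table shown in the threshold-coding slide; the ramp test block and the retained-coefficient count are only illustrative):

import numpy as np

# orthonormal 8x8 DCT-II matrix
N = 8
x = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * x[None, :] + 1) * x[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

Z = np.array([[16, 11, 10, 16,  24,  40,  51,  61],
              [12, 12, 14, 19,  26,  58,  60,  55],
              [14, 13, 16, 24,  40,  57,  69,  56],
              [14, 17, 22, 29,  51,  87,  80,  62],
              [18, 22, 37, 56,  68, 109, 103,  77],
              [24, 35, 55, 64,  81, 104, 113,  92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

block = 8.0 * np.add.outer(x, x)     # a smooth 8x8 ramp, values 0..112
T = C @ block @ C.T                  # forward 2-D DCT
Tq = np.round(T / Z)                 # threshold coding: T'(k,l) = round(T(k,l)/Z(k,l))

print(np.count_nonzero(Tq), "of 64 coefficients retained")
approx = C.T @ (Tq * Z) @ C          # dequantize + inverse transform
print(np.abs(block - approx).max())  # small but nonzero error -> lossy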