Filters in Image Processing
Analysing Images through Visual Descriptors

David Svoboda and Tomáš Majtner
email: svoboda@fi.muni.cz
Centre for Biomedical Image Analysis
Faculty of Informatics, Masaryk University, Brno, CZ
May 14, 2018

Outline
1 Motivation
2 Basic idea for image descriptors
3 Image classification
4 Most common image descriptors
  - Haralick features
  - Local binary patterns (LBP)
  - MPEG-7 descriptors
  - Scale-invariant feature transform (SIFT)
  - Zernike features
  - Moment invariants

Motivation
- an unknown image with no meta information
- how do we determine what is in the image?

(Figure: results of a Google search for the keyword 'obama', Nov. 2011)
(Figure: results of searching for images visually similar to the official photo of president Obama, Nov. 2011)

Basic idea for image descriptors
What are image descriptors?
- a smaller (shorter) form of an image that encodes some of its important characteristics
- this image form is used in image recognition tasks, including
  - comparing images
  - finding similar images
  - distinguishing images

Desired properties
- fast computation (real-time tasks)
- invariance to scale, rotation, and distortion changes

Image classification
- image classification includes a broad range of approaches to the identification of images
- it analyses the numerical properties of various image features and organizes the data into categories, i.e. image classes (clusters)
- it compares feature vectors using a chosen metric ⇒ objects that are close in feature space are considered visually similar and form clusters

Image classes may be
- specified a priori by an analyst – supervised classification
- clustered automatically – unsupervised classification

Classification algorithms typically employ two phases:
- training phase – a unique description of each classification category (training class) is created
- testing phase – the feature-space partitions are used to classify image features

Most common classification methods
- cluster analysis – unsupervised method (e.g. k-means clustering)
- decision trees – non-parametric supervised method
- neural networks – statistical learning algorithms for supervised classification
- support vector machine (SVM) – supervised classification, very popular
- k-nearest neighbours (k-NN) – simple, non-parametric, supervised method
- convolutional neural networks (CNN) – learning-based method

Simple example: a feature vector with 2 components
1 roundness – x-axis
2 number of red pixels – y-axis

(Figure: training images plotted in the roundness × number-of-red-pixels feature space)

What would be the feature vector of this query image? A minimal k-NN sketch over such 2-component feature vectors is shown below.
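To make the classification example concrete, here is a minimal k-NN sketch over 2-component feature vectors. The training data, class labels, and the choice k = 3 are invented for illustration; in practice the feature components should also be normalized so that one of them does not dominate the metric.

```python
import numpy as np

def knn_classify(query, features, labels, k=3):
    """Classify a query feature vector by a majority vote of its
    k nearest neighbours (Euclidean metric in feature space)."""
    dists = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical training set: (roundness, number of red pixels)
features = np.array([[0.9, 120], [0.85, 150], [0.2, 10], [0.3, 5]])
labels = ['tomato', 'tomato', 'banana', 'banana']

print(knn_classify(np.array([0.8, 100]), features, labels))  # -> 'tomato'
```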
Haralick features
- introduced in 1973 by Professor Haralick from the City University of New York
- a popular approach for texture analysis; Haralick features are still used in research
- based on the so-called co-occurrence matrix

Haralick features – co-occurrence matrix
- the co-occurrence matrix captures the distribution of co-occurring intensity values at a given offset
- mathematically, the co-occurrence matrix C is defined as

$$C_{\Delta x,\Delta y}(i,j) = \sum_{p=1}^{n} \sum_{q=1}^{m}
\begin{cases}
1, & \text{if } I(p,q)=i \wedge I(p+\Delta x, q+\Delta y)=j \\
   & \text{or } I(p,q)=i \wedge I(p-\Delta x, q-\Delta y)=j \\
0, & \text{otherwise}
\end{cases}$$

- i and j are image intensity values
- p and q are spatial positions in the n × m image I
- the offset (Δx, Δy) depends on the direction θ used and the distance d at which the matrix is computed

(Δx, Δy) represents the separation vector; 4 orientations are usually considered:
- horizontal – separation vector (1, 0) for distance 1
- vertical – separation vector (0, 1) for distance 1
- main diagonal – separation vector (1, 1) for distance 1
- minor diagonal – separation vector (1, −1) for distance 1

Original image I (3 × 3, intensities 0–3):

0 3 3
0 0 1
2 2 1

In the general form of the co-occurrence matrix for image I, the entry at position (i, j) holds #(i, j), the number of co-occurring pairs with intensities i and j. The four co-occurrence matrices of I for distance 1:

C_{1,0} =
2 1 0 1
1 0 1 0
0 1 2 0
1 0 0 2

C_{0,1} =
2 0 2 1
0 2 0 1
2 0 0 0
1 1 0 0

C_{1,1} =
2 1 1 0
1 0 0 1
1 0 0 0
0 1 0 0

C_{1,−1} =
0 0 1 2
0 0 1 0
1 1 0 0
2 0 0 0

- because even simple 8-bit images can have 256 intensity values, the corresponding co-occurrence matrices would be very large
- the solution is to apply intensity quantization prior to the extraction process
- the co-occurrence matrices are finally normalized and averaged to form the final co-occurrence matrix C

Note: all co-occurrence matrices are symmetric (why?)

Haralick suggested 14 features that can be derived from the matrix and together form the vector of Haralick features, e.g.

entropy: $-\sum_{i=1}^{q}\sum_{j=1}^{q} C(i,j)\,\log C(i,j)$

texture contrast: $\sum_{i=1}^{q}\sum_{j=1}^{q} |i-j|\,C(i,j)$

texture homogeneity: $\sum_{i=1}^{q}\sum_{j=1}^{q} \frac{C(i,j)}{1+|i-j|}$

and the others ... (q is the maximal intensity present in the image)

A sketch computing the co-occurrence matrix and these three features follows the bibliography.

Bibliography
- R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural Features for Image Classification. IEEE Trans. on Systems, Man and Cybernetics, SMC-3(6):610–621, 1973.
- L. Tesař, D. Smutek, A. Shimizu, and H. Kobatake. 3D Extension of Haralick Texture Features for Medical Image Analysis. In Proceedings of the Fourth IASTED International Conference on Signal Processing, Pattern Recognition, and Applications, SPPRA '07, pages 350–355. ACTA Press, 2007.
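A minimal sketch of the construction above, run on the 3 × 3 example image. The use of the natural logarithm in the entropy is an assumption (any log base works up to a constant factor).

```python
import numpy as np

def cooccurrence(img, dx, dy):
    """Symmetric co-occurrence matrix for separation vector (dx, dy);
    assumes img is already quantized to a few integer levels."""
    levels = int(img.max()) + 1
    C = np.zeros((levels, levels))
    rows, cols = img.shape
    for p in range(rows):
        for q in range(cols):
            pp, qq = p + dy, q + dx          # the co-occurring position
            if 0 <= pp < rows and 0 <= qq < cols:
                C[img[p, q], img[pp, qq]] += 1
                C[img[pp, qq], img[p, q]] += 1   # the -offset direction
    return C

img = np.array([[0, 3, 3],
                [0, 0, 1],
                [2, 2, 1]])
C = cooccurrence(img, dx=1, dy=0)            # horizontal neighbours
P = C / C.sum()                              # normalized matrix

entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))
i, j = np.indices(P.shape)
contrast = np.sum(np.abs(i - j) * P)
homogeneity = np.sum(P / (1 + np.abs(i - j)))
```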
Local binary patterns (LBP)
- introduced in 1994 by Ojala and Pietikäinen from the University of Oulu, Finland
- the descriptor became famous after its generalization in 2002
- originally proposed as a texture measure; it later became popular in face recognition
- currently also used in (bio)medical image analysis, motion analysis, eye localization, fingerprint recognition, and many other areas

Original approach (1994)
Idea: a texture can be described by a pattern and its strength.

LBP pattern
1 each pixel is compared with its 8 neighbours
2 if the intensity of a neighbouring pixel is greater than or equal to the examined pixel's intensity, write 1 (otherwise write 0)
3 take the digits from the top-left corner in clockwise order and interpret them as a binary number
4 the resulting decimal value represents the pattern

Example (threshold against the centre value 5):

7 1 9        1 0 1
2 5 5   →    0 . 1
6 0 3        1 0 0

Binary: 10110010 ⇒ LBP = 178

Strength of the pattern
5 the decimal values from the entire image are used to form a histogram (256 bins – why?)
6 the concatenation of the normalized histogram values gives the feature vector

Generalization of LBP (2002)
Idea: no limitation on the size of the neighbourhood or on the number of sampling points.
- parameter P – number of sampling points
- parameter R – radius of the neighbourhood
- typical configurations: P = 8, R = 1; P = 8, R = 2; P = 4, R = 2
- when a sampling point does not fall in the centre of a pixel, bilinear interpolation is used

The LBP descriptor has many variants and modifications:
- median binary patterns – thresholding against the median within the neighbourhood
- local ternary patterns – solve the problem of nearly constant areas
- and others ...

A sketch of the original 3 × 3 LBP follows the bibliography.

Bibliography
- T. Ojala, M. Pietikäinen, and D. Harwood. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In 12th IAPR International Conference on Pattern Recognition, Vol. 1: Computer Vision and Image Processing, pages 582–585, Oct. 1994.
- T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(7):971–987, July 2002.
- M. Pietikäinen, A. Hadid, G. Zhao, and T. Ahonen. Computer Vision Using Local Binary Patterns. Computational Imaging and Vision. Springer Verlag, London, 2011.
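A minimal sketch of the original 3 × 3 LBP, reproducing the worked example above (the single interior pixel of the 3 × 3 image yields the code 0b10110010 = 178).

```python
import numpy as np

# Offsets of the 8 neighbours, from the top-left corner clockwise,
# matching the bit order used in the example above.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(img):
    """Basic 3x3 LBP: code each interior pixel, then build the
    256-bin normalized histogram that forms the feature vector."""
    rows, cols = img.shape
    hist = np.zeros(256)
    for p in range(1, rows - 1):
        for q in range(1, cols - 1):
            code = 0
            for dr, dc in OFFSETS:
                code = (code << 1) | int(img[p + dr, q + dc] >= img[p, q])
            hist[code] += 1
    return hist / hist.sum()

img = np.array([[7, 1, 9],
                [2, 5, 5],
                [6, 0, 3]])
print(lbp_histogram(img).argmax())  # -> 178
```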
MPEG in general
- Moving Picture Experts Group (MPEG) – develops digital audiovisual compression standards (established in 1988)
- MPEG-1 (1993) – the first standard for audio and video; includes MP3
- MPEG-2 (1995) – generic coding of moving pictures and associated audio information
- MPEG-4 (1998) – coding of audio-visual objects
- MPEG-7 (2002) – multimedia content description interface (including visual descriptors)

MPEG-7 descriptors
- part of the MPEG-7 Visual standard
- standardized low-level descriptors for different domains
- many contributors; B. S. Manjunath is one of the editors
- first public release in 2002

Division
MPEG-7 visual descriptors are divided into 4 groups:
- colour descriptors – robust to the viewing angle, translation, and rotation of the regions of interest (ROI); 6 features
- texture descriptors – capture important structural information about intensity variations and their relationship to the surrounding environment; 3 features
- shape descriptors – techniques for describing and matching 2D and 3D shape features; 3 features
- motion descriptors – describe motion features in video sequences; 4 features

Texture descriptors
MPEG-7 texture descriptors consist of three feature extractors:
- Homogeneous Texture Descriptor (HTD) – characterizes the region texture using the mean energy and the energy deviation of a set of frequency channels
- Texture Browsing Descriptor (TBD) – specifies a perceptual characterization of the texture, close to human perception
- Edge Histogram Descriptor (EHD) – captures the spatial distribution of edges in the image

Note: we will briefly describe HTD and EHD.

Homogeneous Texture Descriptor (HTD)
- the 2D frequency plane is partitioned into 30 channels
- the partitioning is uniform along the angular direction and non-uniform (octave scale) along the radial direction

HTD – Gabor filters
The channel responses are computed with Gabor filters:
- introduced in 1946 by Dennis Gabor
- for a 1D signal, the filter is obtained by modulating a sinusoid with a Gaussian function
- it responds to a certain frequency in a localized part of the signal
- the extension of Gabor filters to 2D is used for the HTD channels

The (s, r)-th channel, where s is the radial index and r is the angular index, is modelled in the frequency domain as

$$G_{s,r}(\omega, \theta) = \exp\!\left(\frac{-(\omega-\omega_s)^2}{2\sigma_s^2}\right) \cdot \exp\!\left(\frac{-(\theta-\theta_r)^2}{2\tau_r^2}\right)$$

- σ_s and τ_r are the standard deviations of the Gaussian in the radial and the angular direction, respectively
- θ_r = 30° × r, where r ∈ {0, 1, 2, 3, 4, 5}
- ω_s = ω_0 × 2^{−s}, where s ∈ {0, 1, 2, 3, 4} and ω_0 is the highest frequency
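A minimal sketch of one such frequency-domain channel. The values of ω_0, σ_s, and τ_r below are illustrative placeholders; the MPEG-7 standard fixes specific values per channel.

```python
import numpy as np

def htd_channel(omega, theta, s, r, sigma_s, tau_r, omega0=0.5):
    """Frequency response G_{s,r}(omega, theta) of one HTD Gabor
    channel: radial index s in 0..4, angular index r in 0..5."""
    omega_s = omega0 * 2.0 ** (-s)            # octave-spaced centre frequency
    theta_r = np.deg2rad(30 * r)              # 30-degree angular spacing
    return (np.exp(-(omega - omega_s) ** 2 / (2 * sigma_s ** 2)) *
            np.exp(-(theta - theta_r) ** 2 / (2 * tau_r ** 2)))

# Evaluate one channel on a polar grid of the 2D frequency plane.
omega = np.linspace(0, 0.5, 128)[None, :]     # normalized radial frequency
theta = np.linspace(0, np.pi, 128)[:, None]   # angular coordinate
G = htd_channel(omega, theta, s=1, r=2, sigma_s=0.07, tau_r=np.deg2rad(15))
```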
Homogeneous Texture Descriptor (HTD) – syntax
The syntax of the HTD is as follows:

HTD = [f_DC, f_SD, e_1, e_2, ..., e_30, d_1, d_2, ..., d_30]

- f_DC is the mean of the image
- f_SD is the standard deviation of the image
- e_i and d_i are the non-linearly scaled and quantized mean energy and energy deviation of the i-th channel (i ∈ {1, 2, ..., 30})

Edge Histogram Descriptor (EHD)
EHD represents the local edge distribution in the image:
- divide the image space into 4 × 4 sub-images
- divide each sub-image into non-overlapping square image blocks (about 1100 image blocks in total)
- classify each image block into one of the 5 edge categories or as a non-edge block
- the classification is done by applying the corresponding edge detectors and thresholding

The feature vector of EHD consists of three types of bins:
- local – 4 × 4 sub-images × 5 edge types (bins 1–80)
- semi-global – sub-images grouped in predefined ways (horizontal, vertical, ...) (bins 81–145)
- global – 1 bin for every edge type (bins 146–150)

Bibliography
- B. S. Manjunath, P. Salembier, and T. Sikora, editors. Introduction to MPEG-7: Multimedia Content Description Interface. Wiley & Sons, Inc., New York, USA, Apr. 2002.

Scale-invariant feature transform (SIFT)
- presented in 2004 (first article in 1999) by David Lowe from the University of British Columbia (UBC), Canada
- patented by UBC for commercial purposes
- local feature extraction (robust to occlusion)
- similar to the human visual system
- extracts distinctive invariant features

(Figure: demonstration of the SIFT descriptor finding corresponding parts of two images; the query image on the right is identified as a part of the image on the left)

SIFT consists of key point detection and a key point descriptor.

Key point detection
- locating peaks in scale space
- key point localization
- orientation assignment

Key point descriptor
- describes the key point as a vector
- can be used with other key point detectors

Key point detection
Key points are derived as local extreme points in the scale space of the Laplacian-of-Gaussian (LoG):
- compute the LoG with various σ values
- compare each point within its 3 × 3 × 3 neighbourhood (the scale space forms a 3D image)
- if the central point is an extreme point (a maximum or a minimum), consider it a key point

A sketch of this extrema search is shown below.
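A minimal sketch of the scale-space extrema search above, using SciPy's Laplacian-of-Gaussian filter. The σ values and the σ² scale normalization are illustrative assumptions; Lowe's actual implementation approximates the LoG by a Difference-of-Gaussians pyramid and then applies the localization and filtering steps described next.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def scale_space_keypoints(img, sigmas=(1.0, 1.6, 2.6, 4.1)):
    """Find candidate key points as extrema of the LoG scale space
    within a 3x3x3 neighbourhood (scale x row x column)."""
    # sigma**2 compensates for the decay of LoG responses with scale.
    stack = np.stack([s ** 2 * gaussian_laplace(img.astype(float), s)
                      for s in sigmas])
    keypoints = []
    for k in range(1, len(sigmas) - 1):           # interior scales only
        for p in range(1, img.shape[0] - 1):
            for q in range(1, img.shape[1] - 1):
                cube = stack[k - 1:k + 2, p - 1:p + 2, q - 1:q + 2]
                v = stack[k, p, q]
                if v == cube.max() or v == cube.min():
                    keypoints.append((p, q, sigmas[k]))
    return keypoints
```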
Key point detection – key point localization
Key point localization consists of
- eliminating outliers (points poorly localized along edges)
- searching for the best scale for every extreme point
- comparing against a threshold

Key point detection – orientation assignment
Orientations are assigned to key points to achieve rotation invariance:
- at each point, compute the central differences (gradient magnitude and direction)
- for each key point, build a weighted histogram of directions (36 bins ⇒ 10° per bin); the weights are the gradient magnitudes
- select the peak as the direction of the key point (there may be several directions, one for every peak within 80% of the maximum peak)
- any further calculations are done relative to this orientation

Key point descriptor
Extraction of the local image descriptor at a key point:
- compute the relative orientation and the magnitude in a 16 × 16 neighbourhood of the key point
- form a weighted histogram (8 bins) for each of the 4 × 4 regions
- concatenate the 16 histograms into one vector of 128 dimensions, which represents the SIFT feature vector

Bibliography
- D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.
- Lecture on YouTube: Link

Zernike features
Zernike polynomials in 2D:

$$V_{nl}(x, y) = \sum_{m=0}^{(n-l)/2} (-1)^m\, \frac{(n-m)!}{m!\,\left(\frac{n-2m+l}{2}\right)!\,\left(\frac{n-2m-l}{2}\right)!}\, \left(x^2 + y^2\right)^{\frac{n}{2}-m} e^{il\theta},$$

where
- 0 ≤ l ≤ n and (n − l) is even
- θ = tan⁻¹(y/x)
- x² + y² ≤ 1
- the individual V_{nl} are orthogonal

(Portrait: Frederik Zernike, 1888–1966)
(Figures: examples of 2D Zernike polynomials)

Definition
Given the inner product

$$Z_{nl} = \frac{n+1}{\pi} \sum_x \sum_y V^{*}_{nl}(x, y)\, f(x, y),$$

where f(x, y) is the analysed image and V_{nl} is a selected Zernike polynomial, the scalar |Z_{nl}| is called a Zernike feature/descriptor.

Note: Z_{nl} ∈ ℂ

Zernike features in 3D
3D Zernike polynomials; see:
- Novotni, M., Klein, R. Shape retrieval using 3D Zernike descriptors. Computer-Aided Design, 36(11), Solid Modeling Theory and Applications, 2004, 1047–1062.
- Grandison, S., Roberts, C., Morris, R. J. The Application of 3D Zernike Moments for the Description of Model-Free Molecular Structure, Functional Motion, and Structural Reliability. Journal of Computational Biology, March 2009, 16(3):487–500.

Moment invariants

Definition
The 2-D moment of order (p + q) of a digital image f(k, l) of size M × N is defined as

$$m_{pq} = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} k^p\, l^q\, f(k, l),$$

where p = 0, 1, 2, ... and q = 0, 1, 2, ... are integers. The central moment of order (p + q) is defined as

$$\mu_{pq} = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} (k - \bar{k})^p\, (l - \bar{l})^q\, f(k, l),$$

where $\bar{k} = m_{10}/m_{00}$ and $\bar{l} = m_{01}/m_{00}$. A short sketch of these moments follows.
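A minimal sketch of the raw and central moments just defined. The binary test image is invented for illustration; the demo checks that central moments are translation-invariant by construction.

```python
import numpy as np

def raw_moment(f, p, q):
    """2-D moment m_pq of image f; k indexes rows, l indexes columns."""
    k = np.arange(f.shape[0])[:, None]
    l = np.arange(f.shape[1])[None, :]
    return np.sum(k ** p * l ** q * f)

def central_moment(f, p, q):
    """Central moment mu_pq, taken about the centroid (kbar, lbar)."""
    m00 = raw_moment(f, 0, 0)
    kbar = raw_moment(f, 1, 0) / m00
    lbar = raw_moment(f, 0, 1) / m00
    k = np.arange(f.shape[0])[:, None]
    l = np.arange(f.shape[1])[None, :]
    return np.sum((k - kbar) ** p * (l - lbar) ** q * f)

# Central moments do not change when the object is translated.
f = np.zeros((64, 64)); f[20:40, 10:50] = 1.0      # a simple binary blob
g = np.roll(f, shift=(5, 7), axis=(0, 1))          # translated copy
print(np.isclose(central_moment(f, 2, 0), central_moment(g, 2, 0)))  # True
```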
Definition (cont.)
The normalized central moments are defined as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,c}}, \qquad c = \frac{p+q}{2} + 1 \quad \text{for } p + q = 2, 3, \ldots$$

Now, let us define several moment invariants that are insensitive to
- translation
- scale change
- mirroring
- rotation

Seven invariants

$$\phi_1 = \eta_{20} + \eta_{02}$$
$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$
$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$
$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$
$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$
$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$$
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$

You should know the answers ...
- Build your own 10 B descriptor for any grayscale image. Explain the meaning of the individual parts of the feature vector.
- Explain a way to efficiently compare two randomly chosen RGB colour images.
- Describe the construction of the so-called co-occurrence matrix. How would you capture large-scale texture details (spanning more than 3 pixels)?
- Why do LBP feature vectors possess histograms with 256 bins?
- In which way can we compute the mean gradient direction of a selected 4 × 4 region?
- Propose an extension of the standard Haralick features to work with 3D image data.
- How would you apply a Zernike polynomial to an incoming image of any size so that you could compute the corresponding Zernike feature?