Filters in Image Processing
Analysing Images through Visual Descriptors

David Svoboda and Tomáš Majtner
email: svoboda@fi.muni.cz
Centre for Biomedical Image Analysis
Faculty of Informatics, Masaryk University, Brno, CZ
May 14, 2018

Outline
1 Motivation
2 Basic idea for image descriptors
3 Image classification
4 Most common image descriptors
  - Haralick features
  - Local binary patterns (LBP)
  - MPEG-7 descriptors
  - Scale-invariant feature transform (SIFT)
  - Zernike features
  - Moment invariants

Motivation
- an unknown image with no meta information
- how do we determine what is in the image?

(Figure: results of a Google search for the keyword 'obama', Nov. 2011)
(Figure: results of searching for images visually similar to the official photo of president Obama, Nov. 2011)

Basic idea for image descriptors
What are image descriptors?
- a smaller (shorter) form of an image that encodes some of its important characteristics
- this image form is used in image recognition tasks, including
  - comparing images
  - finding similar images
  - distinguishing images

Desired properties
- fast computation (real-time tasks)
- invariance to scale, rotation, and distortion changes

Image classification
- image classification includes a broad range of approaches to the identification of images
- it analyses the numerical properties of various image features and organizes the data into categories, i.e. image classes (clusters)
- it compares feature vectors using a chosen metric ⇒ objects that are close in feature space are considered visually similar and form clusters

Image classes may be
- specified a priori by an analyst – supervised classification
- clustered automatically – unsupervised classification

Classification algorithms typically employ two phases:
- training phase – a unique description of each classification category (training class) is created
- testing phase – the feature-space partitions are used to classify image features

Most common classification methods
- cluster analysis – unsupervised method (e.g. k-means clustering)
- decision trees – non-parametric supervised method
- neural networks – statistical learning algorithms for supervised classification
- support vector machine (SVM) – supervised classification, very popular
- k-nearest neighbours (k-NN) – simple, non-parametric, supervised method
- convolutional neural networks (CNN) – learning-based method

Simple example: a feature vector with 2 components
1 roundness – x-axis
2 number of red pixels – y-axis

(Figure: training images plotted in the roundness × number-of-red-pixels feature space)

What would be the feature vector of this query image? A minimal k-NN sketch over such 2-component feature vectors is shown below.
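To make the classification example concrete, here is a minimal k-NN sketch over 2-component feature vectors. The training data, class labels, and the choice k = 3 are invented for illustration; in practice the feature components should also be normalized so that one of them does not dominate the metric.

```python
import numpy as np

def knn_classify(query, features, labels, k=3):
    """Classify a query feature vector by a majority vote of its
    k nearest neighbours (Euclidean metric in feature space)."""
    dists = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical training set: (roundness, number of red pixels)
features = np.array([[0.9, 120], [0.85, 150], [0.2, 10], [0.3, 5]])
labels = ['tomato', 'tomato', 'banana', 'banana']

print(knn_classify(np.array([0.8, 100]), features, labels))  # -> 'tomato'
```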
Haralick features
- introduced in 1973 by Professor Haralick from the City University of New York
- a popular approach for texture analysis; Haralick features are still used in research
- based on the so-called co-occurrence matrix

Haralick features – co-occurrence matrix
- the co-occurrence matrix captures the distribution of co-occurring intensity values at a given offset
- mathematically, the co-occurrence matrix C is defined as

$$C_{\Delta x,\Delta y}(i,j) = \sum_{p=1}^{n} \sum_{q=1}^{m}
\begin{cases}
1, & \text{if } I(p,q)=i \wedge I(p+\Delta x, q+\Delta y)=j \\
   & \text{or } I(p,q)=i \wedge I(p-\Delta x, q-\Delta y)=j \\
0, & \text{otherwise}
\end{cases}$$

- i and j are image intensity values
- p and q are spatial positions in the n × m image I
- the offset (Δx, Δy) depends on the direction θ used and the distance d at which the matrix is computed

(Δx, Δy) represents the separation vector; 4 orientations are usually considered:
- horizontal – separation vector (1, 0) for distance 1
- vertical – separation vector (0, 1) for distance 1
- main diagonal – separation vector (1, 1) for distance 1
- minor diagonal – separation vector (1, −1) for distance 1

Original image I (3 × 3, intensities 0–3):

0 3 3
0 0 1
2 2 1

In the general form of the co-occurrence matrix for image I, the entry at position (i, j) holds #(i, j), the number of co-occurring pairs with intensities i and j. The four co-occurrence matrices of I for distance 1:

C_{1,0} =
2 1 0 1
1 0 1 0
0 1 2 0
1 0 0 2

C_{0,1} =
2 0 2 1
0 2 0 1
2 0 0 0
1 1 0 0

C_{1,1} =
2 1 1 0
1 0 0 1
1 0 0 0
0 1 0 0

C_{1,−1} =
0 0 1 2
0 0 1 0
1 1 0 0
2 0 0 0

- because even simple 8-bit images can have 256 intensity values, the corresponding co-occurrence matrices would be very large
- the solution is to apply intensity quantization prior to the extraction process
- the co-occurrence matrices are finally normalized and averaged to form the final co-occurrence matrix C

Note: all co-occurrence matrices are symmetric (why?)

Haralick suggested 14 features that can be derived from the matrix and together form the vector of Haralick features, e.g.

entropy: $-\sum_{i=1}^{q}\sum_{j=1}^{q} C(i,j)\,\log C(i,j)$

texture contrast: $\sum_{i=1}^{q}\sum_{j=1}^{q} |i-j|\,C(i,j)$

texture homogeneity: $\sum_{i=1}^{q}\sum_{j=1}^{q} \frac{C(i,j)}{1+|i-j|}$

and the others ... (q is the maximal intensity present in the image)

A sketch computing the co-occurrence matrix and these three features follows the bibliography.

Bibliography
- R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural Features for Image Classification. IEEE Trans. on Systems, Man and Cybernetics, SMC-3(6):610–621, 1973.
- L. Tesař, D. Smutek, A. Shimizu, and H. Kobatake. 3D Extension of Haralick Texture Features for Medical Image Analysis. In Proceedings of the Fourth IASTED International Conference on Signal Processing, Pattern Recognition, and Applications, SPPRA '07, pages 350–355. ACTA Press, 2007.
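A minimal sketch of the construction above, run on the 3 × 3 example image. The use of the natural logarithm in the entropy is an assumption (any log base works up to a constant factor).

```python
import numpy as np

def cooccurrence(img, dx, dy):
    """Symmetric co-occurrence matrix for separation vector (dx, dy);
    assumes img is already quantized to a few integer levels."""
    levels = int(img.max()) + 1
    C = np.zeros((levels, levels))
    rows, cols = img.shape
    for p in range(rows):
        for q in range(cols):
            pp, qq = p + dy, q + dx          # the co-occurring position
            if 0 <= pp < rows and 0 <= qq < cols:
                C[img[p, q], img[pp, qq]] += 1
                C[img[pp, qq], img[p, q]] += 1   # the -offset direction
    return C

img = np.array([[0, 3, 3],
                [0, 0, 1],
                [2, 2, 1]])
C = cooccurrence(img, dx=1, dy=0)            # horizontal neighbours
P = C / C.sum()                              # normalized matrix

entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))
i, j = np.indices(P.shape)
contrast = np.sum(np.abs(i - j) * P)
homogeneity = np.sum(P / (1 + np.abs(i - j)))
```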
Local binary patterns (LBP)
- introduced in 1994 by Ojala and Pietikäinen from the University of Oulu, Finland
- the descriptor became famous after its generalization in 2002
- originally proposed as a texture measure; it later became popular in face recognition
- currently also used in (bio)medical image analysis, motion analysis, eye localization, fingerprint recognition, and many other areas

Original approach (1994)
Idea: a texture can be described by a pattern and its strength.

LBP pattern
1 each pixel is compared with its 8 neighbours
2 if the intensity of a neighbouring pixel is greater than or equal to the examined pixel's intensity, write 1 (otherwise write 0)
3 take the digits from the top-left corner in clockwise order and interpret them as a binary number
4 the resulting decimal value represents the pattern

Example (threshold against the centre value 5):

7 1 9        1 0 1
2 5 5   →    0 . 1
6 0 3        1 0 0

Binary: 10110010 ⇒ LBP = 178

Strength of the pattern
5 the decimal values from the entire image are used to form a histogram (256 bins – why?)
6 the concatenation of the normalized histogram values gives the feature vector

Generalization of LBP (2002)
Idea: no limitation on the size of the neighbourhood or on the number of sampling points.
- parameter P – number of sampling points
- parameter R – radius of the neighbourhood
- typical configurations: P = 8, R = 1; P = 8, R = 2; P = 4, R = 2
- when a sampling point does not fall in the centre of a pixel, bilinear interpolation is used

The LBP descriptor has many variants and modifications:
- median binary patterns – thresholding against the median within the neighbourhood
- local ternary patterns – solve the problem of nearly constant areas
- and others ...

A sketch of the original 3 × 3 LBP follows the bibliography.

Bibliography
- T. Ojala, M. Pietikäinen, and D. Harwood. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In 12th IAPR International Conference on Pattern Recognition, Vol. 1: Computer Vision and Image Processing, pages 582–585, Oct. 1994.
- T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(7):971–987, July 2002.
- M. Pietikäinen, A. Hadid, G. Zhao, and T. Ahonen. Computer Vision Using Local Binary Patterns. Computational Imaging and Vision. Springer Verlag, London, 2011.
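A minimal sketch of the original 3 × 3 LBP, reproducing the worked example above (the single interior pixel of the 3 × 3 image yields the code 0b10110010 = 178).

```python
import numpy as np

# Offsets of the 8 neighbours, from the top-left corner clockwise,
# matching the bit order used in the example above.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(img):
    """Basic 3x3 LBP: code each interior pixel, then build the
    256-bin normalized histogram that forms the feature vector."""
    rows, cols = img.shape
    hist = np.zeros(256)
    for p in range(1, rows - 1):
        for q in range(1, cols - 1):
            code = 0
            for dr, dc in OFFSETS:
                code = (code << 1) | int(img[p + dr, q + dc] >= img[p, q])
            hist[code] += 1
    return hist / hist.sum()

img = np.array([[7, 1, 9],
                [2, 5, 5],
                [6, 0, 3]])
print(lbp_histogram(img).argmax())  # -> 178
```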
MPEG in general
- Moving Picture Experts Group (MPEG) – develops digital audiovisual compression standards (established in 1988)
- MPEG-1 (1993) – the first standard for audio and video; includes MP3
- MPEG-2 (1995) – generic coding of moving pictures and associated audio information
- MPEG-4 (1998) – coding of audio-visual objects
- MPEG-7 (2002) – multimedia content description interface (including visual descriptors)

MPEG-7 descriptors
- part of the MPEG-7 Visual standard
- standardized low-level descriptors for different domains
- many contributors; B. S. Manjunath is one of the editors
- first public release in 2002

Division
MPEG-7 visual descriptors are divided into 4 groups:
- colour descriptors – robust to the viewing angle, translation, and rotation of the regions of interest (ROI); 6 features
- texture descriptors – capture important structural information about intensity variations and their relationship to the surrounding environment; 3 features
- shape descriptors – techniques for describing and matching 2D and 3D shape features; 3 features
- motion descriptors – describe motion features in video sequences; 4 features

Texture descriptors
MPEG-7 texture descriptors consist of three feature extractors:
- Homogeneous Texture Descriptor (HTD) – characterizes the region texture using the mean energy and the energy deviation of a set of frequency channels
- Texture Browsing Descriptor (TBD) – specifies a perceptual characterization of the texture, close to human perception
- Edge Histogram Descriptor (EHD) – captures the spatial distribution of edges in the image

Note: we will briefly describe HTD and EHD.

Homogeneous Texture Descriptor (HTD)
- the 2D frequency plane is partitioned into 30 channels
- the partitioning is uniform along the angular direction and non-uniform (octave scale) along the radial direction

HTD – Gabor filters
The channel responses are computed with Gabor filters:
- introduced in 1946 by Dennis Gabor
- for a 1D signal, the filter is obtained by modulating a sinusoid with a Gaussian function
- it responds to a certain frequency in a localized part of the signal
- the extension of Gabor filters to 2D is used for the HTD channels

The (s, r)-th channel, where s is the radial index and r is the angular index, is modelled in the frequency domain as

$$G_{s,r}(\omega, \theta) = \exp\!\left(\frac{-(\omega-\omega_s)^2}{2\sigma_s^2}\right) \cdot \exp\!\left(\frac{-(\theta-\theta_r)^2}{2\tau_r^2}\right)$$

- σ_s and τ_r are the standard deviations of the Gaussian in the radial and the angular direction, respectively
- θ_r = 30° × r, where r ∈ {0, 1, 2, 3, 4, 5}
- ω_s = ω_0 × 2^{−s}, where s ∈ {0, 1, 2, 3, 4} and ω_0 is the highest frequency
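A minimal sketch of one such frequency-domain channel. The values of ω_0, σ_s, and τ_r below are illustrative placeholders; the MPEG-7 standard fixes specific values per channel.

```python
import numpy as np

def htd_channel(omega, theta, s, r, sigma_s, tau_r, omega0=0.5):
    """Frequency response G_{s,r}(omega, theta) of one HTD Gabor
    channel: radial index s in 0..4, angular index r in 0..5."""
    omega_s = omega0 * 2.0 ** (-s)            # octave-spaced centre frequency
    theta_r = np.deg2rad(30 * r)              # 30-degree angular spacing
    return (np.exp(-(omega - omega_s) ** 2 / (2 * sigma_s ** 2)) *
            np.exp(-(theta - theta_r) ** 2 / (2 * tau_r ** 2)))

# Evaluate one channel on a polar grid of the 2D frequency plane.
omega = np.linspace(0, 0.5, 128)[None, :]     # normalized radial frequency
theta = np.linspace(0, np.pi, 128)[:, None]   # angular coordinate
G = htd_channel(omega, theta, s=1, r=2, sigma_s=0.07, tau_r=np.deg2rad(15))
```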
Homogeneous Texture Descriptor (HTD) – syntax
The syntax of the HTD is as follows:

HTD = [f_DC, f_SD, e_1, e_2, ..., e_30, d_1, d_2, ..., d_30]

- f_DC is the mean of the image
- f_SD is the standard deviation of the image
- e_i and d_i are the non-linearly scaled and quantized mean energy and energy deviation of the i-th channel (i ∈ {1, 2, ..., 30})

Edge Histogram Descriptor (EHD)
EHD represents the local edge distribution in the image:
- divide the image space into 4 × 4 sub-images
- divide each sub-image into non-overlapping square image blocks (about 1100 image blocks in total)
- classify each image block into one of the 5 edge categories or as a non-edge block
- the classification is done by applying the corresponding edge detectors and thresholding

The feature vector of EHD consists of three types of bins:
- local – 4 × 4 sub-images × 5 edge types (bins 1–80)
- semi-global – sub-images grouped in predefined ways (horizontal, vertical, ...) (bins 81–145)
- global – 1 bin for every edge type (bins 146–150)

Bibliography
- B. S. Manjunath, P. Salembier, and T. Sikora, editors. Introduction to MPEG-7: Multimedia Content Description Interface. Wiley & Sons, Inc., New York, USA, Apr. 2002.

Scale-invariant feature transform (SIFT)
- presented in 2004 (first article in 1999) by David Lowe from the University of British Columbia (UBC), Canada
- patented by UBC for commercial purposes
- local feature extraction (robust to occlusion)
- similar to the human visual system
- extracts distinctive invariant features

(Figure: demonstration of the SIFT descriptor finding corresponding parts of two images; the query image on the right is identified as a part of the image on the left)

SIFT consists of key point detection and a key point descriptor.

Key point detection
- locating peaks in scale space
- key point localization
- orientation assignment

Key point descriptor
- describes the key point as a vector
- can be used with other key point detectors

Key point detection
Key points are derived as local extreme points in the scale space of the Laplacian-of-Gaussian (LoG):
- compute the LoG with various σ values
- compare each point within its 3 × 3 × 3 neighbourhood (the scale space forms a 3D image)
- if the central point is an extreme point (a maximum or a minimum), consider it a key point

A sketch of this extrema search is shown below.
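A minimal sketch of the scale-space extrema search above, using SciPy's Laplacian-of-Gaussian filter. The σ values and the σ² scale normalization are illustrative assumptions; Lowe's actual implementation approximates the LoG by a Difference-of-Gaussians pyramid and then applies the localization and filtering steps described next.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def scale_space_keypoints(img, sigmas=(1.0, 1.6, 2.6, 4.1)):
    """Find candidate key points as extrema of the LoG scale space
    within a 3x3x3 neighbourhood (scale x row x column)."""
    # sigma**2 compensates for the decay of LoG responses with scale.
    stack = np.stack([s ** 2 * gaussian_laplace(img.astype(float), s)
                      for s in sigmas])
    keypoints = []
    for k in range(1, len(sigmas) - 1):           # interior scales only
        for p in range(1, img.shape[0] - 1):
            for q in range(1, img.shape[1] - 1):
                cube = stack[k - 1:k + 2, p - 1:p + 2, q - 1:q + 2]
                v = stack[k, p, q]
                if v == cube.max() or v == cube.min():
                    keypoints.append((p, q, sigmas[k]))
    return keypoints
```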
Key point detection – key point localization
Key point localization consists of
- eliminating outliers (points poorly localized along edges)
- searching for the best scale for every extreme point
- comparing against a threshold

Key point detection – orientation assignment
Orientations are assigned to key points to achieve rotation invariance:
- at each point, compute the central differences (gradient magnitude and direction)
- for each key point, build a weighted histogram of directions (36 bins ⇒ 10° per bin); the weights are the gradient magnitudes
- select the peak as the direction of the key point (there may be several directions, one for every peak within 80% of the maximum peak)
- any further calculations are done relative to this orientation

Key point descriptor
Extraction of the local image descriptor at a key point:
- compute the relative orientation and the magnitude in a 16 × 16 neighbourhood of the key point
- form a weighted histogram (8 bins) for each of the 4 × 4 regions
- concatenate the 16 histograms into one vector of 128 dimensions, which represents the SIFT feature vector

Bibliography
- D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.
- Lecture on YouTube: Link

Zernike features
Zernike polynomials in 2D:

$$V_{nl}(x, y) = \sum_{m=0}^{(n-l)/2} (-1)^m\, \frac{(n-m)!}{m!\,\left(\frac{n-2m+l}{2}\right)!\,\left(\frac{n-2m-l}{2}\right)!}\, \left(x^2 + y^2\right)^{\frac{n}{2}-m} e^{il\theta},$$

where
- 0 ≤ l ≤ n and (n − l) is even
- θ = tan⁻¹(y/x)
- x² + y² ≤ 1
- the individual V_{nl} are orthogonal

(Portrait: Frederik Zernike, 1888–1966)
(Figures: examples of 2D Zernike polynomials)

Definition
Given the inner product

$$Z_{nl} = \frac{n+1}{\pi} \sum_x \sum_y V^{*}_{nl}(x, y)\, f(x, y),$$

where f(x, y) is the analysed image and V_{nl} is a selected Zernike polynomial, the scalar |Z_{nl}| is called a Zernike feature/descriptor.

Note: Z_{nl} ∈ ℂ

Zernike features in 3D
3D Zernike polynomials; see:
- Novotni, M., Klein, R. Shape retrieval using 3D Zernike descriptors. Computer-Aided Design, 36(11), Solid Modeling Theory and Applications, 2004, 1047–1062.
- Grandison, S., Roberts, C., Morris, R. J. The Application of 3D Zernike Moments for the Description of Model-Free Molecular Structure, Functional Motion, and Structural Reliability. Journal of Computational Biology, March 2009, 16(3):487–500.

Moment invariants

Definition
The 2-D moment of order (p + q) of a digital image f(k, l) of size M × N is defined as

$$m_{pq} = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} k^p\, l^q\, f(k, l),$$

where p = 0, 1, 2, ... and q = 0, 1, 2, ... are integers. The central moment of order (p + q) is defined as

$$\mu_{pq} = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} (k - \bar{k})^p\, (l - \bar{l})^q\, f(k, l),$$

where $\bar{k} = m_{10}/m_{00}$ and $\bar{l} = m_{01}/m_{00}$. A short sketch of these moments follows.
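A minimal sketch of the raw and central moments just defined. The binary test image is invented for illustration; the demo checks that central moments are translation-invariant by construction.

```python
import numpy as np

def raw_moment(f, p, q):
    """2-D moment m_pq of image f; k indexes rows, l indexes columns."""
    k = np.arange(f.shape[0])[:, None]
    l = np.arange(f.shape[1])[None, :]
    return np.sum(k ** p * l ** q * f)

def central_moment(f, p, q):
    """Central moment mu_pq, taken about the centroid (kbar, lbar)."""
    m00 = raw_moment(f, 0, 0)
    kbar = raw_moment(f, 1, 0) / m00
    lbar = raw_moment(f, 0, 1) / m00
    k = np.arange(f.shape[0])[:, None]
    l = np.arange(f.shape[1])[None, :]
    return np.sum((k - kbar) ** p * (l - lbar) ** q * f)

# Central moments do not change when the object is translated.
f = np.zeros((64, 64)); f[20:40, 10:50] = 1.0      # a simple binary blob
g = np.roll(f, shift=(5, 7), axis=(0, 1))          # translated copy
print(np.isclose(central_moment(f, 2, 0), central_moment(g, 2, 0)))  # True
```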
Definition (cont.)
The normalized central moments are defined as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,c}}, \qquad c = \frac{p+q}{2} + 1 \quad \text{for } p + q = 2, 3, \ldots$$

Now, let us define several moment invariants that are insensitive to
- translation
- scale change
- mirroring
- rotation

Seven invariants

$$\phi_1 = \eta_{20} + \eta_{02}$$
$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$
$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$
$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$
$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$
$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$$
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$

You should know the answers ...
- Build your own 10 B descriptor for any grayscale image. Explain the meaning of the individual parts of the feature vector.
- Explain a way to efficiently compare two randomly chosen RGB colour images.
- Describe the construction of the so-called co-occurrence matrix. How would you capture large-scale texture details (spanning more than 3 pixels)?
- Why do LBP feature vectors possess histograms with 256 bins?
- In which way can we compute the mean gradient direction of a selected 4 × 4 region?
- Propose an extension of the standard Haralick features to work with 3D image data.
- How would you apply a Zernike polynomial to an incoming image of any size so that you could compute the corresponding Zernike feature?