Similarity Search for Multimedia Data
Pavel Zezula, DISA, FI Masaryk University, Brno, Czech Republic

Outline of the talk
•Similarity in our lives
•Principles of the metric similarity search technology
 –effectiveness – metric similarity features
 –efficiency – indexing structures
•Examples of multimedia applications:
 –Image search and annotation
 –Processing streams of similarity queries
 –Searching in motion capture data
 –Accessing protein databases
•Future research directions

Similarity in Real Life
Quotations from the social psychology literature:
•Any event in the history of an organism is, in a sense, unique.
•Recognition, learning, and judgment presuppose an ability to categorize stimuli and classify situations by similarity.
•Similarity (proximity, resemblance, communality, representativeness, psychological distance, ...) is fundamental to theories of perception, learning, judgment, etc.
•Similarity is subjective and context-dependent.

Real-life Similarity
•Are they similar? (a series of example image pairs)
•Prototypicality or centrality – similarity is not symmetric.
•Context/data/environment dependence – circumstances alter similarities.

What We Learned at School
•GEOMETRY: Two polygons are similar to each other if:
 1)their corresponding angles are congruent
  •∠A = ∠E; ∠B = ∠F; ∠C = ∠G; ∠D = ∠H, and
 2)the lengths of their corresponding sides are proportional
  •AB/EF = BC/FG = CD/GH = DA/HE

Similarity & Geometry
•If one polygon is similar to a second polygon, and the second polygon is similar to a third polygon, then the first polygon is also similar to the third polygon.
•In any case: two geometric figures are either similar or they are not similar at all.

Contemporary Networked Media – the digital data view
•Almost everything that we see, read, hear, write, measure, or observe can be digital.
•Users autonomously contribute to the production of global media, and the growth is exponential.
•Sites like Flickr, YouTube, and Facebook host user-contributed content for a variety of events.
•The elements of networked media are related by numerous multi-faceted links of similarity.
•The majority of current data is unstructured – possibly only structured on display.

Challenge
•The networked media database is getting close to the human “fact-base” – the gap between the physical and the digital world has blurred.
•Similarity data management is needed to connect, search, filter, merge, relate, rank, cluster, classify, identify, or categorize objects across various collections.
•WHY? It is the similarity which is revealing in the world.

Iterative and Interactive Nature of Contemporary Searching
•When we search, our next actions are reactions to the stimuli of previous search results.
•What we find is changing what we seek.
•In any case, search must be: fast, simple, and relevant.

Metric Space: A Geometric Model of Similarity
•Metric space: M = (D,d)
 –D – domain
 –distance function d(x,y)
•∀x,y,z ∈ D:
 •d(x,y) ≥ 0 – non-negativity
 •d(x,y) = 0 ⇔ x = y – identity
 •d(x,y) = d(y,x) – symmetry
 •d(x,y) ≤ d(x,z) + d(z,y) – triangle inequality

Examples of Distance Functions
•Lp Minkowski distance (for vectors): Lp(x,y) = (Σi |xi − yi|^p)^(1/p)
 •L1 – city-block distance
 •L2 – Euclidean distance
 •L∞ – maximum (infinity) distance
•Edit distance (for strings)
 •minimal number of insertions, deletions and substitutions
 •d(‘application’, ‘applet’) = 6
•Jaccard’s coefficient (for sets A, B): d(A,B) = 1 − |A ∩ B| / |A ∪ B|

Examples of Distance Functions (cont.)
•Mahalanobis distance
 –for vectors with correlated dimensions
•Hausdorff distance
 –for sets whose elements are related by another distance
•Earth mover’s distance
 –primarily for histograms (sets of weighted features)
•and many others
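To make the distance functions above concrete, here is a minimal Python sketch (not part of the original slides) of the Minkowski, edit and Jaccard distances; the sample vectors and strings are illustrative only.

```python
# Minimal sketch: a few of the metric distance functions listed above.
import math

def minkowski(x, y, p):
    """L_p distance; p=1 city-block, p=2 Euclidean, p=float('inf') maximum."""
    if p == float('inf'):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def edit_distance(s, t):
    """Minimal number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for sets A, B."""
    return 1.0 - len(a & b) / len(a | b)

if __name__ == "__main__":
    x, y, z = (1.0, 2.0), (4.0, 6.0), (0.0, 5.0)
    assert minkowski(x, y, 2) <= minkowski(x, z, 2) + minkowski(z, y, 2)  # triangle inequality
    print(minkowski(x, y, 1), minkowski(x, y, 2), minkowski(x, y, float('inf')))
    print(edit_distance("application", "applet"))            # 6, as on the slide
    print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))
```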
Content-Based Search
•Content-based search in images: a query image is compared against an image base by its visual content (illustrated on a sample image collection).

Extracting Features
•Each image is transformed from the image level (R, G, B pixels) to the feature level – a descriptor used for comparison.

MPEG-7
•Multimedia content description standard (~2000)
•Global feature descriptors:
 –color, shape, texture, …
 –one high-dimensional vector per image and feature
 –Minkowski distances used

Visual Similarity
•Local feature descriptors – SIFT, SURF, etc.
•Invariant to image scaling, small viewpoint changes, rotation, noise, and illumination.
•Visual similarity is assessed by finding correspondences between the local features of two images.

Biometrics: Fingerprint
•Minutiae detection:
 –detect ridges (endings and branchings)
 –a fingerprint is represented as a sequence of minutiae
  •P = ( (r1,e1,θ1), …, (rm,em,θm) )
  •point in polar coordinates (r,e) and direction θ
  •points in the sequence are ordered by increasing radial angle (e)
•Matching of two sequences:
 –align the input sequence with the database one
 –compute a weighted edit distance
  •w_ins = w_del = 620
  •w_repl ∈ [0; 26] – depending on the similarity of the two minutiae

Multiple Visual Aspects
•(illustrated on an example image)

Contemporary Approaches to Feature Extraction
•Neural network technology:
 –Convolutional Neural Networks (CNN)
 –Recurrent Neural Networks (RNN)
•Typical pipeline: a classified dataset is split into training and validation data; the network is trained (fine-tuned) on the training data and validated on the validation data, yielding a neural network model.

Convolutional Neural Networks
•Convolutional Neural Network (CNN) – AlexNet
•The last layer has 1,000 output categories.
•The output of any layer can be used as a feature.
 –AlexNet contains 5 convolutional layers and 3 fully connected layers. ReLU is applied after every convolutional and fully connected layer. Dropout is applied before the first and the second fully connected layer.
 –A convolutional layer is a feature detector that automatically learns to filter out unneeded information from the input by using convolutional kernels.
 –Pooling layers compute the max or average value of a particular feature over a region of the input data (downsizing the input), help to detect objects in unusual places, and reduce memory size.
 –The network has 62.3 million parameters (and 650,000 neurons) and needs 1.1 billion computation units in a forward pass. The convolutional layers account for about 6% of all the parameters but consume about 95% of the computation.
 –It takes five to six days to train on two GTX 580 3GB GPUs.
 –The network uses ReLU instead of Tanh to add non-linearity; this accelerates training about 6 times at the same accuracy.
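A minimal sketch of the “output of any layer can be used as a feature” idea, assuming PyTorch and torchvision are installed (the AlexNet_Weights API requires a recent torchvision); the file names in the usage comment are placeholders, and this is only an illustration, not the extraction pipeline used in the experiments reported later.

```python
# Minimal sketch: use the output of AlexNet's penultimate layer as a
# 4096-dimensional feature vector and compare images by Euclidean (L2) distance.
import torch
import torchvision
from PIL import Image

weights = torchvision.models.AlexNet_Weights.DEFAULT
model = torchvision.models.alexnet(weights=weights)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])  # drop the 1000-class layer
model.eval()
preprocess = weights.transforms()  # resizing, cropping and normalization expected by the model

def extract_feature(path: str) -> torch.Tensor:
    """Return a 4096-dim descriptor for the image stored at `path` (placeholder path)."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)

# Usage (file names are placeholders):
# f1, f2 = extract_feature("query.jpg"), extract_feature("candidate.jpg")
# print(torch.dist(f1, f2, p=2))   # smaller distance ~ visually more similar
```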
Recurrent Neural Networks
•Recurrent Neural Networks (RNN)
•Long Short-Term Memory (LSTM) networks:
 –learn when data should be remembered and when it should be thrown away
 –provide operations for reading, writing and resetting the memory
 –well suited to learn from experience to classify, process and predict time series when there are very long time lags of unknown size between important events

Deep Learning Summary
•It is no magic – just statistics in a black box, but exceptionally effective at learning patterns, provided a powerful computational infrastructure (GPU cards) can be applied.
•It can be used not only for classification but is also able to provide content-preserving feature vectors.
•When compared by an Lp distance, such vectors give good-quality similarity estimates.
•Machines that learn to represent the world from experience.

Similarity Search Problem
•For X ⊆ D in a metric space M, pre-process X so that similarity queries are executed efficiently.
•Implementation problems:
 –how to partition the data to reduce the search space
 –how to ask questions – definition of queries
 –how to execute queries – to achieve performance
•The challenge: in a metric space, no total ordering exists!

Basic Partitioning Principles
•Given a set X ⊆ D in M = (D,d), two basic partitioning principles have been defined:
 –ball partitioning
 –generalized hyper-plane partitioning
•Note: some special cases, such as Euclidean or supermetric spaces, are more tractable.

Ball Partitioning
•Inner set: { x ∈ X | d(p,x) ≤ dm }
•Outer set: { x ∈ X | d(p,x) > dm }

Generalized Hyper-plane Partitioning
•{ x ∈ X | d(p1,x) ≤ d(p2,x) }
•{ x ∈ X | d(p1,x) > d(p2,x) }

Similarity Range Query
•range query
 –R(q,r) = { x ∈ X | d(q,x) ≤ r }
•… all museums up to 2 km from my hotel …

Nearest Neighbor Query
•the nearest neighbor query
 –NN(q) = x, x ∈ X, ∀y ∈ X: d(q,x) ≤ d(q,y)
•k-nearest neighbor query
 –k-NN(q,k) = A
 –A ⊆ X, |A| = k
 –∀x ∈ A, ∀y ∈ X – A: d(q,x) ≤ d(q,y)
•… five closest museums to my hotel …

Reverse Nearest Neighbor Query
•… all hotels with a specific museum as the nearest cultural heritage site …
•Example of 2-RNN: objects o4, o5, and o6 have q among their two nearest neighbors.

Similarity Joins
•similarity join of two data sets X, Y:
 –J(X,Y,μ) = { (x,y) ∈ X × Y | d(x,y) ≤ μ }
•similarity self join ⇔ X = Y
•… pairs of hotels and museums which are a five-minute walk apart …

Pivot Filtering
•Idea: given R(q,r), use the triangle inequality for pruning.
•All distances between the objects and a pivot p are known.
•Prune object o ∈ X if either of the following holds:
 –d(p,o) < d(p,q) – r
 –d(p,o) > d(p,q) + r
•A generalized concept of the object-pivot constraint.

Pivot Filtering (cont.)
•Filtering with two pivots: only the objects in the intersection region (dark blue in the figure) have to be checked.
•Effectiveness is improved by using more pivots.

Pivot Filtering (summary)
•Given a metric space M = (D,d) and a set of pivots P = { p1, p2, p3, …, pn },
 we define a mapping function Ψ: (D,d) → (Rⁿ, L∞) as follows:
 Ψ(o) = (d(o,p1), …, d(o,pn))
 Then we can bound the distance d(q,o) from below:
 L∞(Ψ(o), Ψ(q)) ≤ d(q,o)
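A minimal sketch of pivot filtering as summarized above: an object is pruned whenever the L∞ lower bound in the pivot space already exceeds the query radius, so the expensive distance is never computed for it. The toy 2-D data and the pivot choice are illustrative assumptions.

```python
# Minimal sketch: pivot filtering for a range query R(q, r).
# Psi(o) = (d(o,p1), ..., d(o,pn)); since L_inf(Psi(o), Psi(q)) <= d(q,o),
# any object with L_inf(Psi(o), Psi(q)) > r can be discarded safely.
import math

def euclidean(x, y):
    return math.dist(x, y)   # the "expensive" metric distance d

def psi(o, pivots, d=euclidean):
    return [d(o, p) for p in pivots]

def range_query(q, r, data, pivots, d=euclidean):
    """Return all o in data with d(q, o) <= r, plus the number of d evaluations."""
    psi_q = psi(q, pivots, d)
    psi_data = {o: psi(o, pivots, d) for o in data}   # assumed precomputed at indexing time
    result, computed = [], 0
    for o, psi_o in psi_data.items():
        lower_bound = max(abs(a - b) for a, b in zip(psi_o, psi_q))  # L_inf in the pivot space
        if lower_bound > r:
            continue                      # safely pruned, d(q, o) not evaluated
        computed += 1
        if d(q, o) <= r:
            result.append(o)
    return result, computed

# Usage on toy 2-D points (pivots chosen arbitrarily for illustration):
data = [(0, 0), (1, 1), (5, 5), (9, 1), (2, 8), (6, 7)]
pivots = [(0, 0), (10, 10)]
print(range_query((1, 0), 2.0, data, pivots))
```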
Angles Formed by Triplets of Descriptors
•Any 3 metric objects can be isometrically embedded into the 2D Euclidean plane to form a triangle.
•The angles within the triangle are then defined (by the cosine rule).
•Angles formed by real-life descriptors come from a limited range.

The M-tree [Ciaccia, Patella, Zezula, VLDB 1997]
•1) Paged organization
•2) Dynamic
•3) Suitable for arbitrary metric spaces
•4) I/O and CPU optimization – computing d can be time-consuming

The M-tree Idea
•Depending on the metric, the “shape” of the index regions changes:
 –L2 (Euclidean), L1 (city-block), L∞ (max-metric), weighted Euclidean, quadratic form.

M-tree: Example
•(example tree over objects o1–o11: internal entries store a pivot, its covering radius and the distance to the parent pivot; leaf entries store the objects and their distances to the parent pivot)

M-tree: Range Search
•Given R(q,r):
•Traverse the tree in a depth-first manner.
•In an internal node, for each entry ⟨p, rc, d(p,pp), ptr⟩:
 –prune the subtree if |d(q,pp) – d(p,pp)| – rc > r
 –an application of the pivot-pivot constraint

M-tree: Range Search (cont.)
•If not discarded, compute d(q,p) and
 –prune the subtree if d(q,p) – rc > r
 –an application of the range-pivot constraint
•All non-pruned entries are searched recursively.

D-Index [Dohnal, Gennaro, Zezula, MTA 2002]
•A multi-level hashing structure; in the example: 4 separable buckets at the first level, 2 separable buckets at the second level, and an exclusion bucket of the whole structure.
•Insertion places each object into exactly one bucket; a range search accesses only the buckets that cannot be excluded for the query region (illustrated in figures).

Pivot Permutation Approach
•Assume a set of n pivots {p1, p2, …, pn}.
•Given an object x in X, order the pivots according to d(x, pi).
•Let Πx be a permutation on the set of pivot indexes {1, …, n} such that Πx(j) is the index of the j-th closest pivot to x
 –e.g., Πx(1) is the index of the closest pivot to x
 –pΠx(j) is the j-th closest pivot to x
•Πx is denoted as the Pivot Permutation (PP) with respect to x.

Pivot Permutation Approach (cont.)
•Can be seen as a recursive Voronoi partitioning to level l.
•Cell C(i1, …, il) contains the objects x for which:
 Πx(1) = i1, Πx(2) = i2, …, Πx(l) = il

Metric Sketches for Fast Searching and Filtering
•Transformation of any metric space (D,d) to the Hamming space:
 –each descriptor o is transformed into a bit-string sketch sk(o) of length λ
 –the Hamming distance counts the number of differing bits – very efficient, a hardware instruction
•Sketches compared by the Hamming distance should approximate the similarity relationships of the original metric.
•Typical sketch lengths are 32 – 320 bits.

Sketching Transformation
•A generalized hyper-plane partitioning (GHP) over a pair of pivots sets one bit of sk(o).
•Further GHP instances set the remaining bits, e.g., bits μ and ν of sk(o).

Properties and Application
•Good sketches (choosing the pivots):
 –balanced splits
 –low correlation between bits
•Application – even better suited for filtering than for searching.
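A minimal sketch of GHP-based sketching and Hamming-distance filtering as described above. The random pivot pairs and the candidate-set size are illustrative assumptions; a real system would select pivot pairs that give balanced, low-correlated bits.

```python
# Minimal sketch: bit-string sketches built by generalized hyper-plane
# partitioning (GHP) and used as a Hamming-distance filter. Each bit i is
# defined by a pivot pair (a_i, b_i): the bit is 1 iff the object is closer
# to a_i than to b_i.
import math
import random

def d(x, y):
    return math.dist(x, y)          # the original (expensive) metric distance

def make_sketch(o, pivot_pairs):
    bits = 0
    for a, b in pivot_pairs:
        bits = (bits << 1) | (1 if d(o, a) < d(o, b) else 0)
    return bits

def hamming(s1, s2):
    return bin(s1 ^ s2).count("1")  # number of differing bits

random.seed(0)
data = [(random.random(), random.random()) for _ in range(1000)]
pivot_pairs = [(random.choice(data), random.choice(data)) for _ in range(64)]  # 64-bit sketches
sketches = {o: make_sketch(o, pivot_pairs) for o in data}

q = (0.5, 0.5)
sk_q = make_sketch(q, pivot_pairs)
# Filtering: keep only the 50 objects with the smallest Hamming distance to the
# query sketch, then re-rank this candidate set by the original metric distance.
candidates = sorted(data, key=lambda o: hamming(sketches[o], sk_q))[:50]
print(sorted(candidates, key=lambda o: d(q, o))[:10])
```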
Textbooks on Metric Searching Technology
•Hanan Samet: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006.
•P. Zezula, G. Amato, V. Dohnal, and M. Batko: Similarity Search: The Metric Space Approach. Springer, 2005.
 –Teaching material: http://www.nmis.isti.cnr.it/amato/similarity-search-book/

Infrastructure Independence: MESSIF
•Metric Similarity Search Implementation Framework
•Metric space (D,d):
 –vectors: Lp and quadratic-form distances
 –strings: (weighted) edit and protein sequence distances
•Operations: insert, delete, range query, k-NN query, incremental k-NN
•Storage: volatile and persistent memory
•Centralized and distributed index structures, communication network, performance statistics

Similarity Search Demos
•20M images: http://disa.fi.muni.cz/demos/profiset-decaf/
•Fashion: http://disa.fi.muni.cz/twenga/
•Image annotation: http://disa.fi.muni.cz/annotation-ui/
•Fingerprints: http://disa.fi.muni.cz/fingerprints/
•Time series: http://disa.fi.muni.cz/subseq/
•Multi-modal person identification: http://disa.fi.muni.cz/mmpi

Similarity Search in Collections of Faces
•Preprocessing: face detection with several technologies → merge of the detected faces → face description with several technologies (feature vectors) → fused features saved into the DB and indexed by one technique.
•Retrieval: query image → face detection → feature extraction → candidate filtering using the index → candidate faces ranked by the fused features.

Fused Face Detection and Face Matching
•Fused face detection:
 –faces detected by more technologies are taken into account
 –showcase: 3 technologies, compliance of at least two
 –recall / precision (%): OpenCV 55 / 89, Luxand 64 / 83, VeriLook 73 / 83, compliance of at least 2: 64 / 96
•Fused face matching:
 –characteristic features from more technologies are available for each face
 –the similarity of two faces evaluated by each technology is normalized into the interval [0, 1]
 –the normalized value expresses the probability that the faces belong to the same person
 –the highest probability is used to determine the similarity of the faces

Face Matching Results, Relevance Feedback
•The user may improve the results by marking correctly found faces in several iterations.
•(example: a query face with the retrieved faces after the 1st and 2nd feedback iteration)

Search-based Image Annotation
§Keyword-based image retrieval:
 §popular and intuitive
 §needs pictures with text metadata
 §manual annotation is expensive → need for automatic image annotation
§We already have a strong tool – similarity search:
 §for any input image, we can retrieve visually similar images
 §metadata of the similar images can be used to describe the original image
 §→ search-based image annotation
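A minimal sketch of the first step of search-based annotation: candidate keywords are collected from the most similar annotated images and scored by their visual distances. The 1/(1+d) weighting and the pairing of distances with keyword sets are illustrative assumptions; the pipeline described next refines such candidates with a semantic network and ConceptRank.

```python
# Minimal sketch: score candidate keywords gathered from the k most similar
# annotated images; closer images contribute more to a keyword's score.
from collections import defaultdict

def candidate_keywords(similar_images, top=10):
    """similar_images: list of (distance, [keywords]) for the retrieved images."""
    scores = defaultdict(float)
    for dist, keywords in similar_images:
        weight = 1.0 / (1.0 + dist)          # illustrative weighting choice
        for kw in keywords:
            scores[kw.lower()] += weight
    total = sum(scores.values()) or 1.0
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(kw, s / total) for kw, s in ranked[:top]]   # normalized to probabilities

# Usage with keywords and distances similar to the example on the next slide
# (the pairing of distances and keyword sets is illustrative):
retrieved = [
    (0.2, ["yellow", "bloom", "pretty"]),
    (0.5, ["meadow", "outdoors", "dandelion"]),
    (0.6, ["Mary's garden", "summer"]),
]
print(candidate_keywords(retrieved, top=5))
```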
Search-based Annotation Principles
•Content-based image retrieval over an annotated image collection returns similar annotated images with keywords (e.g., yellow, bloom, pretty; meadow, outdoors, dandelion; Mary’s garden, summer) and distances (d = 0.2, 0.5, 0.6).
•Candidate keyword processing with semantic resources yields the final candidate keywords with probabilities, e.g.: plant 0.3, flower 0.3, garden 0.15, sun 0.05, human 0.1, park 0.1.

Example
•Candidate keywords after CBIR: church, architecture, travel, europe, building, religion, germany, buildings, north, churches, christianity, america, religious, exterior, st, historic, world, tourism, united, usa, …
1.Retrieve 100 similar images from the Profiset collection.
2.Merge their keywords and compute frequencies.
3.Build the semantic network using WordNet.
4.Compute the ConceptRank.
5.Apply postprocessing & return the 20 most probable keywords.
•ConceptRank scores: building (2.53), structure (2.41), LANDSCAPE (2.10), BUILDINGS (1.87), OBJECT (1.84), NATURE (1.78), place_of_worship (1.75), church (1.74), Europe (1.68), religion (1.64), continent (1.51), …
•Final keywords: building, structure, church, religion, continent, group, travel, island, sky, architecture, tower, person, belief, locations, chapel, christianity, tourism, regions, country, district
•Semantic network: 4 relationships – hypernym (dog → animal), hyponym (animal → dog), meronym (leaf → tree), holonym (tree → leaf); 270 network nodes, 471 edges.

Annotations in Use
§Participation in the ImageCLEF 2014 Scalable Annotation Challenge
§2nd place, mean average precision of annotation approx. 60%
§Web demo & Mozilla add-on
§Image annotation: http://disa.fi.muni.cz/annotation-ui/

Similarity Search in Streams
▪Two basic approaches to explore data:
 ▪store, pre-process and search later – database processing
 ▪process (filter) continuously – stream processing
▪Examples of stream processing applications:
 ▪surveillance cameras and event detection
 ▪mail streams and spam filters
 ▪publish/subscribe applications

Stream Processing Scenarios
▪Stream: a potentially infinite sequence of data items (d1, d2, …) – tuples, images, frames, etc.
▪Basic scenarios:
 ▪data items processed immediately, possibly skipping data items → minimize delay – e.g., event detection
 ▪process everything as fast as possible, delays are possible → maximize throughput – our focus
▪Motivating examples with similarity searching:
 ▪image annotation – annotate a stream of images collected by a web crawler
 ▪publish/subscribe applications – categorize a stream of documents by similarity searching

Processing Streams of Query Objects
▪Typical large-scale similarity search approach:
 ▪partitioned data stored on a disk
 ▪partition reads from the disk form the bottleneck
▪Idea: similar queries need similar sets of partitions → save disk accesses.
▪Buffer: memory used for reordering (clustering) the queries.
▪Cache: memory containing previously read data partitions.
▪Pipeline: query stream → buffer → metric index (disk + cache) → results. (A simplified code sketch of this scheme follows the experiment results below.)

Experiment Results
▪100,000 processed 10-NN queries; DB: 1 mil. MPEG-7 descriptors; buffer capacity: 8,000 queries; cache size: 40,000 objects (4% of the DB).
▪100,000 processed 10-NN queries; DB: 10 mil. MPEG-7 descriptors; buffer capacity: 10,000 queries; cache size: 90,000 objects (0.9% of the DB).
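A minimal simulation (not the actual implementation behind the results above) of the buffer-and-cache idea: grid cells stand in for data partitions, a bounded cache holds recently read partitions, and greedily reordering the buffered queries by proximity increases cache hits and saves simulated disk reads. The grid partitioning, FIFO-style cache and greedy reordering are simplifying assumptions.

```python
# Minimal sketch: why reordering a buffer of similar queries saves partition reads.
import math
import random
from collections import OrderedDict

random.seed(1)
CELL = 0.1                                   # a "partition" = grid cell of size CELL x CELL

def needed_partitions(q, r):
    """Ids of grid cells intersecting the query ball (conservative bounding box)."""
    x0, x1 = int((q[0] - r) // CELL), int((q[0] + r) // CELL)
    y0, y1 = int((q[1] - r) // CELL), int((q[1] + r) // CELL)
    return {(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)}

def process(queries, cache_capacity):
    cache, disk_reads = OrderedDict(), 0
    for q in queries:
        for part in needed_partitions(q, 0.05):
            if part in cache:
                cache.move_to_end(part)      # cache hit
            else:
                disk_reads += 1              # simulated partition read from disk
                cache[part] = True
                if len(cache) > cache_capacity:
                    cache.popitem(last=False)
    return disk_reads

def reorder(buffer):
    """Greedy chaining: always continue with the closest not-yet-processed query.
    (A real system would reorder within a bounded buffer, not the whole stream.)"""
    ordered, rest = [buffer[0]], buffer[1:]
    while rest:
        nxt = min(rest, key=lambda q: math.dist(q, ordered[-1]))
        rest.remove(nxt)
        ordered.append(nxt)
    return ordered

stream = [(random.random(), random.random()) for _ in range(2000)]
print("arrival order :", process(stream, cache_capacity=20))
print("buffer reorder:", process(reorder(stream), cache_capacity=20))
```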
Continuous kNN Join
▪A time window over the stream (past → future) is continuously joined with the users’ preferences (query objects), e.g., a 3-NN join.
▪Task: efficiently evaluate the kNN join at any time.
▪Challenge: efficient join computation with a lot of query objects and a lot of stream objects in the time window.
▪General approach: update the join result continuously.
▪Problem: a lot of time-consuming metric distance computations.
▪Approximation technique proposal:
 ▪sketch-based filter
 ▪Hamming distances

Experiments
▪Dataset: Caffe descriptors (4,096-dimensional vectors)
▪1 step: process a new stream object, delete the oldest stream object
▪Each experiment: 1,000 steps, 10-NN join, 200,000 stream objects in the time window
▪Sketch-based approach vs. no sketches: sketches are nearly 13 times faster with almost precise results (less than 0.5% missed objects).

What Is Mocap
•Motion capture data: a digital representation of motion given by series of coordinates of body joints in space-time.
•Complex multi-dimensional spatio-temporal data (3D space, 31 joints, 120 frames per second).
•Visualized by a simplified human skeleton (stick figure); the coordinates of the joints are stored as float numbers.
•1 minute of such motion data ≈ 669,600 float numbers.

Our Approach – Motion Images
•Every single-frame joint configuration is normalized (by centering and rotating) and then transformed into an RGB stripe image while fully preserving the skeleton configuration.
•We focus on body joints, not on a volumetric model or skeleton bones.
•(figure: why normalization is important)

Motion Images + Caffe
1)Effective transformation from (dynamic) motion capture data into (static) images.
2)Extraction of a fixed-size feature vector using content-based image descriptors (a 4,096-dimensional Caffe vector).
3)Indexing for fast and scalable search.
•(example: a “hop-one-leg” motion → motion image → Caffe vector → nearest neighbors retrieved with their distances)
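A minimal sketch of the motion-image transformation described above, assuming NumPy; rotation normalization is omitted for brevity, the coordinate-to-color mapping is an illustrative choice, and random data stands in for a real capture.

```python
# Minimal sketch: turning a mocap sequence into a "motion image".
# Input: array of shape (frames, joints, 3) with 3-D joint coordinates.
# Each frame is centered on a reference joint and x, y, z are mapped to
# R, G, B, producing an image with one row per joint and one column per frame.
import numpy as np

def motion_image(sequence, root_joint=0):
    """sequence: float array (frames, joints, 3) -> uint8 image (joints, frames, 3)."""
    seq = np.asarray(sequence, dtype=np.float64)
    seq = seq - seq[:, root_joint:root_joint + 1, :]      # center every frame on the root joint
    lo, hi = seq.min(), seq.max()
    normalized = (seq - lo) / (hi - lo + 1e-12)            # scale coordinates into [0, 1]
    image = np.rint(normalized * 255).astype(np.uint8)     # x, y, z -> R, G, B
    return image.transpose(1, 0, 2)                        # rows = joints, columns = frames

# Usage on random data standing in for a real capture (31 joints, 120 fps, 1 s):
frames = np.cumsum(np.random.randn(120, 31, 3) * 0.01, axis=0)
img = motion_image(frames)
print(img.shape)        # (31, 120, 3) - ready to be fed to a CNN feature extractor
```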
Demos
•Subsequence search demo: http://disa.fi.muni.cz/mocap-demo/
•Action recognition demo: http://disa.fi.muni.cz/mocap-action-recognition/
•Gait recognition demo: http://disa.fi.muni.cz/mmpi

Speed Climbing
•Olympic sport in Tokyo 2021
•Speed climbing in numbers:
 –15 m vertical wall with a 5° overhang
 –20 big hand holds, 11 small foot holds (standardized layout)
 –5.48 s world record (≈ 10 km/h)
•Speed climbing is suitable for computer-aided motion analysis based on 2D skeleton sequences.

Application: Speed Climbing Analysis
•Performance comparison of two climbers (or of a climber with the world record):
 –green chart – time advantage (who is winning)
 –blue chart – speed difference (who is faster)
 –color clusters – the biggest delay points, where hands and feet stayed for a longer time

Similarity Search in Protein Chains
•Each protein consists of 1 or more subparts – protein chains.
•Approx. 500,000 chains are known – Protein Data Bank (PDB).
•3D models of protein chains are used to define their pairwise similarity.
 ◦The similarity evaluation time strongly depends on the size of the compared chains.
•Model of a protein chain: balls ≈ atoms, sticks ≈ bonds between atoms, green ribbon ≈ a simplification of the main atoms.

SISAP (Similarity Search and Applications)
International conference series (http://sisap.org/):
•2008 Cancun, Mexico; 2009 Prague, Czechia; 2010 Istanbul, Turkey; 2011 Lipari, Italy; 2012 Toronto, Canada; 2013 A Coruña, Spain; 2014 Los Cabos, Mexico; 2015 Glasgow, UK; 2016 Tokyo, Japan; 2017 Munich, Germany; 2018 Lima, Peru; 2019 Newark, NJ, USA; 2020 and 2021 virtual events; 2022 Bologna, Italy.

DISA at Masaryk University
•Laboratory of Data Intensive Systems and Applications: http://disa.fi.muni.cz

Our Vision – Future Research Challenges
•Challenge No. 1 (adaptability): respecting the continuously changing environment – the distance metric, the size of the searched collection, as well as the up-to-date collection of known samples – the search indexing mechanisms must adapt continuously.
•Challenge No. 2 (explainability): respecting the application domain – e.g., human skeleton data – provide explanation tools that can be requested on demand.
•Similarity cracks the code of explainable AI.