Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz Image Retrieval through Codebooks Marián Labuda Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Principles ● System Architecture ● Visual Vocabularies ● Experimental results ● Conclusion Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Principles - local feature extraction from a set of images - conversion of features to visual words (quantization) - using text retrieval technology Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Basic operations 1) Indexing - Feature extraction - list of local image features - features as descriptors (high dimensional vectors) - Quantization – images as text documents - quantization through visual vocabulary - simple text string instead of high dimensional vector - Storage – visual words in inverted index 2) Querying - Feature extraction – a list of local image features for a given image - Quantization – the query image as a list of keywords - Searching – in an inverted index using text retrieval engine - answer ranking according to similarity - Post-processing – e.g. spatial verification Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz Feature extraction - image content described by image features - image features - global features (e.g. color histogram) - local features (e.g. SIFT, SURF, MSER, ...) - pros. & cons. of local image features - extraction of features creates descriptors - computation complexity of high dimensional descriptors Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz SIFTs Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz Color Histogram Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● System Architecture - MESSIF framework – data abstraction - feature extraction - Web UI - Lucene Core – indexing and searching - My implementation - visual vocabulary - similarity modification - post-processing Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Lucene Core - full-text search engine - usage of inverted files - high performance - supports various types of queries - ranked searching - using Lucene is provided through class LuceneAlgorithm - combines Lucene Core library and MESSIF framework Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz Visual Vocabulary - provide quantization of local image descriptors - visual vocabulary as a bottleneck of Image Retrieval System - images as sets of quantized features - some words can occur too often, others very few - later we can filter them as stop words (analogy to prepositions, ...) Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● My implementation - Visual vocabulary - abstract class AbstractVisualVocabulary - simple implementation and integration of a new visual vocabulary - implemented visual vocabularies: - K-means - MDPV (metric distance permutation vocabulary) - efficiency of the given visual vocabulary Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz Visual vocabularies - hierarchical k-means - hierarchical k-means provides a faster way to quantize descriptors than flat - a visual word as a path from the root to a leaf node Demo application - deployed at http://mufin.fi.muni.cz/subimages-lucene/random - oxford building dataset (~5000 images) - 6 hierarchical k-means - 10 descriptors in node Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ●Visual vocabularies - MDPV - idea of metric distance permutation vocabulary – based on a small number of pivots and recursivelly defines voronoi cells - provides finer granularity than k-means - faster and requires less space than k-means - visual words as a sequence of more pivots (5 or 10 are used at most) Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● My implementation - Similarity modification - text based retrieval systems often use tf-idf scoring - tf - term frequency – „How many times does the term occur in the document (in one image)?“ - idf - inverse document frequency – „How many documents (images) contain the given term in whole collection?“ - Lucene provides an easy way to implement own similarity - definition of similarity by scoring - score computation: - experimentally determined the best similarity on the ukbench dataset scoreq ,d =coord q ,d ∗tf t∈d ∗idf t ∗t.getBoost  Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● My implementation – Post-processing - RANSAC - RANdom SAmple Consensus - geometric verification - not enough improvement on the given vocabularies – fine-grained words Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Experimental results - Ukbench dataset contains 5 (almost) different datasets - CD – CD covers - moving – moving vehicles - CD+moving – combination of CD and moving datasets - flip – some flipped versions of normal dataset - normal – test set - locators of similar images example: sift100000 sift100001 sift100002 sift100003 Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Experimental results (continue) - similarity tests based on the matrix of different settings: - term frequency weights - inverse document frequency weights - overlap weights - compared to ukbench results – visual words provided by authors - comparison based on relationship to flat ukbench results - slightly better results in effectiveness – better retrieval - slightly worse results in speed - „best“ similarity: N – total count of documents overlap – terms of query in document - documents count containing term maxOverlap – count of query terms tf =1 idf =ln N N i  2 coord = overlap maxOverlap  3 Ni Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Experimental results (continue 2) - how many of the ground-truth images occured in top 4: - each dataset consists of 10200 images - every image has 3 other ground-truth images - Customized similarity is same as My Similarity except the tf - customized similarity count with term frequency as instead of 1 Dataset My Similarity Customized Ukbench CD 2,9583 2,9344 2,8956 movie 2,8912 2,8762 2,8285 CD & movie 2,9528 2,9326 2,8844 flip 2,9609 2,9548 3,0144 test 3,1414 3,1498 3,1664 Average value 2,9809 2,9696 2,9579  f Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Experimental results (continue 3) - unfortunately I was not able to reproduce their results with any similarity - stop words did not provide more precise results - term frequency can be fully omited - index time is up to 2 minutes - searching of 10200 queries on database consisting of 10200 documents takes from 6 to 10 minutes (39 – 60 ms per image) - experiments on Lenovo T430s – 8 GB RAM - Intel Core i7 3520M (2,9 GHz, 2 cores) Image Retrieval System 25.11.2013 Marián Labuda 359322@mail.muni.cz ● Conclusion - image retrieval system based on visual vocabularies and inverted files are as strong as the vocabulary - term frequency can be fully omited - usage of stop words depends on vocabulary - inverted index provides fast and scalable solution