^ Vq* Faculty of Informatics Masaryk University Czech Republic Developing Similarity Search Technology Habilitation Thesis (Collection of Articles) Vlastislav Dohnal February 2011 Abstract Similarity searching has become more and more popular, which was stimulated by the growth of diverse data archives available on-line that offer search services to users, and by the increasing complexity of data that must be searched. This issue has also been recognized by major Internet search engines, exemplified by Google, that recently enriched their image search services by allowing users to search for images by similarity. They usually apply the following procedure. Firstly, a candidate set of images is obtained by a regular text search in images' file names and associated textual tags. Then this set is reordered by images' content, expressed as color histograms, for example. Finally, this result is presented to the user. In this thesis, we focus on similarity searching - content-based retrieval. In this area, data items are retrieved by their content rather than by textual information associated with them. For example, images are searched by comparing their color histograms to the histogram posed as a query by a user. The principle of ranking search results may also be applied to further increase the user's satisfaction with the search results. The problem of similarity searching, as is studied in this thesis, uses metric space as a convenient data model. The author's contributions range from centralized index structures via distributed ones up to a new indexing paradigm - self-organizing systems. These results represent the efficiency issue of similarity searching. On the other hand, the effectiveness of similarity searching can be expressed as efforts to model computerized similarity as closest as possible to the human perception of similarity. This problem is also tackled in this thesis by introducing a new query type. This thesis is conceived as a collection of articles. The collection contains 24 contributions published as either a monograph, chapters in monographs, or journal and conference papers. The author's contribution is mainly in stating research issues and supervising students during solving the issues, but also by running some of experimental trials, collecting necessary input data, and writing and editing significant parts of articles. Concrete contributions to articles published are specified at the corresponding articles listed in sections titled "Articles in Collection". In general, the author's contribution is quantified as the ratio 1/x, where x is the number of authors of that particular article. iii Abstrakt Současný růst popularity podobnostního hledání je stimulován jak zvyšující se di-verzitou dat zpřístupňovaných různými archivy, tak i rostoucí složitostí dat v nich obsažených. Tato tendence byla rozpoznána i vyhledávači, např. společností Google, která nedávno obohatila své vyhledávání v obrázcích o službu podobnostního hledání podle obsahu. Tento vyhledávač ale podobné obrázky vyhledává stále tradičním způsobem, tj. podle textu obsaženého v názvu souboru nebo popisků asociovaných s obrázkem. Přidaná služba pak přeuspořádá výsledky textového hledání podle podobnosti, např. podle barevného histogramu, a výsledek vrátí uživateli. V této práci se soustřeďujeme na podobnostní hledání podle obsahu (angl. „content-based re-trieval"). Takový druh hledání identifikuje data podle jejich faktického obsahu než podle textových informací, které jsou k nim přidruženy. Příkladem může být vyhledání všech obrázků, jejichž barevný histogram se podobá histogramu, který byl uživatelem předložen jako dotaz. Techniky pro přeuspořádání výsledků (angl. „rank-ing") lze také aplikovat a sice z důvodu zvýšení spokojenosti uživatele s předloženými výsledky hledání. V této práci se zabýváme podobnostním hledáním, které jako vhodný datový model používá metrický prostor. Autorův přínos lze zařadit nejen mezi centralizované a distribuované datové struktury ale i do nové oblasti, ve které se aplikují principy samoorganizace. Zmíněné přínosy jsou soustředěny na problematiku výkonnosti, která je měřena např. jako čas zpracování dotazu. Podobnostní hledání má ale i druhou stránku a to problematiku definice podobnosti. V ideální situaci předpis definující podobnost pro účely vyhledávacího systému přesně odpovídá podobnosti chápanou člověkem. Docílení tohoto stavu, bohužel, bývá často velmi obtížné. Uvedená problematika je rovněž studována v této práci, i když ne tak obsáhle jako stránka výkonnosti. Tato habilitační práce je koncipována jako soubor uveřejněných vědeckých prací (§72 odst. 3 písmena b zákona o vysokých školách). Soubor je tvořen dvacetičtyřmi publikacemi ve formě monografie, kapitol v monografiích, časopiseckých článků nebo konferenčních příspěvků. Autorův přínos lze shrnout jako identifikace výzkumných témat a vedení studentů při realizaci výzkumných projektů, ale také jako formulování myšlenek a analýzy výsledků experimentálních měření do podoby textových zpráv. Konkrétní přínosy autora jsou uvedeny u citací prací, které tvoří tento soubor. Tyto citace jsou vždy souhrnně uvedeny v sekcích nazvaných „Sbírka článků" (angl. „Ar-ticles in Collection"). Obecně je autorův přínos kvantifikován jako podíl l/x, kde x vyjadřuje počet autorů konkrétního příspěvku. v Acknowledgments Firstly, I would like to thank Pavel Zezula, my former supervisor, for his guidance and fruitful ideas. I also thank my colleagues, former as well as current ones, namely Michal Batko, Stanislav Bartoň and David Novák. I thank all the students that participated in the research area and who did the real hard work, namely Jan Sedmidubský. Secondly, I appreciate the cooperation with partners from other universities who co-authored some of the papers, mainly the work with Giuseppe Amato from ISTI-CNR, Pisa, Italy. Lastly, my thanks go to my wife, children and parents for their endless patience, and to my best friends, namely Jiří Barnat for helping me with the thesis formatting. In Brno, February 2011 Vlastislav Dohnal Vil Contents Preface xi I Commentary 1 1 Introduction 3 1.1 Challenges.................................... 4 2 Similarity Searching in a Nutshell 5 2.1 Metric Space................................... 5 2.2 Distance Measures ............................... 6 2.3 Similarity Queries................................ 7 2.4 Articles in Collection.............................. 8 3 Centralized Index Structures 9 3.1 Data Partitioning................................ 9 3.2 Pre-computed Distances............................ 10 3.3 Combined Approaches............................. 11 3.4 Articles in Collection.............................. 12 4 Distributed Index Structures 13 4.1 Centralized Control............................... 13 4.2 Decentralized Control ............................. 13 4.2.1 Splitting Nodes............................. 14 4.3 Demonstration Applications.......................... 14 4.4 Articles in Collection.............................. 14 5 Self-organizing Search Systems 17 5.1 Metric Social Network............................. 17 5.2 Other Systems.................................. 18 5.3 Articles in Collection.............................. 19 6 Querying Effectiveness 21 6.1 Result Quality.................................. 21 6.1.1 Eliminating Duplicates......................... 22 6.2 Query Containment............................... 22 6.2.1 Proximity-based Order-respecting Intersection........... 23 ix X CONTENTS 6.3 Articles in Collection.............................. 23 7 Conclusions 25 Bibliography 27 II Collection of Articles 39 9 Books 41 10 Book Chapters 43 11 Journals 45 12 Conference Papers 47 Preface This thesis is conceived as a collection of articles. The collection contains 24 contributions published as either a monograph, chapters in monographs, or journal and conference papers. The author's contribution is mainly in stating research issues and supervising students during solving the issues, but also by running some of experimental trials, collecting necessary input data, and writing and editing significant parts of articles. The focus of the thesis is on similarity searching from the perspective of efficiency but also from the effectiveness point of view. It summarizes recent advances in this area and witnesses systems for similarity searching maturing from small centralized solutions via distributed structures to large systems exhibiting self-organizing properties. The contribution in terms of effectiveness is in enriching the variety of similarity queries. This all together forms Part I. For clarity, the author's contribution is emphasized through out this part by adding a check mark (/) as a margin note, as is depicted next to this sentence. Part II contains full versions of articles published. They are categorized to monographs, monograph chapters, journal and conference papers. xi Parti Commentary Chapter 1 Introduction In the Information Society, information holds the master key to economic influence and success. But the usefulness of information depends critically upon its quality and the speed at which it can be transferred. The scale of the problem is emphasized by the recent growth of digital libraries, data warehouses and other Internet resources that have arisen as a direct consequence of latest advances in computing, communication and storage. These repositories organize data of various domains such as multimedia, scientific observations, molecular biology, computer-aided design or even marketing and purchasing analysis. Traditional retrieval techniques, typically based upon sorting routines and hash tables, are not appropriate for newly-emerging data domains listed above. More flexible methods must be found instead and they have to take into account the needs of particular users and application domains. The problem of searching can also be observed from the data volume point of view. In particular, almost everything we see, hear, read, write or measure will soon be available to a computerized information system. This can be currently proved by citing the Flickr blog 1 2 which reported five billionth photo was uploaded in September 2010. They also stated that 3,000 new photos are uploaded each minute. Ordinary retrieval techniques are inadequate in many of these newer data domains because linear sorting is simply impossible. For illustration, consider a collection of bit patterns compared using the Hamming distance, i.e., the number of bits by which a given pair of patterns differs. There is no way to sort all the patterns linearly so that, selecting any arbitrary member, the objects similar to it will be closer in the ordering than the others. The same applies to the spectrum of colors. Obviously, we can sort colors according to their similarity with respect to a specific hue, for example pink. But we cannot sort the set of all colors in such a way that, for each hue, its immediate neighbor is the hue most similar to it. This is what has given rise to a novel indexing paradigm based upon distance. From a formal standpoint, the search problem is modelled in metric space. The collection of objects to be searched forms a subset of the metric space domain, and the distance measure applied to pairs of objects is a metric distance function. This approach significantly extends the scope of traditional search approaches and supports execution of similarity queries. From a technical point of view, the search is not done flickr - Photo Sharing, http : //www . flickr . com/ 2http://blog.flickr.net/en/2 010/0 9/19/50 000 00 000/ 3 4 CHAPTER 1. INTRODUCTION on original data objects, such as images or video, directly. But some characteristics (features, descriptors) are extracted from them instead. Then, a similarity search system organizes such characteristics and executes queries over them. It implies that the same characteristics must be obtained from queries before processing. 1.1 Challenges In this thesis, we seek the following challenges. • Dissemination - by publishing articles about similarity searching at outstanding forums, we spread the knowledge about and technology used in similarity searching using metric spaces. Here, we would like to emphasis the first monograph studying the issue of metric similarity searching thoroughly and completely from basics to advances in existing solutions. • Efficiency of similarity searching is an important issue to analyze due to enormous growth of data in volume. In this direction, we shift from the centralized paradigm to distributed one or even beyond to structures exhibiting self-organizing properties. • Effectiveness of similarity searching is the other, orthogonal but closely related, issue. This term refers to the quality of similarity used, i.e., how big a discrepancy between human perception of similarity and the similarity modelled by a metric function is. We contributed in this area by proposing a new query type and an algorithm for solving the query inclusion problem. The topics listed are presented in a concise way throughout this part. In Chapter 2, necessary definitions are introduced. Existing solutions for single computers are surveyed in Chapter 3. The efficiency problem is studied within Chapter 4 and Chapter 5. In Chapter 6, the third distinct contribution is sketched out. The thesis concludes in Chapter 7. Part II consists of articles referenced from Part I as author's contributions. Chapter 2 Similarity Searching in a Nutshell In contrast to traditional databases made up of simple attribute data, contemporary data is bulkier and more complex in nature. To deal with the increased bulk, data reduction techniques are employed as in [5]. These approaches typically result in high-dimensional vectors or other objects for which nothing beyond pairwise distances can be measured. Such data are sometimes designated distance-only data. A similar situation can occur with multimedia data. Here, the standard approach is to search not at the level of the actual multimedia objects, but rather using characteristic features extracted from these objects, e.g. color histogram obtained from an image. In such environments, an exact match has little meaning, and proximity concepts {similarity, dissimilarity) are typically much more fruitful for searching. Recent books or surveys that summarize the problem of similarity searching are available in [97,117, 58, 31]. 2.1 Metric Space A useful abstraction for similarity is provided by the mathematical notion of metric space [69]. We consider the problem of organizing and searching large data sets from the perspective of generic or arbitrary metric spaces, sometimes conveniently labeled distance spaces. In general, the search problem can be described as follows: Problem 2.1.1 Let V be a domain, d a distance measure on V, and (T>,d) a metric space. Given a set X C V of n elements, preprocess or structure the data so that proximity queries are answered efficiently. From a practical point of view, X can be seen as a file (a data set or a collection) of objects that takes values from the domain V, with d as the proximity measure, i.e., the distance function defined for an arbitrary pair of objects from V. Though several types of similarity queries exist and others are expected to appear in the future, the basic types are known as the similarity range and the nearest neighbor(s) queries. In a distance space, the only possible operation on data objects is the computation of a distance function on pairs of objects which satisfies the triangle inequality. In contrast, objects in a coordinate space - coordinate space being a special case of metric space - can be seen as vectors. Such spaces usually satisfy some additional properties that can be exploited in a storage and index structure design. Naturally, the distance between vectors can be computed, but each vector can also be uniquely located in 5 6 CHAPTER 2. SIMILARITY SEARCHING IN A NUTSHELL coordinate space. Further, vector representation allows us to perform operations like vector addition and subtraction. Thus, new vectors can be constructed from prior vectors. For more information, see [55, 16] for surveys of techniques that exploit the properties of coordinate space. However, treating data collections as metric objects brings a great advantage in generality, because many data classes and information-seeking strategies conform to the metric view. Accordingly, a single metric indexing technique can be applied to many specific search problems quite different in nature. In this way, the important extensibility property of indexing structures is automatically satisfied. An indexing scheme that allows various forms of queries, or which can be modified to provide additional functionality, is of more value than an indexing scheme otherwise equivalent in power or even better in certain respects, but which cannot be extended. 2.2 Distance Measures In the following, we present some examples of distance functions (metrics) used in practice on various types of data. Distance functions are often tailored to specific applications or a class of possible applications. In practice, distance functions are specified by domain experts, however, no distance function restricts the variety of queries that can be asked with this metric. Each distance function has to satisfy the following postulates: Vx, i/gP, d(x, y) > 0 non-negativity Vx, i/gP, d(x, y) = d(y, x) symmetry, Vx, y G V,x = y <=)> d(x, y) = 0 identity, Vx, y,z £ V, d(x, z) < d(x, y) + d(y, z) triangle inequality. The Minkowski distance functions form a whole family of metric functions, designated as the Lp metrics, because the individual cases depend on the numeric parameter p. These functions are defined on ra-dimensional vectors of real numbers as: where the L\ metric is known as the Manhattan distance (or the City-Block distance), the L2 distance denotes the well-known Euclidean distance, and the = max"=1 |xj — y,i | is called the maximum distance, the infinite distance or the chessboard distance. The Lp metrics find use in a number of cases where numerical vectors have independent coordinates, e.g., in measurements of scientific experiments, environmental observations, or the study of different aspects of the business process. Several applications using vector data have individual components, i.e., feature dimensions, correlated, so a kind of cross-talk exists between individual dimensions. Consider, for example, color histograms of images, where each dimension represents a specific color. To compute a distance, the red component, for example, must be 2.3. SIMILARITY QUERIES 7 compared not only with the dimension representing the red color, but also with the pink and orange, because these colors are similar. The Euclidean distance L2 does not reflect any correlation of features of color histograms. A distance model that has been successfully applied to image databases in [53], and that has the power to model dependencies between different components of features, is provided by the quadratic form distance functions in [57, 103]. In this approach, the distance measure of two re-dimensional vectors is based on an re x re positive semi-definite matrix M = [m^j], where the weights rriij denote how strong the connection between two components i and j of vectors x and y is, respectively. The following expression represents a generalized quadratic distance measure du, where the superscript T denotes vector transposition: dM(x, y) = \J{x — y)T ■ M ■ {x — y) . The quadratic form distance measure may be computationally expensive, depending upon the dimensionality of the vectors. Color image histograms are typically high-dimensional vectors consisting of 64 or 256 distinct colors (vector dimensions). The closeness of sequences of symbols (strings) can be effectively measured by the edit distance, also called the Levenshtein distance [71]. The distance between two strings x = x\ • • • xn and y = y\ • • • ym is defined as the minimum number of atomic edit operations (insert, delete, and replace) needed to transform string x into string y. The generalized edit distance function assigns weights (positive real numbers) to individual atomic operations. Hence, the distance between strings x and y is the minimum value of the sum of weighted atomic operations needed to transform x into y. If the weights of insert and delete operations differ, the edit distance is not symmetric and therefore not a metric function. However, the weight of the replace operation can differ. An excellent survey on string matching can be found in [83]. Let us now focus on a distance function to sets. Assuming two sets A and B, Jaccard's coefficient is defined as d(A,B) = l UnBl AUB This distance function is simply based on the ratio between the cardinalities of intersection and union of the compared sets. An application of this metric to vector data is called the Tanimoto similarity measure [70], the distance version of which can be defined as: drs{x, y) = 1 - X f-, llxll + llyll ~ x' y where x • y is the scalar product of x and y, and \\x\\ is the Euclidean norm of x. 2.3 Similarity Queries A similarity query is defined explicitly or implicitly by a query object q and a constraint on the form and extent of proximity required, typically expressed as a distance. The response to a query returns all objects which satisfy the selection conditions. The most common type of similarity query is the range query R(q, r). The query is specified by a query object q £ V and a query radius r as the distance constraint. The 8 CHAPTER 2. SIMILARITY SEARCHING IN A NUTSHELL query retrieves all objects found within distance r of q, formally: R{q,r) ={o£X,d{o,q) A (d(x, q) < d(y, q) V 3w £ R : d(y, w) < ), where 0 is a user-defined separation distance. The objects within the distance are considered as duplicates. Note that for 0 = 0 we get the classic kNN query. 6.2 Query Containment There are many situations in which the searching for objects that are globally similar to the given query is not sufficient. This is especially true for image database where a user may require to search for images containing an object exposed in the query image. Thus, the retrieval system should solve the query containment problem. In the following, we focus on query containment issue in case of images. There are two main streams in research. Firstly, global image descriptors, such as color histogram, are used. Secondly, local image descriptors, such as SIFT or SURF, are applied. The local descriptors characterizes small parts of an image, thus many descriptors (even thousands) are usually extracted from an image, which is a major disadvantage of local descriptors when comparing to global ones, where only one descriptor is obtained. In case of global descriptors, a solution proposed in [96] segments all database images as well as query images into chunks of pre-defined size and a global descriptor is extracted from each chunk. A searching procedure then finds the correspondence between the database images and the query image. For good retrieval results of sub-images segments of an image must overlap significantly, so this technique must be tuned properly. For this reason, local descriptors are favorable. 6.3. ARTICLES IN COLLECTION 23 In [68], the authors present a solution to storage implementation of the LSH-coded descriptors that allows searching for sub-images in linear time. However, this implementation does not handle the spatial position of features. Another retrieved information reduction [95] uses the properties of PCA-SIFT descriptors [67] directly. In particular, their hierarchical ordering and bit representation of each feature are used. This leads to the most-significant bit index files, which are memory-oriented structures storing bit prefixes of PCA-SIFT descriptors. Neither this technique indexes the spatial information. In [33], the geometric min-hash (GmH) algorithm is proposed. It is based on the original min-hash [34], but it incorporates the spatial context of features. From the searching point of view, it identifies regions in an image in which the identical or almost-identical groups of features with respect to the query image occur. However, only small spatial surrounding is considered. The method presented in [64] finds small logos in a natural image collection. SIFT features are reduced using the multi-probe locality sensitive hashing [75]. To determine the geometrically close features the RANSAC algorithm [54] is applied. 6.2.1 Proximity-based Order-respecting Intersection Our approach to query containment in image database is based on a proximity-based order-respecting intersection, so-called e-intersection. It is based on local image descriptors. Before querying the most important local descriptors are extracted from a query image. Then, a regular range query is executed for each of these descriptors. The queries with radius e are evaluated on a database containing all local descriptors extracted from indexed images. After grouping the descriptors retrieved by original image identification, spatial distribution of descriptors is checked by projecting the descriptors to x and y axes independently. A special ORD function is used to rank the result. The formal definition of e-intersection is available in [59] whereas a prototype implementation of this approach is presented in [60]. 6.3 Articles in Collection The following list of articles consists of one paper defining a new query type and two papers tackling the query containment problem - one of them is a demonstration paper. • Tomáš Skopal, Vlastislav Dohnal, Michal Batko, and Pavel Zezula. Distinct nearest neighbors queries for similarity search in very large multimedia databases. In 11th ACM International Workshop on Web Information and Data Management (WIDM 2009), pages 11-14, New York, USA, 2009. ACM. Author's contribution: 1/4, prototyping all algorithms, experiment planning and evaluation. • Tomáš Homola, Vlastislav Dohnal, and Pavel Zezula. Proximity-based order-respecting intersection for searching in image databases. In 8th International Workshop on Adaptive Multimedia Retrieval, AMR'2010, Linz, 2010. CHAPTER 6. QUERYING EFFECTIVENESS Author's contribution: 1/3, writing the article, editing. • Tomáš Homola, Vlastislav Dohnal, and Pavel Zezula. Sub-image searching through intersection of local descriptors. In 3rd International Conference on Similarity Search and Applications (SISAP 2010), pages 127-128, New York, 2010. ACM Press. Author's contribution: 1/3, partly prototyping on indexing structures, editing the article. Chapter 7 Conclusions In this thesis, we surveyed the author's contributions to the area of similarity searching when the metric space is applied as a convenient data model. The main contribution is a pioneering book on similarity searching which is currently cited more than 180 times. The other contributions consist of advances in distributed structured index networks and self-organizing solutions, and finding solutions to new query types. The unifying idea of all contributions of this thesis is bringing similarity search toys to mature content-based information retrieval systems. This collection of articles consists of 24 articles including one monograph and five monograph chapters. 25 Bibliography [1] Giuseppe Amato and Pasquale Savino. Approximate similarity search in metric spaces using inverted files. In Proceedings of the 3rd International Conference on Scalable Information Systems (InfoScale 2008), pages 1-10, Brussels, Belgium, Belgium, 2008. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering). [2] Ricardo A. Baeza-Yates. Searching: an algorithmic tour. In Allen Kent and James G. Williams, editors, Encyclopedia of Computer Science and Technology, pages 331-359. Marcel Dekker, Inc., 1997. [3] Ricardo A. Baeza-Yates, Walter Cunto, Udi Manber, and Sun Wu. Proximity matching using fixed-queries trees. In Maxime Crochemore and Dan Gusfield, editors, Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM 1994), Asilomar, California, USA, June 5-8,1994, Lecture Notes in Computer Science, pages 198-212. Springer, Berlin, 1994. [4] Tao Ban. Using genetic algorithm to balance the d-index algorithm for metric search. In Masumi Ishikawa, Kenji Doya, Hiroyuki Miyamoto, and Takeshi Yamakawa, editors, Neural Information Processing, volume 4985 of Lecture Notes in Computer Science, pages 264-273. Springer Berlin / Heidelberg, 2008. 10.1007/978-3-540-69162-4.28. [5] Daniel Barbara, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, and Kenneth C. Sevcik. The new jersey data reduction report. IEEE Data Engineering Bulletin, 20(4):3-45, 1997. [6] Marcelo Barroso, Nora Reyes, and Rodrigo Paredes. Enlarging nodes to improve dynamic spatial approximation trees. In Proceedings of the Third International Conference on Similarity Search and Applications (SISAP 2010), pages 41-48, New York, NY, USA, 2010. ACM. [7] Stanislav Barton, Vlastislav Dohnal, Jan Sedmidubsky, and Pavel Zezula. Gauging the evolution of metric social network. In 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007) held at 33rd International Conference on Very Large Data Bases (VLDB 2007), pages 1-12, Vienna, 2007. VLDB Endowment. 27 28 BIBLIOGRAPHY [8] Stanislav Bartoň, Vlastislav Dohnal, Jan Sedmidubský, and Pavel Zezula. Building self-organized image retrieval network. In Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval (LSDS-IR 2008), pages 51-58, USA, 2008. ACM. [9] Stanislav Bartoň, Vlastislav Dohnal, Jan Sedmidubský, and Pavel Zezula. Computational Social Network Analysis: Trends, Tools and Research Advances, chapter Towards Self-organizing Search Systems, pages 49-80. Computer Communications and Networks. Springer, New York, NY, USA, 2010. [10] Michal Batko. Scalable and Distributed Similarity Search. PhD thesis, Masaryk University, May 2006. [11] Michal Batko, Vlastislav Dohnal, David Novák, and Jan Sedmidubský. Mufin: A multi-feature indexing network. In Proceedings of the Second International Workshop on Similarity Search and Applications (SISAP 2009), pages 158-159, Washington, DC, USA, 2009. IEEE Computer Society. [12] Michal Batko, Vlastislav Dohnal, and Pavel Zezula. M-grid: Similarity searching in grids. In Proceedings of International Workshop on Information Retrieval in Peer-to-Peer Networks, ACM CIKM 2006, pages 17-24, Arlington, 2006. ACM Press. [13] Michal Batko, David Novák, Fabrizio Falchi, and Pavel Zezula. On scalability of the similarity search in the world of peers. In Proceedings of First International Conference on Scalable Information Systems (INFOSCALE 2006), Hong Kong, May 30-June 1, 2006, pages 1-12, New York, NY, USA, 2006. ACM Press. [14] Michal Batko, David Novák, Fabrizio Falchi, and Pavel Zezula. Scalability comparison of peer-to-peer similarity search structures. Future Generation Computer Systems, 24(8):834-848, October 2008. [15] Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell, and David L. Wheeler. Genbank: update. Nucleic Acids Research, 32:Database Issue D23-D26, 2004. [16] ERRORinAuthors Bib298. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Sum., 33(3):322-373,2001. [17] Tolga Bozkaya and Meral Z. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Joan Peckham, editor, Proceedings of the ACM International Conference on Management of Data (SIGMOD 1997), Tucson, Arizona, USA, May 13-15,1997, pages 357-368. ACM Press, 1997. [18] Tolga Bozkaya and Meral Z. Ozsoyoglu. Indexing large metric spaces for similarity search queries. ACM Transactions on Database Systems (TODS 1999), 24(3):361-404,1999. [19] Sergey Brin. Near neighbor search in large metric spaces. In Umeshwar Dayal, Peter M. D. Gray, and Shojiro Nishio, editors, Proceedings of the 21th International BIBLIOGRAPHY 29 Conference on Very Large Data Bases (VLDB 1995), Zurich, Switzerland, September 11-15, 1995, pages 574-584. Morgan Kaufmann, 1995. [20] Nicolas Bruno and Hui (Wendy) Wang. The threshold algorithm: From middleware systems to the relational engine. IEEE Transactions on Knowledge and Data Engineering, 19(4)523-537, 2007. [21] Petra Budikova, Michal Batko, and Pavel Zezula. Improving the image retrieval system by ranking. In Proceedings of the Third International Conference on Similarity Search and Applications (SISAP 2010), pages 123-124, New York, NY, USA, 2010. ACM. [22] Edouard Bugnion, Shi Fhei, Thomas Roos, Peter Widmayer, and Felizitas Widmer. A spatial index for approximate multiple string matching. In Ricardo A. Baeza-Yates and N. Ziviani, editors, Proceedings of the 1st South American Workshop on String Processing (WSP 1993), Belo Horizonte, Brazil, September 13-15,1993, pages 43-53,1993. [23] Walter A. Burkhard and Robert M. Keller. Some approaches to best-match file searching. Communications of the ACM (CACM 1973), 16(4):230-236,1973. [24] Benjamin Bustos and Tomáš Skopal. Dynamic similarity search in multi-metric spaces. In Proceedings of the 8th ACM international workshop on Multimedia information retrieval (MIR 2006), pages 137-146, New York, NY, USA, 2006. ACM. [25] Edgar Chávez, Karina Figueroa, and Gonzalo Navarro. Effective proximity retrieval by ordering permutations. IEEE Trans. Pattern Anal. Mach. Intell., 30(9):1647-1658,2008. [26] Edgar Chavez, Jose L. Marroquin, and Ricardo A. Baeza-Yates. Spaghettis: An array based algorithm for similarity queries in metric spaces. In Proceedings of the 6th International Symposium on String Processing and Information Retrieval & International Workshop on Groupware (SPIRE/CRIWG 1999), Cancun, Mexico, September 21-24,1999, pages 38-46. IEEE Computer Society, 1999. [27] Edgar Chavez, Jose L. Marroquin, and Gonzalo Navarro. Overcoming the curse of dimensionality. In Procedings of the European Workshop on Content-Based Multimedia Indexing (CBMI1999), Toulouse, France, October 25-27, 1999, pages 57-64, 1999. [28] Edgar Chavez, Jose L. Marroquin, and Gonzalo Navarro. Fixed Queries Array: A fast and economical data structure for proximity searching. Multimedia Tools and Applications, 14(2):113-135,2001. [29] Edgar Chavez and Gonzalo Navarro. An effective clustering algorithm to index high dimensional metric spaces. In Proc. 7th International Symposium on String Processing and Information Retrieval (SPIRE), pages 75-86. IEEE CS Press, 2000. [30] Edgar Chavez and Gonzalo Navarro. A compact space decomposition for effective metric indexing. Pattern Recogn. Lett., 26(9):1363-1376, 2005. 30 BIBLIOGRAPHY [31] Edgar Chavez, Gonzalo Navarro, Ricardo Baeza-Yates, and Jose Luis Mar-roquin. Searching in metric spaces. ACM Comput. Surv., 33(3):273-321, 2001. [32] Jan Chomicki. Querying with intrinsic preferences. In Christian S. Jensen, Keith G. Jeffery Jaroslav Pokorný, Simonas Saltenis, Elisa Bertino, Klemens Böhm, and Matthias Jarke, editors, Proceedings of the 8th International Conference on Extending Database Technology (EDBT 2002), Lecture Notes in Computer Science, pages 34-51. Springer, 2002. [33] Ondřej Chum, M. Perdoch, and J. Matas. Geometric min-hashing: Finding a (thick) needle in a haystack. In Proceedings of Computer Vision and Pattern Recognition (CVPR 2009), pages 17-24, Los Alamitos, CA, USA, 2009. IEEE Computer Society. [34] Ondřej Chum, James Philbin, Michael Isard, and Andrew Zisserman. Scalable near identical image and shot detection. In Proceedings of the 6th ACM international conference on Image and video retrieval (CIVR 2007), pages 549-556, New York, NY, USA, 2007. ACM. [35] Paolo Ciaccia and Marco Patella. Bulk loading the M-tree. In Proceedings of the 9th Australasian Database Conference (ADC 1998), Perth, Australia, February 2-3, 1998, Australian Computer Science Communications, pages 15-26. Springer, 1998. [36] Paolo Ciaccia and Marco Patella. Searching in metric spaces with user-defined and approximate distances. ACM Transactions on Database Systems (TODS 2002), 27(4):398-437, 2002. [37] Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky Pericles Loucopoulos, and Manfred A. Jeusfeld, editors, Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece, August 25-29, 1997, pages 426-435. Morgan Kaufmann, 1997. [38] Tzi cker Chiueh. Content-based image indexing. In Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo, editors, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB 1994), Santiago de Chile, Chile, September 12-15, 1994, pages 582-593. Morgan Kaufmann, 1994. [39] Arturo Crespo and Hector Garcia-Molina. Routing indices for peer-to-peer systems. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS 2002), pages 1-23, Washington, DC, USA, 2002. IEEE Computer Society. [40] Arturo Crespo and Hector Garcia-Molina. Semantic overlay networks for p2p systems. In Proceedings of the 3rd International Workshop on Agents and Peer-to-Peer Computing (AP2PC 2004), New York, NY, USA, July 19, 2004, Lecture Notes in Computer Science, pages 1-13. Springer, 2004. BIBLIOGRAPHY 31 [41] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Comm. ACM, 51(1):107-113, 2008. [42] Frank K. H. A. Dehne and Hartmut Noltemeier. Voronoi trees and clustering problems. Information Systems (IS 1987), 12(2):171-175,1987. [43] Vlastislav Dohnal, Claudio Gennaro, Pasquale Savino, and Pavel Zezula. D-Index: Distance searching index for metric data sets. Multimedia Tools and Applications, 21(l):9-33, 2003. [44] Vlastislav Dohnal, Claudio Gennaro, and Pavel Zezula. Computational Intelligence in Medical Informatics, chapter Efficiency and Scalability Issues in Metric Access Methods, pages 235-264. Springer-Verlag Berlin Heidelberg, Berlin, Germany, 1 edition, 2008. [45] Vlastislav Dohnal and Jan Sedmidubsky. Query routing mechanisms in self-organizing search systems. In 2nd International Workshop on Similarity Search and Applications (SISAP 2009), pages 132-139, Los Alamitos, CA 90720-1314, 2009. IEEE Computer Society. [46] Vlastislav Dohnal, Jan Sedmidubsky, Pavel Zezula, and David Novak. Similarity searching: Towards bulk-loading peer-to-peer networks. In 1st International Workshop on Similarity Search and Applications (SISAP 2008), pages 87-94, Los Alamitos CA, Washington, Tokyo, 2008. IEEE Computer Society. [47] Vlastislav Dohnal and Pavel Zezula. Similarity searching in structured and unstructured p2p networks. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pages 400-416, 2009. [48] Vlastislav Dohnal and Pavel Zezula. Real-life performance of metric searching. SIGSPATIAL Special, 2(2):28-31,2010. [49] Christos Doulkeridis, Akrivi Vlachou, Yannis Kotidis, and Michalis Vazirgian-nis. Peer-to-peer similarity search in metric spaces. In Christoph Koch, Johannes Gehrke, Minos N. Garofalakis, Divesh Srivastava, Karl Aberer, Anand Deshpande, Daniela Florescu, Chee Yong Chan, Venkatesh Ganti, Carl-Christian Kanne, Wolfgang Klas, and Erich J. Neuhold, editors, VLDB 2007: 33rd International Conference on Very Large Data Bases, September 23-27 2007, University of Vienna, Austria, pages 986-997. ACM, 2007. [50] Andrea Esuli. PP-Index: Using permutation prefixes for efficient and scalable approximate similarity search. In Proceedings of the 7th Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR 2009), pages 17-24, 2009. [51] Ronald Fagin. Combining fuzzy information from multiple systems. In Proceedings of the 15th ACM Symposium on Principles of Database Systems (PODS 1996), Montreal, Canada, June 3-5,1996, pages 216-226. ACM Press, 1996. [52] Fabrizio Falchi, Claudio Gennaro, and Pavel Zezula. A content-addressable network for similarity search in metric spaces. In Databases, Information Systems, and 32 BIBLIOGRAPHY Peer-to-Peer Computing, International Workshops, DBISP2P 2005/2006, Trondheim, Norway, August 28-29, 2005, Seoul, Korea, September II, 2006, Revised Selected Papers, Lecture Notes in Computer Science, pages 98-110. Springer, August 2007. [53] Christos Faloutsos, Ron Barber, Myron Flickner, James L. Hafner, Wayne Niblack, Dragutin Petkovic, and William Equitz. Efficient and effective querying by image content, journal of Intelligent Information Systems (JUS 1994), 3(3/4):231-262,1994. [54] Martin A. Fischler and Robert C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381-395,1981. [55] Volker Gaede and Oliver Günther. Multidimensional access methods. ACM Computing Surveys (CSUR 1998), 30(2):170-231,1998. [56] Claudio Gennaro, Matteo Mordacchini, Salvatore Orlando, and Fausto Rabitti. MRoute: A peer-to-peer routing index for similarity search in metric spaces. In Proceedings of the 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007), pages 1-12, September 2007. [57] James L. Hafner, Harpreet S. Sawhney, William Equitz, Myron Flickner, and Wayne Niblack. Efficient color histogram indexing for quadratic form distance functions. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 1995), 17(7):729-736, July 1995. [58] Gisli R. Hjaltason and Hanan Samet. Index-driven similarity search in metric spaces. ACM Trans. Database Syst., 28(4):517-580,2003. [59] Tomas Homola, Vlastislav Dohnal, and Pavel Zezula. Proximity-based order-respecting intersection for searching in image databases. In 8th International Workshop on Adaptive Multimedia Retrieval, AMR'20I0, Lira, 2010. [60] Tomas Homola, Vlastislav Dohnal, and Pavel Zezula. Sub-image searching through intersection of local descriptors. In 3rd International Conference on Similarity Search and Applications (SISAP 2010), pages 127-128, New York, 2010. ACM Press. [61] Edwin H. Jacox and Hanan Samet. Metric space similarity joins. ACM Trans. Database Syst., 33(2):l-38,2008. [62] H. V. Jagadish, Alberto O. Mendelzon, and Tova Milo. Similarity-based queries. In Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1995), pages 36-45. ACM Press, 1995. [63] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, and Rui Zhang. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS 2005), 30(2):364-397, 2005. [64] Alexis Joly and Olivier Buisson. Logo retrieval with a contrario visual query expansion. In Proceedings of the seventeen ACM international conference on Multimedia (MM 2009), pages 581-584, New York, NY, USA, 2009. ACM. BIBLIOGRAPHY 33 [65] Caetano Traina Jr., Agma J. M. Traina, Bernhard Seeger, and Christos Faloutsos. Slim-Trees: High performance metric trees minimizing overlap between nodes. In Carlo Zaniolo, Peter C. Lockemann, Marc H. Scholl, and Torsten Grust, editors, Proceedings of the 7th International Conference on Extending Database Technology (EDBT 2000), Konstanz, Germany, March 27-31, 2000, Lecture Notes in Computer Science, pages 51-65. Springer, 2000. [66] Iraj Kalantari and Gerard McDonald. A data structure and an algorithm for the nearest point problem. IEEE Transactions on Software Engineering (TSE 1983), 9(5):631-634,1983. [67] Yan Ke and Rahul Sukthankar. Pea-sift: A more distinctive representation for local image descriptors. In Proceedings of Computer Vision and Pattern Recognition (CVPR 2004), pages 506-513. IEEE Computer Society, 2004. [68] Yan Ke, Rahul Sukthankar, and Larry Huston. Efficient near-duplicate detection and sub-image retrieval. In ACM Multimedia, pages 869-876, 2004. [69] John L. Kelly. General Topology. D. Van Nostrand, New York, 1955. [70] Teuvo Kohonen. Self-Organization and Associative Memory. Springer, 1984. [71] V.l. Levenshtein. Binary codes capable of correcting spurious insertions and deletions of ones. Problems of Information Transmission, 1:8-17,1965. [72] Alessandro Linari and Marco Patella. Metric overlay networks: Processing similarity queries in P2P databases. In Proceedings of the 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007), pages 1-8, September 2007. [73] Witold Litwin, Marie-Anna Neimat, and Donovan A. Schneider. LH* - A scalable, distributed data structure. ACM Transactions on Database Systems, 21(4):480-525,1996. [74] Jakub Lokoc and Tomas Skopal. On reinsertions in m-tree. In IEEE 24th International Conference on Data Engineering Workshop (ICDE Workshops 2008), pages 410-417, apr. 2008. [75] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe lsh: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases (VLDB 2007), pages 950-961. VLDB Endowment, 2007. [76] Margarida Mamede. Recursive lists of clusters: A dynamic data structure for range queries in metric spaces. In Pinar Yolum, Tunga Güngör, Fikret Gürgen, and Can Özturan, editors, Computer and Information Sciences (ISCIS 2005), volume 3733 of Lecture Notes in Computer Science, pages 843-853. Springer Berlin / Heidelberg, 2005. 10.1007/11569596_86. [77] Luisa Micö, Jose Oncina, and Rafael C. Carrasco. A fast branch & bound nearest neighbour classifier in metric spaces. Pattern Recognition Letters, 17(7):731-739, 1996. 34 BIBLIOGRAPHY [78] Luisa Mico, Jose Oncina, and Enrique Vidal. An algorithm for finding nearest neighbors in constant average time with a linear space complexity. In Proceedings of the 11th International Conference on Pattern Recognition (ICPR 1992), The Hague, The Netherlands, pages 557-560,1992. [79] Luisa Mico, Jose Oncina, and Enrique Vidal. A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognition Letters, 15(1):9-17,1994. [80] Peter R. Monge and Noshir S. Contractor. Theories of Communication Networks. Oxford University Press, April 2003. [81] Arnoldo Jose Muller-Molina and Takeshi Shinohara. On the configuration of the similarity search data structure d-index for high dimensional objects. In David Taniar, Osvaldo Gervasi, Beniamino Murgante, Eric Pardede, and Bernady Ap-duhan, editors, Computational Science and Its Applications (ICCSA 2010), volume 6018 of Lecture Notes in Computer Science, pages 443-457. Springer Berlin / Heidelberg, 2010. 10.1007/978-3-642-12179-1.37. [82] Gonzalo Navarro. Searching in metric spaces by spatial approximation. In Proceedings of the 6th International Symposium on String Processing and Information Retrieval (SPIRE 1999), Cancun, Mexico, September 21-24,1999, pages 141-148. IEEE Computer Society, 1999. [83] Gonzalo Navarro. A guided tour to approximate string matching. ACM Computing Surveys (CSUR 2001), 33(l):31-88,2001. [84] Gonzalo Navarro. Searching in metric spaces by spatial approximation. The VLDB Journal, ll(l):28-46, 2002. [85] Gonzalo Navarro and Nora Reyes. Dynamic spatial approximation trees. /. Exp. Algorithmics, 12:1-68, 2008. [86] Gonzalo Navarro and Nora Reyes. Dynamic spatial approximation trees for massive data. International Workshop on Similarity Search and Applications (SISAP 2009), pages 81-88, 2009. [87] Hartmut Noltemeier, Knut Verbarg, and Christian Zirkelbach. A data structure for representing and efficient querying large scenes of geometric objects: MB* Trees. In Geometric Modelling, Computing Supplement, pages 211-226. Springer, 1992. [88] Hartmut Noltemeier, Knut Verbarg, and Christian Zirkelbach. Monotonous Bisector* Trees - a tool for efficient partitioning of complex scenes of geometric objects. In Data Structures and Efficient Algorithms, Lecture Notes in Computer Science, pages 186-203. Springer, 1992. [89] David Novak. Similarity Search on a Very Large Scale. PhD thesis, Masaryk University, 2008. BIBLIOGRAPHY 35 [90] David Novak and Michal Batko. Metric index: An efficient and scalable solution for similarity search. International Workshop on Similarity Search and Applications (SISAP 2009), pages 65-73, 2009. [91] David Novak, Michal Batko, Vlastislav Dohnal, and Pavel Zezula. Scaling up the image content-based retrieval. In Second DELOS Conference on Digital Libraries, pages 1-10, Pisa, Italy, 2007. Information Society Technologies. [92] David Novak and Pavel Zezula. M-Chord: A scalable distributed similarity search structure. In Proceedings of First International Conference on Scalable Information Systems (INFOSCALE 2006), Hong Kong, May 30 - June 1,2006, pages 1-10, New York, NY, USA, 2006. ACM Press. [93] Rodrigo Paredes and Nora Reyes. List of twin clusters: A data structure for similarity joins in metric spaces. Similarity Search and Applications, International Workshop on, pages 131-138, 2008. [94] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Schenker. A scalable content-addressable network. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2001), San Diego, California, August 27-31, 2001, pages 161-172. ACM Press, 2001. [95] Gerhard Roth and William Scott. Efficient indexing for strongly similar subim-age retrieval. In Proceedings of the Fourth Canadian Conference on Computer and Robot Vision (CRV2007), pages 440-447, Washington, DC, USA, 2007. IEEE Computer Society. [96] Min-Sung Ryu, Soo-Jun Park, and Chee Sun Won. Image retrieval using sub-image matching in photos using mpeg-7 descriptors. In Proceedings of Asia Information Retrieval Society Conference (AIRS 2005), Lecture Notes in Computer Science, pages 366-373. Springer, 2005. [97] Hanan Samet. Foundations of Multidimensional and Metric Data Structures. Computer Graphics and Geometric Modeling. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. [98] Jan Sedmidubsky, Stanislav Barton, Vlastislav Dohnal, and Pavel Zezula. Querying similarity in metric social networks. In Network-Based Information Systems, First International Conference, NBiS 2007, number vol. 4658 in Lecture Notes in Computer Science, page 278, Berlin, 2007. Springer. [99] Jan Sedmidubsky, Stanislav Barton, Vlastislav Dohnal, and Pavel Zezula. Adaptive approximate similarity searching through metric social networks. In 24th International Conference on Data Engineering (ICDE 2008), pages 1424-1426, Los Alamitos CA, 2008. IEEE Computer Society. [100] Jan Sedmidubsky, Vlastislav Dohnal, Stanislav Barton, and Pavel Zezula. A self-organized system for content-based search in multimedia. In IEEE International Symposium on Multimedia (ISM 2008), pages 322-327, Los Alamitos, California 90720-1314, 2008. IEEE Computer Society. 36 BIBLIOGRAPHY [101] Jan Sedmidubský, Vlastislav Dohnal, and Pavel Zezula. Feedback-based performance tuning for self-organizing multimedia retrieval systems. In International Conference on Advances in Multimedia (MMEDIA 2010), pages 102-108, Los Alamitos, CA 90720-1314, 2010. IEEE Computer Society. [102] Jan Sedmidubský, Vlastislav Dohnal, and Pavel Zezula. On building a self-organizing search system for multimedia retrieval. In International Workshop on Multimedia and Semantic Technologies (MUST 2010), Red Hook, NY 12571, USA, 2010. [103] Thomas Seidl and Hans-Peter Kriegel. Efficient user-adaptable similarity search in large multimedia databases. In The VLDB Journal, pages 506-515,1997. [104] Dennis Shasha and James Z. Wang. New techniques for best-match retrieval. ACM Transactions on Information Systems (TOIS 1990), 8(2):140-158,1990. [105] Tomáš Skopal. Pivoting M-tree: A metric access method for efficient similarity search. In Václav Snášel, Jaroslav Pokorný, and Karel Richta, editors, Proceedings of the Dateso 2004 Annual International Workshop on DAtabases, TExts, Specifications and Objects, Desna, Czech Republic, April 14-16, 2004, CEUR Workshop Proceedings, pages 27-37. CEUR-WS.org, 2004. [106] Tomáš Skopal, Vlastislav Dohnal, Michal Batko, and Pavel Zezula. Distinct nearest neighbors queries for similarity search in very large multimedia databases. In 11th ACM International Workshop on Web Information and Data Management (WIDM 2009), pages 11-14, New York, USA, 2009. ACM. [107] Tomáš Skopal and Jakub Lokoč. Nm-tree: Flexible approximate similarity search in metric and non-metric spaces. In Sourav Bhowmick, Josef Küng, and Roland Wagner, editors, Database and Expert Systems Applications, volume 5181 of Lecture Notes in Computer Science, pages 312-325. Springer Berlin / Heidelberg, 2008. 10.1007/978-3-540-85654-2.30. [108] Tomáš Skopal, Jaroslav Pokorný, Michal Krátký, and Václav Snášel. Revisiting M-Tree building principles. In Leonid A. Kalinichenko, Rainer Manthey, Bern-hard Thalheim, and Uwe Wloka, editors, Proceedings of the 7th East European Conference on Advances in Databases and Information Systems (ADBIS 2003), Dresden, Germany, September 3-6, 2003, Lecture Notes in Computer Science. Springer, 2003. [109] Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, Frans M. Kaashoek, Frank Dabek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, ll(l):17-32, 2003. [110] Jeffrey K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 40(4): 175-179,1991. [Ill] Enrique Vidal. An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognition Letters, 4(3):145-157,1986. BIBLIOGRAPHY 37 [112] Enrique Vidal. New formulation and improvements of the nearest-neighbour approximating and eliminating search algorithm (AESA). Pattern Recognition Letters, 15(l):l-7,1994. [113] Stanley Wasserman, Katherine Faust, and Dawn Iacobucci. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press, November 1994. [114] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440-442, June 1998. [115] Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the 4th Annual ACM Symposium on Discrete Algorithms (SODA 1993), Austin, Texas, USA, January 25-27, 1993, pages 311-321. ACM Press, 1993. [116] Peter N. Yianilos. Excluded middle vantage point forests for nearest neighbor search. In Proceedings of the 6th DIMACS Implementation Challenge: Near Neighbor Searches (ALENEX 1999), Baltimore, Maryland, USA, January 15-16,1999,1999. [117] Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. Similarity Search: The Metric Space Approach, volume 32 of Advances in Database Systems. Springer, 2006. [118] Pavel Zezula, Michal Batko, and Vlastislav Dohnal. Encyclopedia of Database Systems, chapter Indexing Metric Spaces, pages 1-4. Database Management and Information Retrieval. Springer-Verlag, New York, 2009. [119] Pavel Zezula, Michal Batko, Vlastislav Dohnal, David Novak, and Jan Sed-midubsky. Similarity search in large collections of biometric data. In NATO RTO Modelling and Simulation Group Symposium, pages 1-13, Brussels, 2009. [120] Pavel Zezula, Vlastislav Dohnal, and Michal Batko. Encyclopedia of Computer Science and Engineering, chapter File Organizations, pages 1219-1227. Wiley-Interscience, Hoboken, NJ, USA, 2009. [121] Pavel Zezula, Vlastislav Dohnal, and David Novak. Global Data Management, chapter Towards Scalability of Similarity Searching, pages 277-300. IOS Press, Amsterdam, The Netherlands, 2006. Part II Collection of Articles 39 Chapter 9 Books 1. Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. Similarity Search: The Metric Space Approach, volume 32 of Advances in Database Systems. Springer, 2006. 41 42 CHAPTER 9. BOOKS Due to copyrights pending, the contents of papers are not included. Chapter 10 Book Chapters This chapter contains the list of book chapters including in this habilitation thesis. The individual contributing articles that follow after are titled as the original book chapters and not as the book they are published in. 1. Stanislav Barton, Vlastislav Dohnal, Jan Sedmidubsky, and Pavel Zezula. Computational Social Network Analysis: Trends, Tools and Research Advances, chapter Towards Self-organizing Search Systems, pages 49-80. Computer Communications and Networks. Springer, New York, NY, USA, 2010. 2. Pavel Zezula, Michal Batko, and Vlastislav Dohnal. Encyclopedia of Database Systems, chapter Indexing Metric Spaces, pages 1-4. Database Management and Information Retrieval. Springer-Verlag, New York, 2009. 3. Pavel Zezula, Vlastislav Dohnal, and Michal Batko. Encyclopedia of Computer Science and Engineering, chapter File Organizations, pages 1219-1227. Wiley-Interscience, Hoboken, NJ, USA, 2009. 4. Vlastislav Dohnal, Claudio Gennaro, and Pavel Zezula. Computational Intelligence in Medical Informatics, chapter Efficiency and Scalability Issues in Metric Access Methods, pages 235-264. Springer-Verlag Berlin Heidelberg, Berlin, Germany, 1 edition, 2008. 5. Pavel Zezula, Vlastislav Dohnal, and David Novak. Global Data Management, chapter Towards Scalability of Similarity Searching, pages 277-300. IOS Press, Amsterdam, The Netherlands, 2006. 43 44 CHAPTER 10. BOOK CHAPTERS Due to copyrights pending, the contents of papers are not included. Chapter 11 Journals 1. Vlastislav Dohnal and Pavel Zezula. Real-life performance of metric searching. SIGSPATIAL Special, 2(2):28-31, 2010. 2. Vlastislav Dohnal and Pavel Zezula. Similarity searching in structured and unstructured p2p networks. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pages 400-416, 2009. 45 46 CHAPTER 11. JOURNALS Due to copyrights pending, the contents of papers are not included. Chapter 12 Conference Papers 1. Tomáš Homola, Vlastislav Dohnal, and Pavel Zezula. Proximity-based order-respecting intersection for searching in image databases. In 8th International Workshop on Adaptive Multimedia Retrieval, AMR'2010, Linz, 2010. 2. Tomáš Homola, Vlastislav Dohnal, and Pavel Zezula. Sub-image searching through intersection of local descriptors. In 3rd International Conference on Similarity Search and Applications (SISAP 2010), pages 127-128, New York, 2010. ACM Press. 3. Jan Sedmidubský, Vlastislav Dohnal, and Pavel Zezula. Feedback-based performance tuning for self-organizing multimedia retrieval systems. In International Conference on Advances in Multimedia (MMEDIA 2010), pages 102-108, Los Alami-tos, CA 90720-1314, 2010. IEEE Computer Society. 4. Jan Sedmidubský, Vlastislav Dohnal, and Pavel Zezula. On building a self-organizing search system for multimedia retrieval. In International Workshop on Multimedia and Semantic Technologies (MUST 2010), Red Hook, NY 12571, USA, 2010. 5. Michal Batko, Vlastislav Dohnal, David Novák, and Jan Sedmidubský. Mufin: A multi-feature indexing network. In Proceedings of the Second International Workshop on Similarity Search and Applications (SISAP 2009), pages 158-159, Washington, DC, USA, 2009. IEEE Computer Society. 6. Vlastislav Dohnal and Jan Sedmidubský. Query routing mechanisms in self-organizing search systems. In 2nd International Workshop on Similarity Search and Applications (SISAP 2009), pages 132-139, Los Alamitos, CA 90720-1314, 2009. IEEE Computer Society. 7. Tomáš Skopal, Vlastislav Dohnal, Michal Batko, and Pavel Zezula. Distinct nearest neighbors queries for similarity search in very large multimedia databases. In 11th ACM International Workshop on Web Information and Data Management (WIDM 2009), pages 11-14, New York, USA, 2009. ACM. 8. Pavel Zezula, Michal Batko, Vlastislav Dohnal, David Novák, and Jan Sedmidubský. Similarity search in large collections of biometric data. In NATO RTO Modelling and Simulation Group Symposium, pages 1-13, Brussels, 2009. 47 48 CHAPTER 12. CONFERENCE PAPERS 9. Vlastislav Dohnal, Jan Sedmidubský, Pavel Zezula, and David Novák. Similarity searching: Towards bulk-loading peer-to-peer networks. In 1st International Workshop on Similarity Search and Applications (SISAP 2008), pages 87-94, Los Alamitos CA, Washington, Tokyo, 2008. IEEE Computer Society. 10. Jan Sedmidubský, Stanislav Bartoň, Vlastislav Dohnal, and Pavel Zezula. Adaptive approximate similarity searching through metric social networks. In 24th International Conference on Data Engineering (ICDE 2008), pages 1424-1426, Los Alamitos CA, 2008. IEEE Computer Society. 11. Jan Sedmidubský, Vlastislav Dohnal, Stanislav Bartoň, and Pavel Zezula. A self-organized system for content-based search in multimedia. In IEEE International Symposium on Multimedia (ISM 2008), pages 322-327, Los Alamitos, California 90720-1314, 2008. IEEE Computer Society. 12. Stanislav Bartoň, Vlastislav Dohnal, Jan Sedmidubský, and Pavel Zezula. Building self-organized image retrieval network. In Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval (LSDS-1R'08), pages 51-58, USA, 2008. ACM. 13. Jan Sedmidubský, Stanislav Bartoň, Vlastislav Dohnal, and Pavel Zezula. Querying similarity in metric social networks. In Network-Based Information Systems, First International Conference, NBiS 2007, number vol. 4658 in Lecture Notes in Computer Science, page 278, Berlin, 2007. Springer. 14. Stanislav Bartoň, Vlastislav Dohnal, Jan Sedmidubský, and Pavel Zezula. Gauging the evolution of metric social network. In 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007) held at 33rd International Conference on Very Large Data Bases (VLDB 2007), pages 1-12, Vienna, 2007. VLDB Endowment. 15. David Novák, Michal Batko, Vlastislav Dohnal, and Pavel Zezula. Scaling up the image content-based retrieval. In Second DELOS Conference on Digital Libraries, pages 1-10, Pisa, Italy, 2007. Information Society Technologies. 16. Michal Batko, Vlastislav Dohnal, and Pavel Zezula. M-grid: Similarity searching in grids. In Proceedings of International Workshop on Information Retrieval in Peer-to-Peer Networks, ACM CIKM 2006, pages 17-24, Arlington, 2006. ACM Press. Due to copyrights pending, the contents of papers are not included.