Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Jan Sedmidubsky Vlastislav Dohnal Pavel Zezula On Investigating Scalability and Robustness in a Self-organizing Retrieval System Faculty of Informatics Masaryk University Brno, Czech Republic fi-logo 1/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Outline •Motivation •Metric Social Network –Architecture –Query Routing •Experimental Trials –Scalability –Adaptability –Robustness •Conclusions 2/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Motivation •Digital data explosion –100 million new photos uploaded to Facebook everyday –30 hours of videos uploaded to YouTube every minute – – Þ data must be efficiently stored, shared, and searched 1) 1) 1) 1) C:\Program Files\Microsoft Office 2007\MEDIA\CAGCAT10\j0195384.wmf Facebook YouTube C:\Program Files\Microsoft Office 2007\MEDIA\CAGCAT10\j0195384.wmf C:\Program Files\Microsoft Office 2007\MEDIA\CAGCAT10\j0195384.wmf C:\Program Files\Microsoft Office 2007\MEDIA\CAGCAT10\j0195384.wmf ~ unstructured P2P networks 3/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Motivation •Our objective – to develop an engine for efficient search in unstructured P2P networks • •Problems: –Scalability – a large number of peers –Volatility – continual peers’ churning • Þ self-organizing systems –Similarity – complex data domains • Þ metric space 4/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Similarity Search: Metric Space •Metric Space M is a pair M=(D, d), where: –D is a domain of objects –d is a function measuring similarity between two objects •Similarity range queries R(q, r) q r j0157763 j0183328 j0090386 j0212219 j0211949 j0216858 j0183328 j0216858 5/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Metric Social Network •Metric Social Network –A similarity search system for unstructured P2P networks •A set of peers interconnected by semantic links •Peers are independent and equal in functionality •There is no global control mechanism •Based on self-organizing principles: –Scalability –Adaptability –Robustness • •Peer’s schema: –Data repository (e.g., image features) –Routing table – 6/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Metric Social Network: Routing Table •Routing table: –Exploration peers –Query history – based on answers to the processed query •Acquaintance – peer returning the largest part of the answer •Friends – peers returning a non-empty answer Knowledge base Query history Exploration nodes N8 N10 N13 Q Friends Acq. Answer size Confidence E1 q1 r1 N7 N2 N4 N7 189 0.82 E2 q2 r2 N6 N6 13 0.95 E3 q3 r3 N1 N1 7 0.30 E4 q4 r4 N5 N3 N5 52 0.74 > Q4=(q4, r4) N1 System Q4=(q4, r4) N1 N5 N2 N3 42 10 0 Knowledge base Query history Exploration nodes N8 N10 N13 Q Friends Acq. Answer size Confidence E1 q1 r1 N7 N2 N4 N7 189 0.82 E2 q2 r2 N6 N6 13 0.95 E3 q3 r3 N1 N1 7 0.30 E4 q4 r4 N5 N3 N5 52 0.74 Routing table Query history Exploration peers N8 N10 N13 Q Friends Acq. Answer size Confidence E1 q1 r1 N7 N2 N4 N7 189 0.82 E2 q2 r2 N6 N7 N6 13 0.95 E3 q3 r3 N1 N1 7 0.30 E4 q4 r4 N5 N3 N5 52 0.74 7/15 Jan Sedmidubsky r2 r4 q3 October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System q4 Metric Social Network: Query Routing •At each peer, a query Q(q, r) is processed as follows: –Take the most relevant entries Ei to Q •Exploitation – forward Q to the acquaintances of these entries •Exploration – forward Q to a certain number of exploration peers –Routing stops when no better acquaintance exists •Evaluate Q on the local data repository •Ask all friends of the most relevant entries to evaluate Q as well Routing table Query history Exploration peers N8 N10 N13 Q Friends Acq. Answer size Confidence E1 q1 r1 N7 N2 N4 N7 189 0.82 E2 q2 r2 N6 N7 N6 13 0.95 E3 q3 r3 N1 N1 7 0.30 E4 q4 r4 N5 N3 N5 52 0.74 q1 q2 r1 r3 r q Q1 Q4 Q3 Q2 Q 8/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Metric Social Network: Experimental Evaluation •System size: 2,000 peers • •Data sets: –Synthetic – 100,000 2-d vectors –Real-life (CoPhIR image features) – 100,000 282-d vectors – Þ each peer maintains 50 data objects – •Experimenting – repeating the batch of: –Training series – 50 queries executed at random peers –Test series – 5 queries executed at predefined peers 9/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Metric Social Network: Experimental Evaluation (cont.) •Measures: –Recall [%] – ratio between the size of the answer of our system and the size of the complete answer –Total costs [# of peers] – number of peers contacted by the routing algorithm in order to process a query –Optimal costs [# of peers] – number of peers in the system that contain data relevant to a query 10/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Metric Social Network: Experimental Evaluation (cont.) •Scalability evaluation (image features) –Very high recall – almost 100% –Low costs – 50 peers (out of 2,000) contacted on average robustness-IMG-recall robustness-IMG-costs.png 11/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Metric Social Network: Experimental Evaluation (cont.) •Adaptability to data distributions (image features) –IMG – semi-clustering principle –IMG-OWN – random data distribution adaptability-IMG-costs.png adaptability-IMG-recall.png 12/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Metric Social Network: Experimental Evaluation (cont.) •Resilience to disconnections of peers (image features) –After each 20th test series: •S1 – 200 random peers were disconnected •S2 – 200 the most knowledgeable peers were disconnected • robustness-IMG-S1.png robustness-IMG-S2.png S1 S2 13/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Conclusions and Future Research Directions •Main achievements: –Prototype of Metric Social Network –Experimental evaluation of scalability, adaptability, and robustness • •Future research directions: –Advanced experiments on peers’ churning –New routing algorithms optimizing search costs 14/15 Jan Sedmidubsky October 28, 2011 Scalability and Robustness in a Self-organizing Retrieval System Questions? •Thank you for your attention. 15/15