DISA at ImageCLEF 2014 Annotation Task: results, better results, and future work
Petra Budíková, Michal Batko
DISA seminar, 30. 9. 2014

Outline
§A brief history of MUFIN Annotations
§DISA at ImageCLEF 2014
§ImageCLEF Scalable Concept Annotation Task
§DISA solution
§Competition results
§DISA annotation with DeCAF
§New results
§Others at ImageCLEF
§Future work
2/19

A brief history of MUFIN Annotations
§Sometime in 2010:
§We now have a reasonably working image search in large collections. How about using it for search-based image annotation?
§2011:
§Budikova, Batko, Zezula: Online Image Annotation. Demo SISAP 2011.
§The very first implementation – take the top N most similar images and return the top K most frequent words from their descriptions, merging synonyms using WordNet (see the sketch after this slide).
§Budikova, Batko, Zezula: MUFIN at ImageCLEF 2011: Success or Failure?. ImageCLEF 2011.
§The basic annotation implementation combined with face recognition and EXIF tag processing.
§Ranked 13th out of 18 participants. The other participants mostly used machine learning, which was well applicable since manually preprocessed training data was available.
§2012:
§MUFIN Image Annotation software – an extension for Firefox
§Provides keyword annotation for an arbitrary web image. Still the basic frequency-based annotation.
3/19
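A minimal sketch of the 2011 frequency-based annotation described above. This is an illustration, not the MUFIN code: the find_similar_images callable (assumed to return (image id, description) pairs for the N visually most similar database images) and the synonym merging via the first WordNet synset are assumptions made for the example. It requires the nltk package with its WordNet data installed.

```python
# Hedged sketch of frequency-based search-based annotation (not the actual
# MUFIN implementation). Requires: pip install nltk; nltk.download('wordnet').
from collections import Counter
from nltk.corpus import wordnet as wn


def merge_synonyms(word):
    """Map a word to a canonical lemma of its first WordNet synset, if any."""
    synsets = wn.synsets(word)
    return synsets[0].lemma_names()[0].lower() if synsets else word.lower()


def annotate(query_image, find_similar_images, n=50, k=10):
    """Return the K most frequent (synonym-merged) keywords found in the
    descriptions of the N images most similar to the query."""
    counter = Counter()
    for _, description in find_similar_images(query_image, n):
        for word in description.split():
            counter[merge_synonyms(word)] += 1
    return [word for word, _ in counter.most_common(k)]
```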
A brief history of MUFIN Annotations (cont.)
§2013:
§Batko, Botorek, Budikova, Zezula: Content-Based Annotation and Classification Framework: A General Multi-Purpose Approach. IDEAS 2013.
§A new generic architecture for search-based annotation processing. Multiple modules can be combined to create, expand and clean the annotation.
§2014:
§Batko, Budikova, Elias, Zezula: CLAN Photo Presenter: Multi-modal Summarization Tool for Image Collections. Demo ICMR 2014.
§An annotation tool used to assign keyword summaries to image clusters.
§Budikova, Botorek, Batko, Zezula: DISA at ImageCLEF 2014: The search-based solution for scalable image annotation. ImageCLEF 2014.
§Exploits the new idea of conceptRank to estimate the probabilities of individual candidate concepts.
§Ranked 5th out of 11 participants.
§Budikova, Botorek, Batko, Zezula: DISA at ImageCLEF 2014 Revised: Search-based Image Annotation with DeCAF Features. Technical Report.
§Annotation with conceptRank and DeCAF features.
§Would have ranked 2nd out of 11 participants!
4/19

ImageCLEF 2014 Scalable Image Annotation Task
§Annotation task definition
§Input: image + set of candidate concepts (40 to 207)
§Expected result: set of relevant concepts
§Example image: http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/-g/-gtLV2J8mRnRuo9Q.jpg
§Candidate concept vocabulary: aerial airplane baby beach bicycle bird boat bridge building car cartoon castle cat chair child church cityscape closeup cloud cloudless coast countryside daytime desert diagram dog drink drum elder embroidery fire firework fish flower fog food footwear furniture garden grass guitar harbor hat helicopter highway horse indoor instrument lake lightning logo monument moon motorcycle mountain nighttime overcast painting park person plant portrait protest rain rainbow reflection river road sand sculpture sea shadow sign silhouette smoke snow soil space spectacles sport sun sunrise/sunset table teenager toy traffic train tricycle truck underwater unpaved wagon water
§2 datasets
§Development data: 1940 images, ground truth available
§Test data: 7291 images, ground truth not available
5/19

ImageCLEF 2014 evaluation metrics
6/19
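The metric definitions themselves are not preserved in this text version of the slide. As a rough, hedged sketch only: the sample-based and concept-based mean F-measures reported in the later slides are typically obtained by averaging an F-measure either over images (samples) or over concepts, roughly as below. The official ImageCLEF 2014 evaluation script may differ in details (handling of empty sets, the exact mAP-sample computation), so treat this as illustration, not the organizers' code.

```python
# Hedged sketch of mF-sample vs. mF-concept; not the official ImageCLEF script.
def f_measure(predicted, relevant):
    """Harmonic mean of precision and recall between two sets."""
    predicted, relevant = set(predicted), set(relevant)
    if not predicted or not relevant:
        return 0.0
    p = len(predicted & relevant) / len(predicted)
    r = len(predicted & relevant) / len(relevant)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def mf_samples(predictions, ground_truth):
    """Mean F-measure over images: compare predicted vs. relevant concepts per image."""
    return sum(f_measure(predictions.get(img, []), labels)
               for img, labels in ground_truth.items()) / len(ground_truth)


def mf_concepts(predictions, ground_truth, concepts):
    """Mean F-measure over concepts: for each concept, compare the images it was
    predicted for with the images it is actually relevant for."""
    scores = []
    for c in concepts:
        pred_imgs = {img for img, labels in predictions.items() if c in labels}
        rel_imgs = {img for img, labels in ground_truth.items() if c in labels}
        scores.append(f_measure(pred_imgs, rel_imgs))
    return sum(scores) / len(scores)
```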
Our solution at ImageCLEF 2014
§Search-based annotation with utilization of the semantic relationships defined by WordNet
7/19

Our solution (cont.)
§Image datasets for similarity-based searching:
§Profiset: 20M images with high-quality keywords
§Dataset provided by the ImageCLEF organizers (“SCIA trainset”): 500K images from the internet; the descriptions are noisier, but the dataset covers all topics in the contest
§Image content extraction:
§A combination of 5 MPEG7 global features
§Exploitation of semantic relationships (see the sketch after this slide):
§Synonyms
§Probability ranking of the possible meanings of each word
§Hypernymy/hyponymy
§Holonymy/meronymy
8/19
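The WordNet relations listed above can be queried for any candidate word; the snippet below is a small NLTK illustration of these relations, not the DISA implementation (the sense-probability ranking and the conceptRank computation in the actual pipeline are more involved).

```python
# Illustration of the WordNet relations used by the annotation pipeline,
# queried via NLTK (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

senses = wn.synsets('dog', pos=wn.NOUN)   # possible meanings of the word
dog = senses[0]                           # WordNet lists the most common sense first

print([s.definition() for s in senses])   # candidate meanings to be ranked
print(dog.lemma_names())                  # synonyms within one sense
print(dog.hypernyms())                    # more general concepts (e.g. canine)
print(dog.hyponyms())                     # more specific concepts (e.g. puppy)
print(dog.member_holonyms())              # wholes the concept is a member of (e.g. pack)
print(dog.part_meronyms())                # named parts of the concept
```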
Other group approaches (first three)
§KDEVIR – Computer Science and Engineering Department of the Toyohashi University of Technology (Aichi, Japan)
§Used the features provided by the organizers
§An ontology built automatically per concept using WordNet and Wikipedia
§Positive and negative training samples selected by exploiting the ontologies
§MIL – Machine Intelligence Lab of the University of Tokyo (Tokyo, Japan)
§Used a combination of various descriptors, including Fisher Vectors & DeCAF
§A linear multi-label classifier trained by machine learning
§MindLab – Machine Learning, Perception and Discovery Lab of the Universidad Nacional de Colombia (Bogotá, Colombia)
§Used DeCAF features
§Classification by a logistic regression (soft-max) model
9/19

ImageCLEF 2014 results
§Our solution ranked 5th out of the 11 groups
§We are more successful in the “sample” metrics
§The “concept” metrics require that we find the less frequent concepts as well

System     | MAP-samples            | MF-samples             | MF-concepts
           | all   ani.  food  207  | all   ani.  food  207  | all   ani.  food  207   unseen
KDEVIR 9   | 36.8  33.1  67.1  28.9 | 37.7  29.9  64.9  32.0 | 54.7  67.1  65.1  31.6  66.1
MIL 3      | 36.9  30.9  68.6  23.3 | 27.5  20.6  53.1  18.0 | 34.7  34.7  50.4  16.9  36.7
MindLab 1  | 37.0  43.1  63.0  22.1 | 25.8  17.0  45.2  18.3 | 30.7  35.1  35.3  16.7  34.7
MLIA 9     | 27.8  18.8  53.6  16.7 | 24.8  12.1  46.0  16.4 | 33.2  32.7  37.3  16.9  34.8
DISA 4     | 34.3  46.6  39.6  19.0 | 29.7  40.6  31.2  16.9 | 19.1  23.0  22.3   7.3  19.0
RUC 7      | 27.5  25.2  44.2  15.1 | 29.3  28.0  28.2  20.7 | 25.3  20.1  23.1  10.0  18.7
IPL 9      | 23.4  30.0  48.5  18.9 | 18.4  20.2  29.8  17.5 | 15.8  15.8  33.3  12.5  22.0
IMC 1      | 25.1  35.7  35.6  12.9 | 16.3  14.3  21.0  10.9 | 12.5  10.2  15.1   6.1  11.2
INAOE 5    |  9.6   6.9  15.0   8.5 |  5.3   0.4   0.5   6.4 | 10.3   1.0   0.8  17.9  19.0
NII 1      | 14.7  23.2  22.0   4.6 | 13.0  18.9  18.7   4.9 |  2.3   3.0   2.1   0.9   1.8
FINKI 1    |  6.9   N/A   N/A   N/A |  7.2   8.1  12.3   4.1 |  4.7   6.3   9.0   2.9   4.7
10/19

New features for image retrieval
§DeCAF7 visual features
§Utilization of a deep convolutional network
§Outperformed all other participants at the ImageNet large scale visual recognition challenge ILSVRC-2012 (Krizhevsky et al., 2012)
§Adopted as a visual descriptor (Donahue et al., 2013)
§The output of the last hidden layer is used as a 4096-dimensional visual descriptor
§Similarity measured by a classical Lp metric (see the sketch after this slide)
§Gives better results than traditional features on benchmarks from other domains
§Easily used by our similarity-search framework
§The PPP-Codes technique is able to index the 20M image collection
§Real-time response on common server hardware (8 cores, 8 GB RAM, 256 GB SSD)
§Improved results of our annotation!
11/19
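A minimal sketch of the descriptor comparison described above: precomputed 4096-dimensional DeCAF7 vectors compared by an Lp metric. The feature extraction itself (the forward pass through the convolutional network) and the PPP-Codes index are not shown; the brute-force scan and the random vectors below are only stand-ins for the real similarity-search framework and features.

```python
# Hedged sketch: Lp comparison of 4096-dimensional DeCAF7-style descriptors.
# Random vectors stand in for real features; the actual system uses the
# PPP-Codes index instead of the brute-force scan shown here.
import numpy as np


def lp_distance(x, y, p=2):
    """Lp (Minkowski) distance between two descriptor vectors."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)


def k_nearest(query, database, k=10, p=2):
    """Indices of the k database descriptors closest to the query."""
    dists = np.array([lp_distance(query, d, p) for d in database])
    return np.argsort(dists)[:k]


rng = np.random.default_rng(0)
database = rng.random((1000, 4096)).astype(np.float32)  # toy descriptor matrix
query = rng.random(4096).astype(np.float32)             # toy query descriptor
print(k_nearest(query, database, k=5))
```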
Evaluation results
§Development data:

                                            | mP-concept mR-concept mF-concept | mP-sample mR-sample mF-sample | mAP-sample
Baseline (random)                           | 0.0775     0.0641     0.0498     | 0.0730    0.0969    0.0722    | 0.1578
DISA-best with MPEG and Profiset data       | 0.2954     0.2746     0.2184     | 0.3044    0.4516    0.3352    | 0.4268
DISA-best with MPEG and Profiset+SCIA data  | 0.2919     0.2778     0.2202     | 0.3052    0.4533    0.3369    | 0.4281
DISA-best with DeCAF and Profiset data      | 0.4768     0.4899     0.4165     | 0.4466    0.6152    0.4825    | 0.6105
DISA-best with DeCAF and Profiset+SCIA data | 0.4928     0.5085     0.4315     | 0.4534    0.6252    0.4901    | 0.6196

§Test data:

                                            | mF-concept | mF-sample | mAP-sample
Baseline (random)                           | 0.026      | 0.035     | 0.088
DISA-best with MPEG and Profiset data       | 0.154      | 0.279     | 0.316
DISA-best with MPEG and Profiset+SCIA data  | 0.191      | 0.297     | 0.343
Competition best                            | 0.547      | 0.377     | 0.368
DISA-best with DeCAF and Profiset+SCIA data | 0.411      | 0.399     | 0.486

Evaluated by the ImageCLEF organizers as a favor after the competition deadline.
12/19

New result evaluation – details

                                      | mF-concept       | mF-sample        | mAP-sample
DISA-MU 04 (DISA best in competition) | 19.1 [17.5–21.8] | 29.7 [29.2–30.3] | 34.3 [33.8–35.0]
KDEVIR 09 (competition winner)        | 54.7 [50.9–58.3] | 37.7 [37.0–38.5] | 36.8 [36.1–37.5]
DISA-MU NEW                           | 41.1 [38.3–44.2] | 39.9 [39.3–40.5] | 48.6 [47.9–49.3]

System     | MAP-samples            | MF-samples             | MF-concepts
           | all   ani.  food  207  | all   ani.  food  207  | all   ani.  food  207   unseen
KDEVIR 9   | 36.8  33.1  67.1  28.9 | 37.7  29.9  64.9  32.0 | 54.7  67.1  65.1  31.6  66.1
DISA NEW   | 48.6  51.0  67.2  32.3 | 39.9  44.4  48.5  26.7 | 41.1   N/A   N/A   N/A  44.9
MIL 3      | 36.9  30.9  68.6  23.3 | 27.5  20.6  53.1  18.0 | 34.7  34.7  50.4  16.9  36.7
MindLab 1  | 37.0  43.1  63.0  22.1 | 25.8  17.0  45.2  18.3 | 30.7  35.1  35.3  16.7  34.7
MLIA 9     | 27.8  18.8  53.6  16.7 | 24.8  12.1  46.0  16.4 | 33.2  32.7  37.3  16.9  34.8
DISA 4     | 34.3  46.6  39.6  19.0 | 29.7  40.6  31.2  16.9 | 19.1  23.0  22.3   7.3  19.0
RUC 7      | 27.5  25.2  44.2  15.1 | 29.3  28.0  28.2  20.7 | 25.3  20.1  23.1  10.0  18.7
IPL 9      | 23.4  30.0  48.5  18.9 | 18.4  20.2  29.8  17.5 | 15.8  15.8  33.3  12.5  22.0
IMC 1      | 25.1  35.7  35.6  12.9 | 16.3  14.3  21.0  10.9 | 12.5  10.2  15.1   6.1  11.2
INAOE 5    |  9.6   6.9  15.0   8.5 |  5.3   0.4   0.5   6.4 | 10.3   1.0   0.8  17.9  19.0
NII 1      | 14.7  23.2  22.0   4.6 | 13.0  18.9  18.7   4.9 |  2.3   3.0   2.1   0.9   1.8
FINKI 1    |  6.9   N/A   N/A   N/A |  7.2   8.1  12.3   4.1 |  4.7   6.3   9.0   2.9   4.7
13/19

Results illustration – the top 5 concepts (DeCAF-best method, random selection of queries)
§Example query images:
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/-g/-gtLV2J8mRnRuo9Q.jpg
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/1c/1Cear3rT3BkcYmHZ.jpg
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/1_/1_4HQaUUVBzXf-Ld.jpg
Sunrise/sunset, water, beach, sea, coast; missing: cloud, reflection, sand, sky, sun
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/3j/3JO2X2A5D1nBvTuA.jpg
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/mn/mN0z3vOSj3gQqF3S.jpg
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/oa/oaYw5NIPG3ntCnLd.jpg
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/pf/PFgeYp5m7lUHjAMe.jpg
http://andromeda.fi.muni.cz/%7Exkohout7/ImageCLEF2014/data/dev_images/2x/2xZmZFKAnRAI1xQN.jpg
14/19

DeCAF vs. MPEG
§Out of 1940 development queries:
§AP(MPEG) is higher than AP(DeCAF) in 357 cases
§Precision(MPEG) is higher than Precision(DeCAF) in 201 cases
§Recall(MPEG) is higher than Recall(DeCAF) in 158 cases
§When the MPEG results are better, typically
§the query image is difficult
§neither MPEG nor DeCAF provides good results
§the MPEG-based results are better only by a small margin
§the MPEG-based results are probably better just by chance
§With very few exceptions, the DeCAF-based visual similarity is better
15/19

DeCAF vs. MPEG (cont.)
§Two examples where the MPEG visual similarity is better
§Interesting for further study?
§Example 1 – query xTmOlLg4LsnO09sr: http://andromeda.fi.muni.cz/~xkohout7/ImageCLEF2014/data/dev_images/xt/xTmOlLg4LsnO09sr.jpg (slide figure: the query and its nearest neighbors by MPEG7 vs. DeCAF7 descriptors)
16/19

DeCAF vs. MPEG (cont.)
§Example 2 – query Nwq5x1MXfivYJMcU: http://andromeda.fi.muni.cz/~xkohout7/ImageCLEF2014/data/dev_images/nw/Nwq5x1MXfivYJMcU.jpg (slide figure: the query and its nearest neighbors by MPEG7 vs. DeCAF7 descriptors)
17/19

Conclusions
§We presented the modular architecture of the DISA annotation tool
§allows easy replacement of any component
§Our approach is based on nearest-neighbor search, not on training
§completely scalable – crawled data can be indexed directly
§no need for ground truth
§generic vocabulary (keyword) annotation – no need to hit predefined classes
§New visual similarity by DeCAF features
§The new similarity-search component enabled us to increase the quality of annotations by approximately 10-20 % (depending on the quality measure)
§The new DISA results outperform the best results submitted to the ImageCLEF 2014 Annotation Challenge in 2 out of 3 quality measures
18/19

Future work
§With CVUT: other descriptors / neural-network descriptors trained on different data
§Refinement of the conceptRank algorithm
§Relevance feedback
§Experiments with other queries + ground truth
§Journal paper (by December)
19/19