Annotation Framework & ImageCLEF 2014
JAN BOTOREK, PETRA BUDÍKOVÁ
10. 3. 2014

Annotation Tool
[Diagram: input images -> Annotation tool + MUFIN -> output keywords (House, Nature, Forest, Tree, …); labeled components: WordNet, Visual Concept Ontology, MUFIN visual object search, Profimedia]

What is ImageCLEF?
ImageCLEF - Image Retrieval in CLEF
•Competition on cross-language annotation and retrieval of images
•Different areas of interest:
  •Robot vision – object recognition
  •Plant identification
  •Medical image identification
  •Concept Image Annotation
    •Goal: From a given set of concepts, select those that are relevant for an input image

Concept Image Annotation in 2014
•Focus on scalability – no manually labeled training data
•Only noisy training data downloaded from the internet are available
•Development data – 10000 images with ground truth concepts
•Participants may not use manually labeled training data that was created directly for machine learning
  •Profiset is OK, since it is a by-product of another activity (image selling)

CLEF Annotation Task vs. our annotation tools
•Our long-term research objective: general image annotation without well-labeled training data
  •"Big Data approach"
  •Primarily focused on describing objects (nouns)
  •Difficult to evaluate
•CLEF: Annotation contest with ground truth
  •Considers nouns, adjectives and verbs
  •Participants typically use model-based approaches

Competition's benefits for us
•Evaluation of our tools on a well-known ground truth
  •The CLEF collection includes images of numerous types and situations as training and development material
•Compare our ideas with other teams' solutions
•Apply our current approach to image annotation in practice
  •With feedback from the organizers
•Long-term objective: journal paper about search-based annotation, with CLEF results as part of the evaluation

What do we plan to utilize?
•WordNet-based relations
  •Must be adapted to the given word types (not just nouns)
•WordNet-based word similarity metrics
•Visual Concept Ontology
•Similar-image search, powered by MUFIN
•Co-occurrence relations among words within the Profimedia dataset
  •Constructed by the Institute of Formal and Applied Linguistics (MFF UK)

WordNet vs. co-occurrence
•WordNet – fundamental technology
  •Meanings, relations, multiple word types
  •Hypernymy, antonymy, part-whole, gloss…
  •"Language" point of view
•Co-occurrence table of related words
  •Constructed from very large text corpora (linguists from MFF UK)
  •For each word that occurs in Profiset descriptions, we have the 100 most frequently co-occurring words
  •No word types attached
  •"Human/database" point of view

Our approach I. Overall View
[Figure: overview diagram (overview.png)]

Our approach II. Network example
•Distribute probabilities – PageRank style
[Figure: PageRank illustration]

Our approach III. The probability-transfer network
•The probability-transfer coefficients of links between individual nodes are defined separately for each type of relation: hypernymy, synonymy, meronymy, word co-occurrence, …
•E.g. meronyms (whole -> parts): (1-l)/n, where l is a calibration constant
[Figure: formula image (f.png)]

Our approach IV.
Algorithm steps
•1) Assign probability values to initial nodes
•2) Build the network
  •Extend initial nodes with related synsets AND co-occurring words
  •Assign "probability-transfer coefficients" to links between nodes (determined by the type of relationship)
•3) "Page-ranking" process
  •Run a process in which synsets mutually boost one another's probability values
•4) Select the most probable synsets

Our approach V: Network example
[Figure: particular instance of the network]

Our approach VI. Unresolved issues
•Calibration of probability-transfer coefficients
  •What constants should be used?
•Initial step: assignment of initial probabilities to particular annotation words
•Details of the probability-transfer algorithm
•Final step: selection of the most probable concepts

Summary
•Problem
  •Select words that describe a given image from a predefined set of concept words
•Our approach
  •Construction of a network of synsets; a node (synset) influences other nodes' probabilities through mutual relations
  •Inspired by the PageRank algorithm
•Main research objectives
  •Design and construct a model of the synset network
  •Define and calibrate relations (links) among nodes in the network (synsets)
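
The network of synsets and the page-ranking process summarized above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the slides list the coefficient calibration and the details of the transfer algorithm as unresolved. The function names, the damping factor, the per-iteration normalization, the value of l, the reading of n as the number of parts, and all example words and link weights are hypothetical. Only the meronym coefficient (1-l)/n comes from the slides.

```python
from collections import defaultdict

# Placeholder value for the calibration constant l (assumption, not calibrated).
L_CONST = 0.5


def meronym_coefficient(n_parts, l=L_CONST):
    """Coefficient for a whole -> part link, (1 - l) / n.

    Assumes n is the number of parts of the whole.
    """
    return (1.0 - l) / n_parts


def propagate(initial, links, damping=0.85, iterations=20):
    """Simplified PageRank-style propagation (steps 1 to 3).

    initial: dict mapping node -> initial probability
    links:   list of (source, target, transfer_coefficient)
    """
    nodes = set(initial)
    for source, target, _ in links:
        nodes.update((source, target))

    # Nodes added while building the network start with probability 0.
    base = {node: initial.get(node, 0.0) for node in nodes}

    incoming = defaultdict(list)
    for source, target, coeff in links:
        incoming[target].append((source, coeff))

    scores = dict(base)
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            received = sum(scores[src] * coeff for src, coeff in incoming[node])
            new_scores[node] = (1 - damping) * base[node] + damping * received
        total = sum(new_scores.values())
        if total > 0:
            # Renormalize so the scores keep summing to 1.
            new_scores = {n: v / total for n, v in new_scores.items()}
        scores = new_scores
    return scores


def select_top(scores, k):
    """Step 4: return the k most probable nodes, best first."""
    return sorted(scores, key=lambda node: -scores[node])[:k]


# Hypothetical example: words stand in for synsets.
initial = {"tree": 0.6, "house": 0.3, "car": 0.1}
links = [
    ("tree", "forest", 0.5),                      # made-up coefficient
    ("forest", "tree", meronym_coefficient(2)),   # whole -> part
    ("house", "building", 0.5),                   # made-up coefficient
]
scores = propagate(initial, links)
top = select_top(scores, 3)
```

In this sketch, a node such as "forest" that receives no initial probability still obtains a positive score through its incoming link, which is the mutual boosting effect the slides describe. A real system would also need the calibrated coefficients for each relation type.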