Annotation Framework & ImageCLEF 2014
JAN BOTOREK, PETRA BUDÍKOVÁ
10. 3. 2014

Annotation Tool
http://www.presentation-process.com/images/stack-of-photos-in-powerpoint.jpg
Annotation tool + MUFIN
http://pixabay.com/static/uploads/photo/2013/07/13/14/07/gear-wheel-162161_640.png
- House
- Nature
- Forest
- Tree
…
http://t3.gstatic.com/images?q=tbn:ANd9GcRtbCSzlzUgZ1Kcdl-pITWYuPhQPI5DVaAt0e1L5HicOjf1JgOR
- WordNet
- Visual Concept Ontology
- MUFIN visual object search
Profimedia

What is ImageCLEF?
•Competition regarding cross-language annotation and retrieval of images
•Different areas of interest:
•Robot vision – object recognition
•Plant identification
•Medical image identification
•Concept Image Annotation
•Goal: From a given set of concepts, select those that are relevant for an input image
ImageCLEF - Image Retrieval in CLEF

Concept Image Annotation in 2014
•Focus on scalability – no manually labeled training data
•Only noisy training data downloaded from the internet are available
•Development data – 10000 images with ground truth concepts
•Participants may not use any manually labeled training data that was created specifically for
machine learning
•Profiset is OK, since it is a by-product of another activity (image selling)

CLEF Annotation Task vs. our annotation tools
•Our long-term research objective: general image annotation without well-labeled training data
•“Big Data approach”
•Primarily focused on description of noun objects
•Difficult to evaluate
•CLEF: Annotation contest with ground truth
•Considers nouns, adjectives and verbs
•Participants typically use model-based approaches

Competition’s benefits for us
•Evaluation of our tools on a well-known ground truth
•The CLEF collection includes images of many types and situations as training and
development material
•Compare our ideas and solutions with those of other teams
•Apply our current approach to image annotation in practice
•With feedback from the contest organizers
•Long-term objective: a journal paper on search-based annotation, with CLEF results as part of
the evaluation

What do we plan to utilize?
•WordNet-based relations
•Must be adapted to the given word types (not just nouns)
•WordNet-based word similarity metrics
•Visual Concept Ontology
•Similar Images search – powered by MUFIN
•Co-occurrence relations among words within the Profimedia dataset
•Constructed by the Institute of Formal and Applied Linguistics (MFF UK)
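
To illustrate what a WordNet-based word similarity metric measures, here is a minimal sketch of a path-based similarity on a toy hypernym hierarchy. The hierarchy, the function names `hypernym_path` and `path_similarity`, and the single-hypernym simplification are assumptions made for illustration; the slides do not say which metric will be used, and real WordNet synsets can have several hypernyms.

```python
def hypernym_path(word, hypernyms):
    """Return the chain from word up to the root of the toy hierarchy."""
    path = [word]
    while path[-1] in hypernyms:
        path.append(hypernyms[path[-1]])
    return path


def path_similarity(a, b, hypernyms):
    """1 / (1 + number of edges on the shortest path via a common ancestor)."""
    depth_a = {node: i for i, node in enumerate(hypernym_path(a, hypernyms))}
    for steps_b, node in enumerate(hypernym_path(b, hypernyms)):
        if node in depth_a:
            return 1.0 / (1 + depth_a[node] + steps_b)
    return 0.0  # no common ancestor in the toy hierarchy


# Hypothetical hierarchy: child -> hypernym (not taken from WordNet itself).
hypernyms = {
    "oak": "tree",
    "pine": "tree",
    "tree": "plant",
    "flower": "plant",
    "plant": "organism",
}
print(path_similarity("oak", "pine", hypernyms))
print(path_similarity("oak", "flower", hypernyms))
```

Words that share a close hypernym ("oak" and "pine") score higher than words related only through a more general concept ("oak" and "flower").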

WordNet vs. co-occurrence
•WordNet – fundamental technology
•Meanings, relations, multiple word types
•Hypernymy, antonymy, part-whole, gloss…
•“language” point of view
•Co-occurrence table of related words
•Constructed from very large text corpora (linguists from MFF UK)
•For each word that occurs in Profiset descriptions, we have the 100 most frequently co-occurring words
•No word types attached
•“human/database” point of view
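
A co-occurrence table of this shape could be sketched as follows. This is only an illustration: the real table was built by linguists from very large text corpora, while the sketch assumes that two words co-occur when they appear in the same description. The function name `top_cooccurrences`, the whitespace tokenization, and the toy descriptions are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def top_cooccurrences(descriptions, k=100):
    """Return, for each word, its k most frequently co-occurring words.

    Assumption: two words co-occur when they appear in the same description.
    """
    counts = defaultdict(Counter)
    for text in descriptions:
        # Hypothetical tokenization: lowercase, split on whitespace, deduplicate.
        words = sorted(set(text.lower().split()))
        for a, b in permutations(words, 2):
            counts[a][b] += 1
    return {w: [other for other, _ in c.most_common(k)] for w, c in counts.items()}


# Toy descriptions standing in for Profiset keyword annotations.
descriptions = [
    "house garden tree",
    "forest tree nature",
    "tree forest path",
]
table = top_cooccurrences(descriptions, k=2)
print(table["tree"])
```

As on the slide, the resulting table carries no word types: it only records which words tend to appear together.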

Our approach I. Overall View
[Figure: overview of the approach (overview.png)]

Our approach II.
Network example
•distribute probabilities – PageRank style
http://www.bloguismo.com/wp-content/uploads/2010/04/pagerank.jpg

Our approach III.
The probability-transfer network
•The probability-transfer coefficients of links between individual nodes are defined for different
types of relations: hypernymy, synonymy, meronymy, word co-occurrence, …
•E.g. Meronyms (whole -> parts):  (1-l)/n
l: calibration constant
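
The meronym example, (1-l)/n, can be worked through in a few lines. This sketch assumes that n is the number of parts of the whole and that l lies between 0 and 1; the slide states neither, and the function name `meronym_transfer_coefficient` and the example values are hypothetical.

```python
def meronym_transfer_coefficient(l, n):
    """Coefficient on each whole -> part link: (1 - l) / n.

    l: calibration constant (from the slide), assumed here to lie in [0, 1].
    n: assumed to be the number of parts (meronyms) of the whole.
    """
    if not 0.0 <= l <= 1.0:
        raise ValueError("l is assumed to lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be a positive count")
    return (1.0 - l) / n


# Hypothetical example: a "tree" synset with three parts and l = 0.4.
parts = ["trunk", "crown", "limb"]
coeff = meronym_transfer_coefficient(0.4, len(parts))
links = {("tree", part): coeff for part in parts}
total = sum(links.values())
print(coeff, total)
```

Under these assumptions the whole passes a total share of 1 - l to its parts, split evenly among them.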

Our approach IV.
Algorithm steps
•1) Assign probability values to initial nodes
•2) Build the network
•Extend initial nodes by related synsets AND co-occurring words
•Assign “probability-transfer coefficients” to links between nodes (determined by the type of
relationship)
•3) “Page-ranking” process
•Run a process where synsets will mutually boost one another’s probability values
•4) Select the most probable synsets
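
The four steps above can be sketched on a toy network. This is not the authors' algorithm, whose details are still open. It is a simple PageRank-style iteration under stated assumptions: the update rule, the `damping` and `iterations` parameters, the function names, and all coefficients and probabilities below are made up for illustration.

```python
def propagate(initial, links, damping=0.5, iterations=20):
    """Iteratively redistribute probability mass along weighted links.

    initial: dict node -> initial probability (step 1).
    links: dict (source, target) -> probability-transfer coefficient (step 2).
    damping and iterations are hypothetical parameters, not from the slides.
    """
    nodes = set(initial)
    for src, dst in links:
        nodes.update((src, dst))
    base = {n: initial.get(n, 0.0) for n in nodes}
    scores = dict(base)
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for (src, dst), coeff in links.items():
            incoming[dst] += scores[src] * coeff
        # Mix each node's initial value with what its neighbours pass on.
        scores = {n: (1 - damping) * base[n] + damping * incoming[n] for n in nodes}
        total = sum(scores.values())
        if total > 0:
            scores = {n: s / total for n, s in scores.items()}
    return scores


def select_top(scores, candidates, k):
    """Step 4: keep the k most probable nodes among the allowed concepts."""
    ranked = sorted(candidates, key=lambda n: scores.get(n, 0.0), reverse=True)
    return ranked[:k]


# Step 1: toy initial probabilities, e.g. from keywords of similar images.
initial = {"tree": 0.5, "forest": 0.3, "house": 0.2}
# Step 2: toy links with made-up coefficients for each relation type.
links = {
    ("tree", "plant"): 0.6,     # hypernymy
    ("forest", "tree"): 0.3,    # meronymy (whole -> part)
    ("forest", "nature"): 0.4,  # co-occurrence
    ("tree", "forest"): 0.4,    # co-occurrence
}
# Steps 3 and 4: propagate, then pick the best of the predefined concepts.
scores = propagate(initial, links)
best = select_top(scores, ["plant", "nature", "house", "tree"], k=2)
print(best)
```

Nodes that were not in the initial annotation, such as "plant", receive probability only through their links, which is what lets related concepts from the predefined set be selected.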

Our approach V: Network example
[Figure: a particular instance of the network]

Our approach VI.
Unresolved issues
•Calibration of probability-transfer coefficients
•What constants should be used?
•Initial step: assignment of initial probabilities for particular annotation words
•Details of the probability transfer algorithm
•Final step: selection of the most probable concepts

Summary
•Problem
•Select descriptive words for a given image from a predefined set of concept words
•Our approach
•Construct a network of synsets in which each node (synset) influences the probabilities of
other nodes through their mutual relations
•Inspired by the PageRank algorithm
•Main research objectives
•Design and construct a model of the synset network
•Define and calibrate relations (links) among nodes in the network (synsets)