Relevance Feedback Survey
Jan Botorek 4.11.2014
Outline of the presentation
 1) MUFIN AnnotationTool
 Current situation, future direction
 2) Relevance Feedback
 General overview
 Approaches
 3) RF for MUFIN AnnotationTool
 Specific requirements
 RFTool design
1
MUFIN Annotation Tool I.
 Main goal = automatically annotate unknown images with
relevant descriptive words
2
MUFIN Annotation Tool II.
3
Relevance Feedback I. (Motivation)
 Incorporate a human factor to the annotation process
 „Iterative evaluation of particular system behavior conducted
by system users“
4
Relevance Feedback II.
 Relevance Feedback idea:
 take results that are returned from a given
query processing
 provide results to users for relevance
evaluation
 utilize user-provided information about
whether or not those results are relevant to
perform a new (improved) query
5
Relevance Feedback III.
 First application: documents search
 Relevance evaluation of retrieved documents
 Most of RF studies aimed to text-based application
 Developed into a widely used technology
 Text retrieval – social networks
 CBIR – evaluation of visually similar images
6
Relevance Feedback IV. (Basic Terminology)
 Query object (Qo):
 The „original“
 The object that the retrieval process is based on
 “Query subjects are meant to be as similar as possible”
 Query subjects(Qs):
 Subjects of user‟s relevance evaluation
 Iteration (I):
 An evaluation run performed by user of the RF system
 Evaluation (E):
 Relevance value assignment by user to particular query result
7
Relevance Feedback V. - Approaches
 Boolean model
 The simplest one – based on the strict match of query/documents
 Vector-space model
 A document is represented by a vector
 Model is based on vector operations in particular vector space
 Probability model
 A document is also represented by a vector BUT the vector space
is replaced by a probability function
 Logic (language) model
 Utilizes logic interference in conjunction with some knowledge
source (e.g. ontology)
8
Vector-space model I.
 Selected as a base to our further consideration
 Relatively simple; widely used approach to RF
 Vector space  allows vector operations
 Document (object) needs some vector representation
 Defined byTf-idf values of words within document space
 Each vector has particular dimension
 Each element of a vector represents a tf-idf value of a
particular word from within a set of all words of a particular
database
 Distance may be measured among documents (vectors)
 E.g. Cosine distance
9
Vector-space model II.
 tf-idf
 tf(t,d) - (Term Frequency) = integer number expressing a
frequency of a term t in a document d
 idf(t) (Inverse Document Frequency) =
 N = number of all documents in the collection
 df = number of documents containing term t
 log = more frequent terms have lower value then
less frequent ones
 tf-idf(t,d) = tf(t,d) . idf(t)
10
Vector-space model III. (Similarity measure)
 Cosine similarity
 The most fundamental approach to measure similarity of two
vectors
 Nominator:“dot product” / Inner product
 Denominator: Euclidian distance
 Normalization of vector lengths
11
Vector-space model IV. (Rocchio)
 Rocchio‟s formula = baseline of theVector model approach
 Related/unrelated documents
 Constants a, b, c influence the
importance of particular equation
component (original, positive, negative)
12
Vector-space model V. (Rocchio Example)
13
RF for annotations I.
 Text object RF vector:
 If in document collection C(d),
there are only 6 words (w1.. w6) repeating,
 then Vd = (10,0,3,1,7,0)  document d contains
10x w1; 3x w3; Ix w4 and 7x w5
 Document is composed of text pieces = words are repeated
 Evaluation of documents as a whole
 The evaluator evaluates same objects (documents) as
he/she searches for (document)
15
RF for annotations II.
 Image object RF vector:
 If in image collection C(i),
there are only 6 words (w1 .. w6) repeating,
 thenVi = (1,0,1,1,1,0)  image c annotation
consists of words w1,w3,w4,w5
 Image description is (typically) composed from
separate keywords; not repeating
 Photos can be considered as a SHORT text document
 Composed of only keywords
 Evaluation of textual descriptions of image
 Evaluators evaluate different objects (keywords) than
he/she searches for (images)
16
RF for annotations III. (Our situation)
 Our situation:Visual query(image) + textual description
 I. Iteration = image + (user-provided) optionally textual
description
 II. Iteration = image + (RF-based) textual description
 III. Iteration = image + (RF-based) improved textual
description
 …
 RF is utilized ONLY in the textual part of the
query
17
RF for annotations IV. (Our demands)
 Include also negative RF
 Short description  only positive evaluation probably is not
sufficient; it is desired to handicap the non-relevant concepts
 Better scalability of evaluation
 Incorporate more levels of evaluation
 Ability to emphasize the positivness or negativness of
particular word
 So far (Rocchio) only 2 scale levels = we require more general
approach
18
Proposed Image Annotation RF Approach I.
(reminder)
19
Proposed Image Annotation RF Approach II.
20
Proposed Image Annotation RF Approach III.
 RF: First iteration:
 Initial vector Q0 is empty  former query was not evaluated
 User evaluates words  relevant (Qr) & non-relevant (Qnr) query
vectors are constructed directly
 1) Animal: 0,5
 2) Dog:1
 3) Plant: -1
 Qr : (0.5, 1, 0)
 Qnr: (0,0,1)
21
Proposed Image Annotation RF Approach IV.
 RF: Subsequent iterations:
 Initial vector Q0 is NOT empty  former queries were
evaluated
 Qr : (0.5, 1, 0) – (animal, dog, plant)
 Qnr: (0,0,1) – (animal, dog, plant)
 Qr„ and Qnr„ are constructed as follows:
 If new word occurs, is added into Q‟
 If already presented word is evaluated  average value is constructed
 1) Animal: 1
 2) Plant: 0,5
 3) Poodle: 1
 Qr„= (0.75, 1, 0, 1) Qnr„=(0, 0, 0.25, 0)
 (animal, dog, plant, poodle)
22
Proposed Image Annotation RF Approach V.
23
Proposed Image Annotation RF Approach VI.
 Text Ranking:
 Visually similar images are transformed into word vectors
 visually similar images LIMITS the scope of text ranking
 Cosine similarity is computed between the query vector and
similar image vectors
 Both for relevant and non-relevant initial vectors
 According to the similarity values similar images are ranked
into two lists: by relevance and by non-relevance
 Output of the Text Ranking component is formed by
combination of constructed two ranked lists
23
Proposed Image Annotation RF Approach VII.
24