Seminar 6
Definition 1 (Recall) Recall describes how many of the relevant documents are retrieved.
𝑟𝑒𝑐𝑎𝑙𝑙 = 𝑅 =
#𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡 𝑟𝑒𝑡𝑟𝑖𝑒𝑣𝑒𝑑
#𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡
Definition 2 (Precision) Precision describes how many of the retrieved documents are
relevant.
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = 𝑃 =
#𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡 𝑟𝑒𝑡𝑟𝑖𝑒𝑣𝑒𝑑
#𝑟𝑒𝑡𝑟𝑖𝑒𝑣𝑒𝑑
Definition 3 (𝐹 -measure) A balanced 𝐹-measure (𝐹1-measure) defines a recall-precision
relationship represented by their weighted harmonic mean:
𝐹 =
2 · 𝑅 · 𝑃
𝑅 + 𝑃
Definition 4 (Mean Average Precision) MAP expresses the precision in each point a
new relevant document is included in the result. Is counted as
𝑀 𝐴𝑃(𝑄) =
1
|𝑄|
·
⎛
⎝
∑︁
𝑞∈𝑄
1
𝑟𝑒𝑙 𝑞
·
⎛
⎝
𝑟𝑒𝑙 𝑞
∑︁
𝑖=1
𝑝𝑟𝑒𝑐𝑖
⎞
⎠
⎞
⎠
where 𝑟𝑒𝑙 𝑞 is the number of relevant documents retrieved by query 𝑞 and 𝑝𝑟𝑒𝑐𝑖 is the precision
at the 𝑖-th document.
Definition 5 (𝜅 statistic) Let 𝑁 be the total number of documents, 𝐽 is a set of judges
and 𝑃(𝐴) = #𝑎𝑔𝑟𝑒𝑒
𝑁 the number of documents on which the judges agree. Let also define 𝑅 𝑗
and 𝑁 𝑅 𝑗 be the number of relevant and non-relevant documents, respectively, according to
the judge 𝑗 ∈ 𝐽 and
𝑃(𝑅) =
∑︀
𝑗∈𝐽 𝑅 𝑗
|𝐽| · 𝑁
and 𝑃(𝑁 𝑅) =
∑︀
𝑗∈𝐽 𝑁 𝑅 𝑗
|𝐽| · 𝑁
as the number of relevant and non-relevant documents, respectively. Let finally define
𝑃(𝐸) = 𝑃(𝑅)2
+ 𝑃(𝑁 𝑅)2
as the approximate number of disagreements between the judges. Then the 𝜅 statistic is
defined as the measure of agreement between the judges
𝜅 =
𝑃(𝐴) − 𝑃(𝐸)
1 − 𝑃(𝐸)
.
Definition 6 (Rocchio relevance feedback) Rocchio relevance feedback has the form
𝑞 𝑚 = 𝛼𝑞0 + 𝛽
1
|𝐷 𝑟|
∑︁
⃗𝑑 𝑟∈𝐷 𝑟
⃗𝑑 𝑟 − 𝛾
1
|𝐷 𝑛𝑟|
∑︁
⃗𝑑 𝑛𝑟∈𝐷 𝑛𝑟
⃗𝑑 𝑛𝑟
where 𝑞0 is the original query vector, 𝐷 𝑟 is the set of relevant documents, 𝐷 𝑛𝑟 is the set of
non-relevant documents and the values 𝛼, 𝛽, 𝛾 depend on the system setting.
1
Exercise 1
The following ordered list of 20 letters R and N represents relevant (R) and non-relevant
(N) retrieved documents as an answer for a query on a collection of 10 000 documents.
The leftmost document is expected to be the most relevant. The list contains 6 relevant
documents. Assume that the collection contains 8 documents relevant to the query.
R R N N N N N N R N R N N N R N N N N R
a) What is the precision on the first 20 results?
b) What is the 𝐹-measure on the first 20 results?
c) What is the non-interpolated precision of the system at 25% recall? (R=25%)
d) What is the interpolated precision of the system at 33% recall? (R>33%)
e) Assume that these 20 documents is the complete list of retrieved documents. What is the
MAP of the system?
Now assume that the system returned all 10,000 documents in an ordered list and above is
the top 20.
f) What is the highest possible MAP the system can achieve?
g) What is the lowest possible MAP the system can achieve?
Exercise 2
Below is a table showing how two judges judged the relevance (0 = non-relevant, 1 = relevant)
of the set of 12 documents with respect to a query. Assume that you developed an IR system,
that for this query returns the documents {4, 5, 6, 7, 8}.
Doc ID Judge 1 Judge 2
1 0 0
2 0 0
3 1 1
4 1 1
5 1 0
6 1 0
7 1 0
8 1 0
9 0 1
10 0 1
11 0 1
12 0 1
Table 1: Judges judging the relevance of documents.
2
a) Calculate the 𝜅 statistic.
b) Calculate the recall, precision and 𝐹-measure of your system in which a document is
considered relevant if the judges agree.
c) Calculate the recall, precision and 𝐹-measure of your system in which a document is
considered relevant if at least one of the judges thinks so.
Exercise 3
A user’s primary query is cheap CDs cheap DVDs extremely cheap CDs. The user has a look
on two documents: doc1 a doc2, marking doc1 CDs cheap software cheap CDs as relevant
and doc2 cheap thrills DVDs as non-relevant. Assume that we use a simple tf scheme without
vector length normalization. What would be the restructured query vector after considering
the Rocchio relevance feedback with values 𝛼 = 1, 𝛽 = 0.75 and 𝛾 = 0.25?
3