Seminar 6 Definition 1 (Recall) Recall describes how many of the relevant documents are retrieved. π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ = 𝑅 = #π‘Ÿπ‘’π‘™π‘’π‘£π‘Žπ‘›π‘‘ π‘Ÿπ‘’π‘‘π‘Ÿπ‘–π‘’π‘£π‘’π‘‘ #π‘Ÿπ‘’π‘™π‘’π‘£π‘Žπ‘›π‘‘ Definition 2 (Precision) Precision describes how many of the retrieved documents are relevant. π‘π‘Ÿπ‘’π‘π‘–π‘ π‘–π‘œπ‘› = 𝑃 = #π‘Ÿπ‘’π‘™π‘’π‘£π‘Žπ‘›π‘‘ π‘Ÿπ‘’π‘‘π‘Ÿπ‘–π‘’π‘£π‘’π‘‘ #π‘Ÿπ‘’π‘‘π‘Ÿπ‘–π‘’π‘£π‘’π‘‘ Definition 3 (𝐹 -measure) A balanced 𝐹-measure (𝐹1-measure) defines a recall-precision relationship represented by their weighted harmonic mean: 𝐹 = 2 Β· 𝑅 Β· 𝑃 𝑅 + 𝑃 Definition 4 (Mean Average Precision) MAP expresses the precision in each point a new relevant document is included in the result. Is counted as 𝑀 𝐴𝑃(𝑄) = 1 |𝑄| Β· βŽ› ⎝ βˆ‘οΈ π‘žβˆˆπ‘„ 1 π‘Ÿπ‘’π‘™ π‘ž Β· βŽ› ⎝ π‘Ÿπ‘’π‘™ π‘ž βˆ‘οΈ 𝑖=1 π‘π‘Ÿπ‘’π‘π‘– ⎞ ⎠ ⎞ ⎠ where π‘Ÿπ‘’π‘™ π‘ž is the number of relevant documents retrieved by query π‘ž and π‘π‘Ÿπ‘’π‘π‘– is the precision at the 𝑖-th document. Definition 5 (πœ… statistic) Let 𝑁 be the total number of documents, 𝐽 is a set of judges and 𝑃(𝐴) = #π‘Žπ‘”π‘Ÿπ‘’π‘’ 𝑁 the number of documents on which the judges agree. Let also define 𝑅 𝑗 and 𝑁 𝑅 𝑗 be the number of relevant and non-relevant documents, respectively, according to the judge 𝑗 ∈ 𝐽 and 𝑃(𝑅) = βˆ‘οΈ€ π‘—βˆˆπ½ 𝑅 𝑗 |𝐽| Β· 𝑁 and 𝑃(𝑁 𝑅) = βˆ‘οΈ€ π‘—βˆˆπ½ 𝑁 𝑅 𝑗 |𝐽| Β· 𝑁 as the number of relevant and non-relevant documents, respectively. Let finally define 𝑃(𝐸) = 𝑃(𝑅)2 + 𝑃(𝑁 𝑅)2 as the approximate number of disagreements between the judges. Then the πœ… statistic is defined as the measure of agreement between the judges πœ… = 𝑃(𝐴) βˆ’ 𝑃(𝐸) 1 βˆ’ 𝑃(𝐸) . Definition 6 (Rocchio relevance feedback) Rocchio relevance feedback has the form π‘ž π‘š = π›Όπ‘ž0 + 𝛽 1 |𝐷 π‘Ÿ| βˆ‘οΈ ⃗𝑑 π‘Ÿβˆˆπ· π‘Ÿ ⃗𝑑 π‘Ÿ βˆ’ 𝛾 1 |𝐷 π‘›π‘Ÿ| βˆ‘οΈ ⃗𝑑 π‘›π‘Ÿβˆˆπ· π‘›π‘Ÿ ⃗𝑑 π‘›π‘Ÿ where π‘ž0 is the original query vector, 𝐷 π‘Ÿ is the set of relevant documents, 𝐷 π‘›π‘Ÿ is the set of non-relevant documents and the values 𝛼, 𝛽, 𝛾 depend on the system setting. 1 Exercise 1 The following ordered list of 20 letters R and N represents relevant (R) and non-relevant (N) retrieved documents as an answer for a query on a collection of 10 000 documents. The leftmost document is expected to be the most relevant. The list contains 6 relevant documents. Assume that the collection contains 8 documents relevant to the query. R R N N N N N N R N R N N N R N N N N R a) What is the precision on the first 20 results? b) What is the 𝐹-measure on the first 20 results? c) What is the non-interpolated precision of the system at 25% recall? (R=25%) d) What is the interpolated precision of the system at 33% recall? (R>33%) e) Assume that these 20 documents is the complete list of retrieved documents. What is the MAP of the system? Now assume that the system returned all 10,000 documents in an ordered list and above is the top 20. f) What is the highest possible MAP the system can achieve? g) What is the lowest possible MAP the system can achieve? Exercise 2 Below is a table showing how two judges judged the relevance (0 = non-relevant, 1 = relevant) of the set of 12 documents with respect to a query. Assume that you developed an IR system, that for this query returns the documents {4, 5, 6, 7, 8}. Doc ID Judge 1 Judge 2 1 0 0 2 0 0 3 1 1 4 1 1 5 1 0 6 1 0 7 1 0 8 1 0 9 0 1 10 0 1 11 0 1 12 0 1 Table 1: Judges judging the relevance of documents. 2 a) Calculate the πœ… statistic. b) Calculate the recall, precision and 𝐹-measure of your system in which a document is considered relevant if the judges agree. c) Calculate the recall, precision and 𝐹-measure of your system in which a document is considered relevant if at least one of the judges thinks so. Exercise 3 A user’s primary query is cheap CDs cheap DVDs extremely cheap CDs. The user has a look on two documents: doc1 a doc2, marking doc1 CDs cheap software cheap CDs as relevant and doc2 cheap thrills DVDs as non-relevant. Assume that we use a simple tf scheme without vector length normalization. What would be the restructured query vector after considering the Rocchio relevance feedback with values 𝛼 = 1, 𝛽 = 0.75 and 𝛾 = 0.25? 3