Learning to rank (Chapter 15)
Definition 1 (Learning to rank IR System)
Let’s have a set of documents 𝐷1..|𝐷|, a set of queries 𝑄1..|𝑄| and a set of relevance
judgements 𝐽1..|𝐽|(𝑄 𝑗, 𝐷𝑖), where
𝐽(𝑄 𝑗, 𝐷𝑖) =
{︃
1, if a document 𝐷𝑖 is relevant for a query 𝑄 𝑗
0, if a document 𝐷𝑖 is irrelevant for a query 𝑄 𝑗
(1)
the objective of a learning-to-rank IR System is to (one of the following):
A) find a function 𝑓(𝑄 𝑗, 𝐷𝑖) with the property:
∀𝐽(𝑄 𝑗, 𝐷rel) = 1, ∀𝐽(𝑄 𝑗, 𝐷irrel) = 0 : 𝑓(𝑄 𝑗, 𝐷rel) > 𝑓(𝑄 𝑗, 𝐷irrel) (2)
B) find functions 𝑓 𝑞(𝑄 𝑗), 𝑓 𝑑(𝐷 𝐼), 𝑓(𝑄emb, 𝐷emb) with the properties:
∀𝐽(𝑄 𝑗, 𝐷rel) = 1, ∀𝐽(𝑄 𝑗, 𝐷irrel) = 0 :
𝑓(𝑓 𝑞(𝑄 𝑗), 𝑓 𝑑(𝐷rel)) > 𝑓(𝑓 𝑞(𝑄 𝑗), 𝑓 𝑑(𝐷irrel)) (3)
Exercise 15/1
Consider a collection of queries, documents, and judgements:
Query 1: president public speaking
Query 2: presidential elections
Doc 1: Obama speaks in Chicago
Doc 2: President has spoken this morning
Doc 3: A new president was elected
Judgement 1: J(Query 1, Doc 1) = 1
Judgement 1: J(Query 1, Doc 2) = 1
Judgement 2: J(Query 1, Doc 3) = 0
Judgement 3: J(Query 2, Doc 3) = 1
With respect to Occam’s razor principle, come up with a function 𝑓(𝑄 𝑗, 𝐷𝑖) that is
consistent with this data set.
Exercise 15/2
What if we change the Document 2 from previous exercise to
Doc 2: President greeted press this morning
1
Exercise 15/3
As the dataset grows bigger, there is a good chance that we won’t come up with a
function 𝑓(𝑄 𝑗, 𝐷𝑖) that will fit the dataset perfectly. How can we evaluate how well the
function fits the dataset? Is it fair to evaluate this on a dataset from which the function
has been inferred?
Exercise 15/4
If the quality of 𝑓(𝑄 𝑗, 𝐷𝑖) can be automatically evaluated, can we create an algorithm
that will find an optimal 𝑓 for us?
Given a fixed representation of queries and documents to be a bag of words, how can
we find a 𝑓(𝑄 𝑗, 𝐷𝑖) that assigns the weights to each of the words in the representation
so that the condition in the Definition 1, objective A) holds?
Discuss your ideas.
Exercise 15/5
On the other hand, if we fix the 𝑓(𝑄 𝑗, 𝐷𝑖) in the Definition 1, objective B), can how can
we find the optimal representation, or embeddings of query 𝑓 𝑞(𝑄 𝑗) and document 𝑓 𝑑(𝐷 𝐼)
so that, the condition in Definition 1, objective B) holds?
For example, consider a case where 𝑓(𝑄emb, 𝐷emb) = cos(𝑓 𝑞(𝑄 𝑗), 𝑓 𝑑(𝐷 𝐼)).
Discuss your ideas.
Exercise 15/6
What are the advantages of using the approach of a more complex objective B) (embeddingsfirst),
as compared to objective A) (directly ranking all query-document pairs)? Discuss.
2