Learning to rank (Chapter 15) Definition 1 (Learning to rank IR System) Let’s have a set of documents 𝐷1..|𝐷|, a set of queries 𝑄1..|𝑄| and a set of relevance judgements 𝐽1..|𝐽|(𝑄 𝑗, 𝐷𝑖), where 𝐽(𝑄 𝑗, 𝐷𝑖) = {οΈƒ 1, if a document 𝐷𝑖 is relevant for a query 𝑄 𝑗 0, if a document 𝐷𝑖 is irrelevant for a query 𝑄 𝑗 (1) the objective of a learning-to-rank IR System is to (one of the following): A) find a function 𝑓(𝑄 𝑗, 𝐷𝑖) with the property: βˆ€π½(𝑄 𝑗, 𝐷rel) = 1, βˆ€π½(𝑄 𝑗, 𝐷irrel) = 0 : 𝑓(𝑄 𝑗, 𝐷rel) > 𝑓(𝑄 𝑗, 𝐷irrel) (2) B) find functions 𝑓 π‘ž(𝑄 𝑗), 𝑓 𝑑(𝐷 𝐼), 𝑓(𝑄emb, 𝐷emb) with the properties: βˆ€π½(𝑄 𝑗, 𝐷rel) = 1, βˆ€π½(𝑄 𝑗, 𝐷irrel) = 0 : 𝑓(𝑓 π‘ž(𝑄 𝑗), 𝑓 𝑑(𝐷rel)) > 𝑓(𝑓 π‘ž(𝑄 𝑗), 𝑓 𝑑(𝐷irrel)) (3) Exercise 15/1 Consider a collection of queries, documents, and judgements: Query 1: president public speaking Query 2: presidential elections Doc 1: Obama speaks in Chicago Doc 2: President has spoken this morning Doc 3: A new president was elected Judgement 1: J(Query 1, Doc 1) = 1 Judgement 1: J(Query 1, Doc 2) = 1 Judgement 2: J(Query 1, Doc 3) = 0 Judgement 3: J(Query 2, Doc 3) = 1 With respect to Occam’s razor principle, come up with a function 𝑓(𝑄 𝑗, 𝐷𝑖) that is consistent with this data set. Exercise 15/2 What if we change the Document 2 from previous exercise to Doc 2: President greeted press this morning 1 Exercise 15/3 As the dataset grows bigger, there is a good chance that we won’t come up with a function 𝑓(𝑄 𝑗, 𝐷𝑖) that will fit the dataset perfectly. How can we evaluate how well the function fits the dataset? Is it fair to evaluate this on a dataset from which the function has been inferred? Exercise 15/4 If the quality of 𝑓(𝑄 𝑗, 𝐷𝑖) can be automatically evaluated, can we create an algorithm that will find an optimal 𝑓 for us? Given a fixed representation of queries and documents to be a bag of words, how can we find a 𝑓(𝑄 𝑗, 𝐷𝑖) that assigns the weights to each of the words in the representation so that the condition in the Definition 1, objective A) holds? Discuss your ideas. Exercise 15/5 On the other hand, if we fix the 𝑓(𝑄 𝑗, 𝐷𝑖) in the Definition 1, objective B), can how can we find the optimal representation, or embeddings of query 𝑓 π‘ž(𝑄 𝑗) and document 𝑓 𝑑(𝐷 𝐼) so that, the condition in Definition 1, objective B) holds? For example, consider a case where 𝑓(𝑄emb, 𝐷emb) = cos(𝑓 π‘ž(𝑄 𝑗), 𝑓 𝑑(𝐷 𝐼)). Discuss your ideas. Exercise 15/6 What are the advantages of using the approach of a more complex objective B) (embeddingsfirst), as compared to objective A) (directly ranking all query-document pairs)? Discuss. 2