Relevance feedback and query expansion (Chapter 9)

Definition 1 (Rocchio relevance feedback)
Rocchio relevance feedback has the form

\[ q_m = \alpha q_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_r \in D_r} \vec{d}_r - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_{nr} \in D_{nr}} \vec{d}_{nr} \]

where $q_0$ is the original query vector, $D_r$ is the set of relevant documents, $D_{nr}$ is the set of non-relevant documents, and the values $\alpha$, $\beta$, $\gamma$ depend on the system setting.

Exercise 9/1
What is the main purpose of Rocchio relevance feedback?

Exercise 9/2
A user's primary query is "cheap CDs cheap DVDs extremely cheap CDs". The user looks at two documents, doc1 and doc2, marking doc1 "CDs cheap software cheap CDs" as relevant and doc2 "cheap thrills DVDs" as non-relevant. Assume that we use a simple tf scheme without vector length normalization. What would be the restructured query vector after applying Rocchio relevance feedback with values $\alpha = 1$, $\beta = 0.75$, and $\gamma = 0.25$?

Text classification and Naive Bayes (Chapter 13)

Definition 2 (Naive Bayes Classifier)
The Naive Bayes (NB) classifier assumes that the effect of the value of a predictor $x$ on a given class $c$ is independent of the values of the other predictors (class-conditional independence). Bayes' theorem provides a way of calculating the posterior probability $P(c|x)$ from the class prior probability $P(c)$, the predictor prior probability $P(x)$, and the probability of the predictor given the class $P(x|c)$:

\[ P(c|x) = \frac{P(x|c) P(c)}{P(x)} \]

and for a vector of predictors $X = (x_1, \ldots, x_n)$

\[ P(c|X) = \frac{P(x_1|c) \cdots P(x_n|c) P(c)}{P(x_1) \cdots P(x_n)}. \]

The class with the highest posterior probability is the outcome of the prediction.

Exercise 13/1
What is naive about the Naive Bayes classifier? Briefly outline its major idea.

Exercise 13/2
Considering the table of observations, use the Naive Bayes classifier to recommend whether to Play Golf given a day with Outlook = Rainy, Temperature = Mild, Humidity = Normal and Windy = True. Do not deal with the zero-frequency problem.
Outlook    Temperature  Humidity  Windy  Play Golf
Rainy      Hot          High      False  No
Rainy      Hot          High      True   No
Overcast   Hot          High      False  Yes
Sunny      Mild         High      False  Yes
Sunny      Cool         Normal    False  Yes
Sunny      Cool         Normal    True   No
Overcast   Cool         Normal    True   Yes
Rainy      Mild         High      False  No
Rainy      Cool         Normal    False  Yes
Sunny      Mild         Normal    False  Yes
Rainy      Mild         Normal    True   Yes
Overcast   Mild         High      True   Yes
Overcast   Hot          Normal    False  Yes
Sunny      Mild         High      True   No

Table 1: Observations for Exercise 13/2.

Definition 3 (A Linear Classifier)
Our linear classifier finds the hyperplane that bisects, and is perpendicular to, the line connecting the closest points from the two classes. The separating (decision) hyperplane is defined in terms of a normal (weight) vector $\mathbf{w}$ and a scalar intercept term $b$ as

\[ f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \]

where $\cdot$ is the dot product of vectors. Finally, the classifier becomes

\[ \mathrm{class}(\mathbf{x}) = \mathrm{sgn}(f(\mathbf{x})). \]

Exercise 13/3
Draw a sketch explaining the concept of our linear classifier. Include the equation of the separating hyperplane. Is our classifier equivalent to support vector machines (SVM)? What are the limitations of our classifier?

Exercise 13/4
Build a linear classifier for the training set $\{([1, 1], -1), ([2, 0], -1), ([2, 3], +1)\}$.

Exercise 13/5
Explain the concept of classification based on neural networks. Draw a sketch and comment on all components.

Exercise 13/6
What is the difference between supervised and unsupervised learning? Give examples.
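The construction in Definition 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference solution: the function names `fit_closest_pair_classifier` and `classify`, the toy training set `toy`, and the choice to map $f(\mathbf{x}) = 0$ to the class $+1$ are all hypothetical and do not come from the exercise sheet. The weight vector is taken as the difference of the closest pair of points from the two classes, and the intercept is chosen so that the hyperplane passes through their midpoint.

```python
from itertools import product


def fit_closest_pair_classifier(samples):
    """Return (w, b) for the hyperplane of Definition 3.

    samples: list of (point, label) pairs, label in {-1, +1}.
    """
    neg = [p for p, y in samples if y == -1]
    pos = [p for p, y in samples if y == +1]

    def sqdist(a, b):
        # Squared Euclidean distance; enough for finding the minimum.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Closest pair consisting of one negative and one positive point.
    n, p = min(product(neg, pos), key=lambda pair: sqdist(*pair))

    # Normal vector points from the negative to the positive point,
    # so the hyperplane is perpendicular to the connecting line.
    w = tuple(pi - ni for ni, pi in zip(n, p))

    # The hyperplane bisects the connecting line: f(midpoint) = 0.
    mid = tuple((ni + pi) / 2 for ni, pi in zip(n, p))
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b


def classify(w, b, x):
    """class(x) = sgn(w . x + b); assumption: f(x) = 0 maps to +1."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if f >= 0 else -1


# Hypothetical toy data, deliberately different from Exercise 13/4.
toy = [((0, 0), -1), ((1, 0), -1), ((3, 0), +1), ((4, 1), +1)]
w, b = fit_closest_pair_classifier(toy)
print(w, b)  # (2, 0) -4.0
```

On the toy data the closest pair is (1, 0) and (3, 0), so the hyperplane is the vertical line $x_1 = 2$. Note that this construction looks only at one pair of points, so nothing in the sketch checks that the resulting hyperplane separates the remaining training points.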