Relevance feedback + Text classification (Chapter 9+13)

Definition 1 (Rocchio relevance feedback)
Rocchio relevance feedback has the form

$$\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$$

where $\vec{q}_m$ is the modified query vector, $\vec{q}_0$ is the original query vector, $D_r$ is the set of relevant documents, $D_{nr}$ is the set of non-relevant documents, and the weights $\alpha$, $\beta$, $\gamma$ depend on the system setting.

[Figure: initial query, revised query, and known non-relevant documents (x)]

Exercise 9/1
What is the main purpose of Rocchio relevance feedback?

Exercise 9/2
A user's initial query is "cheap CDs cheap DVDs extremely cheap CDs". The user looks at two documents, doc1 and doc2. She marks doc1 ("CDs cheap software cheap CDs") as relevant and doc2 ("cheap thrills DVDs") as non-relevant. Assume that we use a simple tf scheme without length normalization. What would the reformulated query vector be after relevance feedback with $\alpha = 1$, $\beta = 0.75$ and $\gamma = 0.25$?

We rewrite the documents and the query as term-frequency vectors in the following table:

term        doc1 (relevant)   doc2 (non-relevant)   query
CDs         2                 0                     2
cheap       2                 1                     3
software    1                 0                     0
thrills     0                 1                     0
DVDs        0                 1                     1
extremely   0                 0                     1

Text classification and Naive Bayes (Chapter 13)

Definition 2 (Naive Bayes classifier)
Naive Bayes (NB) assumes that the effect of the value of a predictor $x$ on a given class $c$ is independent of the values of the other predictors (class-conditional independence). Bayes' theorem provides a way of calculating the posterior probability $P(c \mid x)$ from the class prior probability $P(c)$, the predictor prior probability $P(x)$, and the probability of the predictor given the class, $P(x \mid c)$:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

and, for a vector of predictors $X = (x_1, \dots, x_n)$,

$$P(c \mid X) = \frac{P(x_1 \mid c) \cdots P(x_n \mid c)\,P(c)}{P(X)}$$

where the denominator $P(X)$ is the same for every class. The class with the highest posterior probability is the predicted class.
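The Naive Bayes definition leaves the form of $P(x_i \mid c)$ open. For text classification, one common choice (assumed here; the definition does not prescribe it) is the multinomial model with add-one (Laplace) smoothing, where each predictor is a term occurrence. The sketch below works in log space to avoid floating-point underflow and drops the denominator $P(X)$, since it does not change which class scores highest. The function names and the small training set are hypothetical and serve only as illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """docs: list of (text, label) pairs. Returns class priors, per-class term counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.split()  # assumption: whitespace tokenization
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n = len(docs)
    priors = {c: class_docs[c] / n for c in class_docs}  # P(c)
    return priors, term_counts, vocab


def log_posterior_scores(text, priors, term_counts, vocab):
    """Unnormalized log P(c|X) per class; P(X) is omitted because it is class-independent."""
    scores = {}
    for c, prior in priors.items():
        total = sum(term_counts[c].values())
        score = math.log(prior)
        for t in text.split():
            if t not in vocab:
                continue  # terms never seen in training are ignored
            # add-one smoothing: P(t|c) = (count(t, c) + 1) / (total_c + |V|)
            score += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = score
    return scores


def classify(text, priors, term_counts, vocab):
    """Return the class with the highest posterior score."""
    scores = log_posterior_scores(text, priors, term_counts, vocab)
    return max(scores, key=scores.get)


# Hypothetical toy training data
train = [
    ("Chinese Beijing Chinese", "china"),
    ("Chinese Chinese Shanghai", "china"),
    ("Chinese Macao", "china"),
    ("Tokyo Japan Chinese", "other"),
]
priors, term_counts, vocab = train_multinomial_nb(train)
test_doc = "Chinese Chinese Chinese Tokyo Japan"
scores = log_posterior_scores(test_doc, priors, term_counts, vocab)
print(classify(test_doc, priors, term_counts, vocab))  # china
```

Here the unnormalized scores are $\tfrac{3}{4}(\tfrac{3}{7})^3(\tfrac{1}{14})^2 \approx 0.0003$ for china and $\tfrac{1}{4}(\tfrac{2}{9})^5 \approx 0.0001$ for other, so the frequent term "Chinese" outweighs the two "other" terms.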
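The Rocchio update from Definition 1 can be checked against Exercise 9/2 with a short sketch. It assumes whitespace tokenization and raw term frequencies, matching the exercise's "simple tf scheme without length normalization". `tf_vector` and `rocchio` are hypothetical helper names. By default, negative weights are kept. Passing `clip_negative=True` sets them to 0, a convention many systems use because a negative term weight has no useful meaning in a query.

```python
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector (no length normalization)."""
    return Counter(text.split())


def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25, clip_negative=False):
    """q_m = alpha*q0 + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    terms = set(q0)
    for d in relevant + nonrelevant:
        terms |= set(d)
    qm = {}
    for t in sorted(terms):
        # Counter returns 0 for missing terms, so absent terms contribute nothing
        rel = sum(d[t] for d in relevant) / len(relevant) if relevant else 0.0
        nonrel = sum(d[t] for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        w = alpha * q0[t] + beta * rel - gamma * nonrel
        qm[t] = max(w, 0.0) if clip_negative else w
    return qm


q0 = tf_vector("cheap CDs cheap DVDs extremely cheap CDs")
doc1 = tf_vector("CDs cheap software cheap CDs")  # judged relevant
doc2 = tf_vector("cheap thrills DVDs")            # judged non-relevant
qm = rocchio(q0, [doc1], [doc2])
for term, weight in qm.items():
    print(term, weight)
# CDs 3.5, DVDs 0.75, cheap 4.25, extremely 1.0, software 0.75, thrills -0.25
```

With one relevant and one non-relevant document, the two centroids are simply doc1 and doc2. The update therefore reduces to $\vec{q}_0 + 0.75\,\vec{d}_1 - 0.25\,\vec{d}_2$. For example, the weight for "cheap" is $3 + 0.75 \cdot 2 - 0.25 \cdot 1 = 4.25$.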