Sheet:        UČO:        Points:

Describe the SoundEx phonetic retrieval algorithm [2 points]. Give an example of two similarly-sounding words with the same SoundEx code [1 point]. Explain the weak point of SoundEx [1 point], and give an example of two similarly-sounding words with different SoundEx codes [1 point].

Answer (handwritten; illegible parts marked [...]):
Sounding words together [...] removing vowels and [...] similar [...] spelling together. [...] and [...]. The [...] weak point of the SoundEx [...] is the [...] fog (F200) and [...] (T2[...]).

Write your solution only on this side of the sheet!

Consider the following collection of four documents d_i:
• d1: BREAKTHROUGH DRUG FOR HIV
• d2: NEW HIV DRUG
• d3: NEW APPROACH FOR TREATMENT OF HIV
• d4: NEW HOPES FOR HIV PATIENTS
Produce a list of (term, document ID) tuples [1 point], sort this list in lexicographical order [1 point], and use the sorted list to construct an inverted index [1 point]. Write down each step. Describe how you would produce this index using the MapReduce distributed framework [2 points].

Answer (handwritten; illegible parts marked [...]):
(breakthrough, 1), (drug, 1), (for, 1), (hiv, 1), [...], (patients, 4) [...]
for -> 1 -> 3 -> 4
new -> 2 -> 3 -> 4
of -> 3
Each [...] would process document [...] list. [...] range of terms [...]

Define the two assumptions the Naive Bayes classifier makes [2 points]. Explain the advantage of computing a product of probability estimates as a sum in the logarithmic space [1 point]. Given an observation x, and the classes c1 and c2, is the knowledge of P(x | c1)P(c1) > P(x | c2)P(c2) sufficient to decide whether P(c1 | x) > P(c2 | x)? Why or why not?
[2 points] Given the following list of observations, use the Naive Bayes classifier to decide whether to play golf when it is sunny, hot, windy and the humidity is normal. [5 points]

Outlook   Temperature  Humidity  Windy  Play golf
Sunny     Mild         High      False  Yes
Rainy     Mild         Normal    True   Yes
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      True   No
Sunny     Mild         Normal    False  Yes
Rainy     Cool         Normal    False  Yes
Overcast  Hot          High      False  Yes
Rainy     Hot          High      False  No
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Sunny     Cool         Normal    True   No
Rainy     Hot          High      True   No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         High      False  No

Answer (handwritten; illegible parts marked [...]):
[...]: stability of multiplying small real numbers, [...] vice versa. [...]
P(Yes | Sunny, Hot, Normal, True) > P(No | Sunny, Hot, Normal, True).

Given a directed graph G that represents three Web pages V(G) = {a, b, c}, and the links E(G) = {(b, a), (c, a), (c, b), (b, c)} between these three pages, draw G [1 point] and produce the adjacency matrix (also known as the link matrix) A [1 point], and the Markov transition matrix P [2 points]. Describe the intuition behind the PageRank algorithm [1 point]. Compute the PageRank of the pages a, b, and c using a single iteration of the PageRank algorithm [2 points]. Describe what we mean, when we call a page a hub, or an authority [1 point]. Compute the hub, and authority scores of the pages a, b, and c [2 points].

Answer (handwritten; illegible parts marked [...]):
A =
0  0  0
1  0  1
1  1  0

P =
1/3  1/3  1/3
1/2  0    1/2
1/2  1/2  0

The PageRank algorithm computes the probability [...] hypothetical [...] a Web page [...] many [...] link to. [...]
A·A^T = [...]
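
The PageRank and hub/authority computations asked for in the last question can be checked with a short script. This is a minimal sketch rather than the expected exam solution, and it rests on several assumptions not confirmed by the sheet: the first link in E(G) is read as (b, a); the page a, which has no out-links, jumps to every page with equal probability; no teleportation (damping) is applied; PageRank starts from the uniform vector; and the hub scores start from all ones and are left unnormalised. All variable names are illustrative.

```python
from fractions import Fraction

# Pages and links. The edge ("b", "a") is an ASSUMED reading of the
# garbled first pair in E(G).
pages = ["a", "b", "c"]
edges = [("b", "a"), ("c", "a"), ("c", "b"), ("b", "c")]
idx = {p: i for i, p in enumerate(pages)}
n = len(pages)

# Adjacency (link) matrix: A[i][j] = 1 if page i links to page j.
A = [[0] * n for _ in range(n)]
for src, dst in edges:
    A[idx[src]][idx[dst]] = 1

# Markov transition matrix: each row of A divided by its out-degree.
# ASSUMPTION: a page without out-links (here "a") jumps uniformly.
P = []
for row in A:
    out_degree = sum(row)
    if out_degree == 0:
        P.append([Fraction(1, n)] * n)
    else:
        P.append([Fraction(v, out_degree) for v in row])


def step(x, M):
    """One power-iteration step: row vector x times matrix M."""
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


# Single PageRank iteration from the uniform distribution (no damping).
x0 = [Fraction(1, n)] * n
pagerank = step(x0, P)

# One HITS update from all-ones hub scores, left unnormalised:
# authority = A^T * hub, then hub = A * authority.
hubs0 = [1] * n
authorities = [sum(A[i][j] * hubs0[i] for i in range(n)) for j in range(n)]
hubs = [sum(A[i][j] * authorities[j] for j in range(n)) for i in range(n)]

print(dict(zip(pages, pagerank)))
print(dict(zip(pages, authorities)))
print(dict(zip(pages, hubs)))
```

Under these assumptions, one iteration gives PageRank values 4/9, 5/18 and 5/18 for a, b and c, authority scores 2, 1 and 1, and hub scores 0, 3 and 3. Exact fractions are used so the results can be compared directly with a hand calculation.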