Seminar 2 Algorithm 1 (Soundex Code) Transformation of a string to a 4-character soundex code 1. Keep the first character 2. Rewrite {𝐴, 𝐸, 𝐼, 𝑂, 𝑈, 𝐻, 𝑊, 𝑌 } to 0 3. Rewrite characters (a) {𝐵, 𝐹, 𝑃, 𝑉 } to 1 (b) {𝐶, 𝐺, 𝐽, 𝐾, 𝑄, 𝑆, 𝑋, 𝑍} to 2 (c) {𝐷, 𝑇} to 3 (d) {𝐿} to 4 (e) {𝑀, 𝑁} to 5 (f) {𝑅} to 6 4. Remove duplicities 5. Remove zeros 6. Change to length 4 (truncate or add zeros) Algorithm 2 (Querying in Permuterm Index) For query 𝑞, find keys according to the following scheme: ∙ for 𝑞 = 𝑋, find keys in the form 𝑋$ ∙ for 𝑞 = 𝑋* , find keys in the form $𝑋* ∙ for 𝑞 = * 𝑋, find keys in the form 𝑋$* ∙ for 𝑞 = * 𝑋* , find keys in the form 𝑋* ∙ for 𝑞 = 𝑋* 𝑌 , find keys in the form 𝑌 $𝑋* Exercise 1 Below is a part of index with positions in the form doc1: ⟨𝑝𝑜𝑠1, 𝑝𝑜𝑠2, 𝑝𝑜𝑠3, . . .⟩; doc2: ⟨𝑝𝑜𝑠1, 𝑝𝑜𝑠2, . . .⟩; . . . ∙ angels: 2 : ⟨36, 174, 252, 651⟩; 4 : ⟨12, 22, 102, 432⟩; 7 : ⟨17⟩; ∙ fools: 2 : ⟨1, 17, 74, 222⟩; 4 : ⟨8, 78, 108, 458⟩; 7 : ⟨3, 13, 23, 193⟩; ∙ fear: 2 : ⟨87, 704, 722, 901⟩; 4 : ⟨13, 43, 113, 433⟩; 7 : ⟨18, 328, 528⟩; ∙ in: 2 : ⟨3, 37, 76, 444, 851⟩; 4 : ⟨10, 20, 110, 470, 500⟩; 7 : ⟨5, 15, 25, 195⟩; ∙ rush: 2 : ⟨2, 66, 194, 321, 702⟩; 4 : ⟨9, 69, 149, 429, 569⟩; 7 : ⟨4, 14, 404⟩; ∙ to: 2 : ⟨47, 86, 234, 999⟩; 4 : ⟨14, 24, 774, 944⟩; 7 : ⟨19, 319, 599, 709⟩; ∙ tread: 2 : ⟨57, 94, 333⟩; 4 : ⟨15, 35, 155⟩; 7 : ⟨20, 320⟩; 1 ∙ where: 2 : ⟨67, 124, 393, 1001⟩; 4 : ⟨11, 41, 101, 421, 431⟩; 7 : ⟨15, 35, 735⟩; The following terms are phrase queries. Which documents correspond to the following queries and on which positions? a) fools rush in b) fools rush in AND angels fear to tread. The index is incorrect. How? Exercise 2 Consider a query composed of two terms. Non-positional postings list of one term is composed of 16 items 𝑃 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 215, 300, 500] and the second term has the postings list of only a single element 𝑅 = [47]. Find out how many comparisons (and why) are necessary to find out the intersection of the lists that are organized as follows: a) standard postings lists b) postings lists with skip pointers of skip frequency √︀ |𝑃| Exercise 3 a) Find two different words of the same soundex code. b) Find two phonetically similar words of different soundex codes. Exercise 4 Write elements in a dictionary of the permuterm index generated by the term mama. Exercise 5 Which keys are usable for finding the term s*ng in a permuterm wildcard index? 2