Seminar 12 Exercise 1 Pick a topic of your interest and describe it by 5-10 words. Open Sketch Engine https://ske.fi.muni.cz/. Go to WebBootCaT. Create a corpus using the description words as seed. Wait until data are downloaded. Search the word corpus for collocations. Definition 1 (Index Relations) Suppose that we could pick a random page from the index of 𝐸1 and test whether it is in 𝐸2’s index and symmetrically, test whether a random page from 𝐸2 is in 𝐸1. These experiments give us fractions π‘₯ and 𝑦 such that our estimate is that a fraction π‘₯ of the pages in 𝐸1 are in 𝐸2, while a fraction 𝑦 of the pages in 𝐸2 are in 𝐸1. Then, letting |𝐸𝑖| denote the size of the index of search engine 𝐸𝑖, we have π‘₯|𝐸1| β‰ˆ 𝑦|𝐸2|, from which we have the form we will use |𝐸1| |𝐸2| β‰ˆ 𝑦 π‘₯ . Exercise 2 Two web search engines 𝐴 and 𝐡 each generate a large number of pages uniformly at random from their indexes. 30% of 𝐴’s pages are present in 𝐡’s index, while 50% of 𝐡’s pages are present in 𝐴’s index. What is the number of pages in 𝐴’s index relative to 𝐡’s? Definition 2 (Path Similarity) Similarity between a query XPath 𝑐 π‘ž and a document path 𝑐 𝑑 is calculated as 𝐢𝑅(𝑐 π‘ž, 𝑐 𝑑) = {οΈƒ 1+|𝑐 π‘ž| 1+|𝑐 𝑑| if 𝑐 π‘ž can be expanded to 𝑐 𝑑 by adding nodes to the path 0 otherwise Definition 3 (Structural Term) Structural term is defined as an XML-context/term pair denoted by of existing path to a value and the value itself, where the value itself is also a node in the XML document. For example, an XML document containing only a root element with test. test contains two structural terms and . 1 Exercise 3 Computer Science Jennifer Widom Programming Methodology Introduction to the engineering of computer applications emphasizing modern software engineering principles. Jerry R. Cain Eric Roberts Mehran Sahami Programming Abstractions Abstraction and its relation to programming. Eric Roberts Jerry R. Cain CS106A 2 1. Write the following expressions for the XML document above: a) Return all titles (including both departments and courses). b) Return all course titles that contain the word programming. c) Return the surnames of all instructors teaching at least one course that contains the word software in its description. d) Return the surnames of all professors teaching at least one course that contains the word software in its description. 2. Calculate the similarity between the queries and the corresponding document paths. a) //Instructors//Last_Name#Cain b) //Course/Instructors/Lecturer/Last_Name#Cain Exercise 4 Count how many structural terms are present in the XML tree: Programming Abstractions Abstraction and its relation to programming Eric Roberts 3