Seminar 7

Definition 1 (Naive Bayes Classifier). The Naive Bayes (NB) classifier assumes that the predictors are conditionally independent given the class, i.e. the effect of the value of a predictor $x$ on a given class $c$ is independent of the values of the other predictors. Bayes' theorem provides a way of calculating the posterior probability $P(c \mid x)$ from the class prior probability $P(c)$, the predictor prior probability $P(x)$, and the probability of the predictor given the class $P(x \mid c)$:

    $$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)},$$

and for a vector of predictors $X = (x_1, \dots, x_n)$, since $P(X)$ is the same for every class, it suffices to compare

    $$P(c \mid X) \propto P(x_1 \mid c) \cdots P(x_n \mid c)\, P(c).$$

The class with the highest posterior probability is the outcome of the prediction.

Exercise 1. Considering the table of observations, use the Naive Bayes classifier to recommend whether to Play Golf on a day with Outlook = Rainy, Temperature = Mild, Humidity = Normal and Windy = True. Do not deal with the zero-frequency problem.

    Outlook    Temperature  Humidity  Windy  Play Golf
    Rainy      Hot          High      False  No
    Rainy      Hot          High      True   No
    Overcast   Hot          High      False  Yes
    Sunny      Mild         High      False  Yes
    Sunny      Cool         Normal    False  Yes
    Sunny      Cool         Normal    True   No
    Overcast   Cool         Normal    True   Yes
    Rainy      Mild         High      False  No
    Rainy      Cool         Normal    False  Yes
    Sunny      Mild         Normal    False  Yes
    Rainy      Mild         Normal    True   Yes
    Overcast   Mild         High      True   Yes
    Overcast   Hot          Normal    False  Yes
    Sunny      Mild         High      True   No

    Table 1: Exercise 1.

First build the likelihood tables for each predictor (per-class fractions in the Yes and No columns, the marginal probability of the value in the last column):

    Outlook      Yes    No
      Sunny      3/9    2/5    5/14
      Overcast   4/9    0/5    4/14
      Rainy      2/9    3/5    5/14
                 9/14   5/14

    Temperature  Yes    No
      Hot        2/9    2/5    4/14
      Mild       4/9    2/5    6/14
      Cool       3/9    1/5    4/14
                 9/14   5/14

    Humidity     Yes    No
      High       3/9    4/5    7/14
      Normal     6/9    1/5    7/14
                 9/14   5/14

    Windy        Yes    No
      True       3/9    2/5    5/14
      False      6/9    3/5    9/14
                 9/14   5/14

For example, the probability of Sunny given Yes is $3/9 \approx 0.33$, the probability of Sunny is $5/14 \approx 0.36$, and the probability of Yes is $9/14 \approx 0.64$. Then we compute the unnormalized posterior scores of Yes and No:

    $$P(\mathit{Yes} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) \propto P(\mathit{Rainy} \mid \mathit{Yes}) \cdot P(\mathit{Mild} \mid \mathit{Yes}) \cdot P(\mathit{Normal} \mid \mathit{Yes}) \cdot P(\mathit{True} \mid \mathit{Yes}) \cdot P(\mathit{Yes}) = \frac{2}{9} \cdot \frac{4}{9} \cdot \frac{6}{9} \cdot \frac{3}{9} \cdot \frac{9}{14} = 0.014109347,$$

    $$P(\mathit{No} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) \propto P(\mathit{Rainy} \mid \mathit{No}) \cdot P(\mathit{Mild} \mid \mathit{No}) \cdot P(\mathit{Normal} \mid \mathit{No}) \cdot P(\mathit{True} \mid \mathit{No}) \cdot P(\mathit{No}) = \frac{3}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{3}{5} \cdot \frac{5}{14} = 0.010285714, \tag{1}$$

and suggest Yes. We can normalize the scores to obtain the confidence in percent:

    $$P(\mathit{Yes} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) = \frac{0.014109347}{0.014109347 + 0.010285714} = 57.84\,\%,$$

    $$P(\mathit{No} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) = \frac{0.010285714}{0.014109347 + 0.010285714} = 42.16\,\%.$$

Definition 2 (Support Vector Machines Classifier (two-class, linearly separable)). Support Vector Machines (SVM) finds the hyperplane that bisects, and is perpendicular to, the line segment connecting the closest points of the two classes. The separating (decision) hyperplane is defined in terms of a normal (weight) vector $\mathbf{w}$ and a scalar intercept term $b$ as

    $$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b,$$

where $\cdot$ is the dot product of vectors. Finally, the SVM classifier becomes

    $$\mathit{class}(\mathbf{x}) = \mathrm{sgn}(f(\mathbf{x})).$$

Exercise 2. Build the SVM classifier for the training set $\{([1, 1], -1),\ ([2, 0], -1),\ ([2, 3], +1)\}$.

We first take the closest pair of points from the two classes: $[1, 1]$ and $[2, 3]$. The weight vector is parallel to their connecting line and points towards the positive class:

    $$\mathbf{w} = a \cdot ([2, 3] - [1, 1]) = [a, 2a].$$

Now we calculate $a$ and $b$ from the conditions $f([1, 1]) = -1$ and $f([2, 3]) = +1$:

    $$a + 2a + b = -1,$$
    $$2a + 6a + b = +1.$$

The solution is

    $$a = \frac{2}{5}, \qquad b = -\frac{11}{5},$$

building the weight vector

    $$\mathbf{w} = \left[\frac{2}{5}, \frac{4}{5}\right]$$

and the final classifier becomes

    $$\mathit{class}(\mathbf{x}) = \mathrm{sgn}\left(\frac{2}{5} x_1 + \frac{4}{5} x_2 - \frac{11}{5}\right).$$
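The hand computation of Exercise 1 can be double-checked mechanically. The following is a minimal Python sketch of my own (not part of the seminar; all identifiers are chosen ad hoc) that rebuilds the class prior and the per-predictor likelihoods from Table 1 and scores the query day, with no smoothing, exactly as in the exercise:

    # Minimal sketch: Naive Bayes scoring of Exercise 1 in plain Python.
    from collections import Counter

    # The 14 observations of Table 1: (Outlook, Temperature, Humidity, Windy), Play Golf.
    data = [
        (("Rainy", "Hot", "High", "False"), "No"),
        (("Rainy", "Hot", "High", "True"), "No"),
        (("Overcast", "Hot", "High", "False"), "Yes"),
        (("Sunny", "Mild", "High", "False"), "Yes"),
        (("Sunny", "Cool", "Normal", "False"), "Yes"),
        (("Sunny", "Cool", "Normal", "True"), "No"),
        (("Overcast", "Cool", "Normal", "True"), "Yes"),
        (("Rainy", "Mild", "High", "False"), "No"),
        (("Rainy", "Cool", "Normal", "False"), "Yes"),
        (("Sunny", "Mild", "Normal", "False"), "Yes"),
        (("Rainy", "Mild", "Normal", "True"), "Yes"),
        (("Overcast", "Mild", "High", "True"), "Yes"),
        (("Overcast", "Hot", "Normal", "False"), "Yes"),
        (("Sunny", "Mild", "High", "True"), "No"),
    ]

    def nb_scores(query):
        """Unnormalized posterior score P(x_1|c)...P(x_n|c)P(c) for each class c."""
        class_counts = Counter(label for _, label in data)
        scores = {}
        for c, n_c in class_counts.items():
            score = n_c / len(data)                   # class prior P(c)
            for i, value in enumerate(query):
                matches = sum(1 for x, label in data  # rows of class c with x_i = value
                              if label == c and x[i] == value)
                score *= matches / n_c                # likelihood P(x_i | c), no smoothing
            scores[c] = score
        return scores

    scores = nb_scores(("Rainy", "Mild", "Normal", "True"))
    total = sum(scores.values())
    for c in ("Yes", "No"):
        print(c, round(scores[c], 9), f"{100 * scores[c] / total:.2f} %")
    # Prints Yes 0.014109347 57.84 % and No 0.010285714 42.16 %, matching the exercise.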
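The SVM solution of Exercise 2 can be verified the same way. The sketch below (again my own illustration, relying on the simplifying assumption from Definition 2 that the closest opposite-class pair determines the separator) solves the two linear conditions for $a$ and $b$ and checks the resulting classifier on all three training points:

    # Minimal sketch: the two-point SVM construction of Exercise 2.
    p_neg, p_pos = (1.0, 1.0), (2.0, 3.0)      # closest pair from the two classes

    # w is parallel to the connecting line, pointing at the positive class:
    # w = a * (p_pos - p_neg), here the direction [1, 2].
    d = (p_pos[0] - p_neg[0], p_pos[1] - p_neg[1])

    # The conditions f(p_neg) = -1 and f(p_pos) = +1 read
    #   a * (d . p_neg) + b = -1   and   a * (d . p_pos) + b = +1,
    # so subtracting them isolates a, and either one then gives b.
    dot_neg = d[0] * p_neg[0] + d[1] * p_neg[1]  # = 3
    dot_pos = d[0] * p_pos[0] + d[1] * p_pos[1]  # = 8
    a = 2.0 / (dot_pos - dot_neg)                # a = 2/5
    b = -1.0 - a * dot_neg                       # b = -11/5

    w = (a * d[0], a * d[1])                     # w = [2/5, 4/5]
    print("w =", w, "b =", b)

    def classify(x):
        f = w[0] * x[0] + w[1] * x[1] + b
        return 1 if f >= 0 else -1

    # The classifier reproduces all three training labels, including [2, 0],
    # which was not used in the construction:
    for x, y in [((1, 1), -1), ((2, 0), -1), ((2, 3), 1)]:
        print(x, classify(x), "expected", y)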