Seminar 7

Definition 1 (Naive Bayes Classifier). The Naive Bayes (NB) classifier assumes that the predictors are conditionally independent given the class, i.e. the effect of the value of a predictor $x$ on a given class $c$ is independent of the values of the other predictors. Bayes' theorem provides a way of calculating the posterior probability $P(c \mid x)$ from the class prior probability $P(c)$, the predictor prior probability $P(x)$, and the probability of the predictor given the class $P(x \mid c)$:

    $$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)},$$

and for a vector of predictors $X = (x_1, \dots, x_n)$, since $P(X)$ is the same for every class, it suffices to compare

    $$P(c \mid X) \propto P(x_1 \mid c) \cdots P(x_n \mid c)\, P(c).$$

The class with the highest posterior probability is the outcome of the prediction.

Exercise 1. Considering the table of observations, use the Naive Bayes classifier to recommend whether to Play Golf on a day with Outlook = Rainy, Temperature = Mild, Humidity = Normal and Windy = True. Do not deal with the zero-frequency problem.

    Outlook    Temperature  Humidity  Windy  Play Golf
    Rainy      Hot          High      False  No
    Rainy      Hot          High      True   No
    Overcast   Hot          High      False  Yes
    Sunny      Mild         High      False  Yes
    Sunny      Cool         Normal    False  Yes
    Sunny      Cool         Normal    True   No
    Overcast   Cool         Normal    True   Yes
    Rainy      Mild         High      False  No
    Rainy      Cool         Normal    False  Yes
    Sunny      Mild         Normal    False  Yes
    Rainy      Mild         Normal    True   Yes
    Overcast   Mild         High      True   Yes
    Overcast   Hot          Normal    False  Yes
    Sunny      Mild         High      True   No

    Table 1: Exercise 1.

First build the likelihood tables for each predictor (per-class fractions in the Yes and No columns, the marginal probability of the value in the last column):

    Outlook      Yes    No
      Sunny      3/9    2/5    5/14
      Overcast   4/9    0/5    4/14
      Rainy      2/9    3/5    5/14
                 9/14   5/14

    Temperature  Yes    No
      Hot        2/9    2/5    4/14
      Mild       4/9    2/5    6/14
      Cool       3/9    1/5    4/14
                 9/14   5/14

    Humidity     Yes    No
      High       3/9    4/5    7/14
      Normal     6/9    1/5    7/14
                 9/14   5/14

    Windy        Yes    No
      True       3/9    2/5    5/14
      False      6/9    3/5    9/14
                 9/14   5/14

For example, the probability of Sunny given Yes is $3/9 \approx 0.33$, the probability of Sunny is $5/14 \approx 0.36$, and the probability of Yes is $9/14 \approx 0.64$. Then we compute the unnormalized posterior scores of Yes and No:

    $$P(\mathit{Yes} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) \propto P(\mathit{Rainy} \mid \mathit{Yes}) \cdot P(\mathit{Mild} \mid \mathit{Yes}) \cdot P(\mathit{Normal} \mid \mathit{Yes}) \cdot P(\mathit{True} \mid \mathit{Yes}) \cdot P(\mathit{Yes}) = \frac{2}{9} \cdot \frac{4}{9} \cdot \frac{6}{9} \cdot \frac{3}{9} \cdot \frac{9}{14} = 0.014109347,$$

    $$P(\mathit{No} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) \propto P(\mathit{Rainy} \mid \mathit{No}) \cdot P(\mathit{Mild} \mid \mathit{No}) \cdot P(\mathit{Normal} \mid \mathit{No}) \cdot P(\mathit{True} \mid \mathit{No}) \cdot P(\mathit{No}) = \frac{3}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{3}{5} \cdot \frac{5}{14} = 0.010285714, \tag{1}$$

and suggest Yes. We can normalize the scores to obtain the confidence in percent:

    $$P(\mathit{Yes} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) = \frac{0.014109347}{0.014109347 + 0.010285714} = 57.84\,\%,$$

    $$P(\mathit{No} \mid \mathit{Rainy}, \mathit{Mild}, \mathit{Normal}, \mathit{True}) = \frac{0.010285714}{0.014109347 + 0.010285714} = 42.16\,\%.$$

Definition 2 (Support Vector Machines Classifier (two-class, linearly separable)). Support Vector Machines (SVM) finds the hyperplane that bisects, and is perpendicular to, the line segment connecting the closest points of the two classes. The separating (decision) hyperplane is defined in terms of a normal (weight) vector $\mathbf{w}$ and a scalar intercept term $b$ as

    $$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b,$$

where $\cdot$ is the dot product of vectors. Finally, the SVM classifier becomes

    $$\mathit{class}(\mathbf{x}) = \mathrm{sgn}(f(\mathbf{x})).$$

Exercise 2. Build the SVM classifier for the training set $\{([1, 1], -1),\ ([2, 0], -1),\ ([2, 3], +1)\}$.

We first take the closest pair of points from the two classes: $[1, 1]$ and $[2, 3]$. The weight vector is parallel to their connecting line and points towards the positive class:

    $$\mathbf{w} = a \cdot ([2, 3] - [1, 1]) = [a, 2a].$$

Now we calculate $a$ and $b$ from the conditions $f([1, 1]) = -1$ and $f([2, 3]) = +1$:

    $$a + 2a + b = -1,$$
    $$2a + 6a + b = +1.$$

The solution is

    $$a = \frac{2}{5}, \qquad b = -\frac{11}{5},$$

building the weight vector

    $$\mathbf{w} = \left[\frac{2}{5}, \frac{4}{5}\right]$$

and the final classifier becomes

    $$\mathit{class}(\mathbf{x}) = \mathrm{sgn}\left(\frac{2}{5} x_1 + \frac{4}{5} x_2 - \frac{11}{5}\right).$$
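The hand computation of Exercise 1 can be double-checked mechanically. The following is a minimal Python sketch of my own (not part of the seminar; all identifiers are chosen ad hoc) that rebuilds the class prior and the per-predictor likelihoods from Table 1 and scores the query day, with no smoothing, exactly as in the exercise:

    # Minimal sketch: Naive Bayes scoring of Exercise 1 in plain Python.
    from collections import Counter

    # The 14 observations of Table 1: (Outlook, Temperature, Humidity, Windy), Play Golf.
    data = [
        (("Rainy", "Hot", "High", "False"), "No"),
        (("Rainy", "Hot", "High", "True"), "No"),
        (("Overcast", "Hot", "High", "False"), "Yes"),
        (("Sunny", "Mild", "High", "False"), "Yes"),
        (("Sunny", "Cool", "Normal", "False"), "Yes"),
        (("Sunny", "Cool", "Normal", "True"), "No"),
        (("Overcast", "Cool", "Normal", "True"), "Yes"),
        (("Rainy", "Mild", "High", "False"), "No"),
        (("Rainy", "Cool", "Normal", "False"), "Yes"),
        (("Sunny", "Mild", "Normal", "False"), "Yes"),
        (("Rainy", "Mild", "Normal", "True"), "Yes"),
        (("Overcast", "Mild", "High", "True"), "Yes"),
        (("Overcast", "Hot", "Normal", "False"), "Yes"),
        (("Sunny", "Mild", "High", "True"), "No"),
    ]

    def nb_scores(query):
        """Unnormalized posterior score P(x_1|c)...P(x_n|c)P(c) for each class c."""
        class_counts = Counter(label for _, label in data)
        scores = {}
        for c, n_c in class_counts.items():
            score = n_c / len(data)                   # class prior P(c)
            for i, value in enumerate(query):
                matches = sum(1 for x, label in data  # rows of class c with x_i = value
                              if label == c and x[i] == value)
                score *= matches / n_c                # likelihood P(x_i | c), no smoothing
            scores[c] = score
        return scores

    scores = nb_scores(("Rainy", "Mild", "Normal", "True"))
    total = sum(scores.values())
    for c in ("Yes", "No"):
        print(c, round(scores[c], 9), f"{100 * scores[c] / total:.2f} %")
    # Prints Yes 0.014109347 57.84 % and No 0.010285714 42.16 %, matching the exercise.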
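The SVM solution of Exercise 2 can be verified the same way. The sketch below (again my own illustration, relying on the simplifying assumption from Definition 2 that the closest opposite-class pair determines the separator) solves the two linear conditions for $a$ and $b$ and checks the resulting classifier on all three training points:

    # Minimal sketch: the two-point SVM construction of Exercise 2.
    p_neg, p_pos = (1.0, 1.0), (2.0, 3.0)      # closest pair from the two classes

    # w is parallel to the connecting line, pointing at the positive class:
    # w = a * (p_pos - p_neg), here the direction [1, 2].
    d = (p_pos[0] - p_neg[0], p_pos[1] - p_neg[1])

    # The conditions f(p_neg) = -1 and f(p_pos) = +1 read
    #   a * (d . p_neg) + b = -1   and   a * (d . p_pos) + b = +1,
    # so subtracting them isolates a, and either one then gives b.
    dot_neg = d[0] * p_neg[0] + d[1] * p_neg[1]  # = 3
    dot_pos = d[0] * p_pos[0] + d[1] * p_pos[1]  # = 8
    a = 2.0 / (dot_pos - dot_neg)                # a = 2/5
    b = -1.0 - a * dot_neg                       # b = -11/5

    w = (a * d[0], a * d[1])                     # w = [2/5, 4/5]
    print("w =", w, "b =", b)

    def classify(x):
        f = w[0] * x[0] + w[1] * x[1] + b
        return 1 if f >= 0 else -1

    # The classifier reproduces all three training labels, including [2, 0],
    # which was not used in the construction:
    for x, y in [((1, 1), -1), ((2, 0), -1), ((2, 3), 1)]:
        print(x, classify(x), "expected", y)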