Seminar 9

Algorithm 1 K-means({x_1, ..., x_N}, K, stopping criterion)
 1: (s_1, ..., s_K) ← SelectRandomSeeds({x_1, ..., x_N}, K)
 2: for k ← 1 to K do
 3:     μ_k ← s_k
 4: end for
 5: while stopping criterion has not been met do
 6:     for k ← 1 to K do
 7:         ω_k ← {}
 8:     end for
 9:     for n ← 1 to N do
10:         j ← argmin_{j'} |μ_{j'} − x_n|
11:         ω_j ← ω_j ∪ {x_n}                    ▷ reassigning vectors
12:     end for
13:     for k ← 1 to K do
14:         μ_k ← (1 / |ω_k|) Σ_{x ∈ ω_k} x      ▷ recomputing centroids
15:     end for
16: end while
17: return {μ_1, ..., μ_K}

Exercise 1. Use the K-means algorithm with Euclidean distance to cluster the following N = 8 examples into K = 3 clusters: A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 = (5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9). Suppose that the initial seeds (centers of each cluster) are A1, A4 and A7. Run the K-means algorithm for 3 epochs. After each epoch, draw a 10 × 10 space with all 8 points and show the clusters with the new centroids.

Solution. Let d(A, B) denote the Euclidean distance between A = (a1, a2) and B = (b1, b2), calculated as d(A, B) = √((a1 − b1)² + (a2 − b2)²).

Take the seeds s_1 = A1 = (2, 10), s_2 = A4 = (5, 8), s_3 = A7 = (1, 2). Computing the distance from each point to each seed gives the assignment for epoch 1: A1 ∈ ω1, A2 ∈ ω3, A3 ∈ ω2, A4 ∈ ω2, A5 ∈ ω2, A6 ∈ ω2, A7 ∈ ω3, A8 ∈ ω2, so the clusters are ω1 = {A1}, ω2 = {A3, A4, A5, A6, A8}, ω3 = {A2, A7}. The centroids of these clusters are μ1 = (2, 10), μ2 = ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6) and μ3 = ((2+1)/2, (5+2)/2) = (1.5, 3.5).

After epoch 2 the clusters are ω1 = {A1, A8}, ω2 = {A3, A4, A5, A6}, ω3 = {A2, A7}, with centroids μ1 = (3, 9.5), μ2 = (6.5, 5.25) and μ3 = (1.5, 3.5). Finally, after epoch 3 the clusters are ω1 = {A1, A4, A8}, ω2 = {A3, A5, A6}, ω3 = {A2, A7}, with centroids μ1 = ((2+5+4)/3, (10+8+9)/3) ≈ (3.67, 9), μ2 = ((8+7+6)/3, (4+5+4)/3) ≈ (7, 4.33) and μ3 = (1.5, 3.5).
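As a check on the arithmetic, the algorithm above can be sketched in a few lines of Python. This is an illustrative implementation, not code from the seminar: the names `dist` and `kmeans` are my own, the stopping criterion is replaced by a fixed number of epochs, and it assumes no cluster ever becomes empty (true for this example).

```python
def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def kmeans(points, seeds, epochs):
    """Run K-means from the given seeds for a fixed number of epochs."""
    centroids = list(seeds)
    for _ in range(epochs):
        # Reassign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda k: dist(centroids[k], p))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster
        # (assumes no cluster is empty).
        centroids = [(sum(x for x, _ in c) / len(c),
                      sum(y for _, y in c) / len(c)) for c in clusters]
    return clusters, centroids

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
clusters, centroids = kmeans(points, seeds=[(2, 10), (5, 8), (1, 2)], epochs=3)
```

Running it with the seeds A1, A4, A7 for three epochs reproduces the clusters and centroids computed above.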
Figure 1: Visualization of the K-means clustering algorithm — 10 × 10 plots of the eight points with the cluster assignments and centroids after each epoch.

Exercise 2. Nearest Neighbor clustering. Use the Nearest Neighbor clustering algorithm and Euclidean distance to cluster the examples from the previous exercise: A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 = (5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9). Suppose that the threshold t is 4.

Solution. A1 is placed in a cluster by itself, so we have K1 = {A1}. We then look at A2 to decide whether it should be added to K1 or placed in a new cluster.
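The procedure started in the solution above is deterministic, so it can also be sketched in Python. This assumes one common formulation of Nearest Neighbor clustering — each point joins the cluster of its nearest already-clustered point when that distance is at most t, and starts a new cluster otherwise; the names `dist` and `nn_clustering` are my own.

```python
def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def nn_clustering(points, t):
    """Nearest Neighbor clustering: join the cluster of the nearest
    already-clustered point if within threshold t, else start a new one."""
    clusters = []  # each cluster is a list of points
    label = {}     # point -> index of its cluster
    for p in points:
        if label:
            q = min(label, key=lambda r: dist(p, r))  # nearest seen point
            if dist(p, q) <= t:
                label[p] = label[q]
                clusters[label[q]].append(p)
                continue
        # First point, or nearest neighbor is farther than t: new cluster.
        label[p] = len(clusters)
        clusters.append([p])
    return clusters

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
clusters = nn_clustering(points, t=4)
```

Under this formulation, A2 starts a new cluster K2 because d(A1, A2) = 5 > 4, and continuing in the same way yields K1 = {A1, A4, A8}, K2 = {A2, A7}, K3 = {A3, A5, A6}.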