Flat clustering (Chapter 16)
Algorithm 1 K-means({⃗𝑥1, . . . , ⃗𝑥 𝑁 }, 𝐾, stopping criterion)
1: (⃗𝑠1, . . . ,⃗𝑠 𝐾) ← SelectRandomSeeds({⃗𝑥1, . . . , ⃗𝑥 𝑁 }, 𝐾)
2: for 𝑘 ← 1 to 𝐾 do
3: ⃗𝜇 𝑘 ← ⃗𝑠 𝑘
4: end for
5: repeat
6: for 𝑘 ← 1 to 𝐾 do
7: 𝜔 𝑘 ← {}
8: end for
9: for 𝑛 ← 1 to 𝑁 do
10: 𝑗 ← argmin 𝑗′ |⃗𝜇 𝑗′ − ⃗𝑥 𝑛|
11: 𝜔 𝑗 ← 𝜔 𝑗 ∪ {⃗𝑥 𝑛} ◁ reassigning vectors
12: end for
13: for 𝑘 ← 1 to 𝐾 do
14: ⃗𝜇 𝑘 ← 1
|𝜔 𝑘|
∑︀
⃗𝑥∈𝜔 𝑘
⃗𝑥 ◁ recomputing centroids
15: end for
16: until a stopping criterion has been met
17: return {⃗𝜇1, . . . , ⃗𝜇 𝐾}
Exercise 16/1
Use the 𝐾-means algorithm with Euclidean distance to cluster the following 𝑁 = 8
examples into 𝐾 = 3 clusters: 𝐴1 = (2, 10), 𝐴2 = (2, 5), 𝐴3 = (8, 4), 𝐴4 = (5, 8),
𝐴5 = (7, 5), 𝐴6 = (6, 4), 𝐴7 = (1, 2), 𝐴8 = (4, 9). Suppose that the initial seeds (centers
of each cluster) are 𝐴1, 𝐴4 and 𝐴7. Run the 𝐾-means algorithm for 3 epochs. After
each epoch, draw a 10 × 10 space with all the 8 points and show the clusters with the
new centroids.
Exercise 16/2
What makes a good clustering? Give some clustering evaluation metrics.
1