Seminar 9

Algorithm 1 K-means({x_1, ..., x_N}, K, stopping criterion)
 1: (s_1, ..., s_K) ← SelectRandomSeeds({x_1, ..., x_N}, K)
 2: for k ← 1 to K do
 3:     μ_k ← s_k
 4: end for
 5: while stopping criterion has not been met do
 6:     for k ← 1 to K do
 7:         ω_k ← {}
 8:     end for
 9:     for n ← 1 to N do
10:         j ← argmin_{j'} |μ_{j'} − x_n|
11:         ω_j ← ω_j ∪ {x_n}                    ▷ reassigning vectors
12:     end for
13:     for k ← 1 to K do
14:         μ_k ← (1 / |ω_k|) Σ_{x ∈ ω_k} x      ▷ recomputing centroids
15:     end for
16: end while
17: return {μ_1, ..., μ_K}

Exercise 1. Use the K-means algorithm with Euclidean distance to cluster the following N = 8 examples into K = 3 clusters: A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 = (5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9). Suppose that the initial seeds (centers of each cluster) are A1, A4 and A7. Run the K-means algorithm for 3 epochs. After each epoch, draw a 10 × 10 space with all 8 points and show the clusters with the new centroids.

Solution. Let d(A, B) denote the Euclidean distance between A = (a1, a2) and B = (b1, b2), calculated as d(A, B) = √((a1 − b1)² + (a2 − b2)²).

Take the seeds s_1 = A1 = (2, 10), s_2 = A4 = (5, 8), s_3 = A7 = (1, 2). Computing the distance from each point to each seed gives the assignment for epoch 1: A1 ∈ ω1, A2 ∈ ω3, A3 ∈ ω2, A4 ∈ ω2, A5 ∈ ω2, A6 ∈ ω2, A7 ∈ ω3, A8 ∈ ω2, so the clusters are ω1 = {A1}, ω2 = {A3, A4, A5, A6, A8}, ω3 = {A2, A7}. The centroids of these clusters are μ1 = (2, 10), μ2 = ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6) and μ3 = ((2+1)/2, (5+2)/2) = (1.5, 3.5).

After epoch 2 the clusters are ω1 = {A1, A8}, ω2 = {A3, A4, A5, A6}, ω3 = {A2, A7}, with centroids μ1 = (3, 9.5), μ2 = (6.5, 5.25) and μ3 = (1.5, 3.5). Finally, after epoch 3 the clusters are ω1 = {A1, A4, A8}, ω2 = {A3, A5, A6}, ω3 = {A2, A7}, with centroids μ1 = ((2+5+4)/3, (10+8+9)/3) ≈ (3.67, 9), μ2 = ((8+7+6)/3, (4+5+4)/3) ≈ (7, 4.33) and μ3 = (1.5, 3.5).
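As a check on the arithmetic, the algorithm above can be sketched in a few lines of Python. This is an illustrative implementation, not code from the seminar: the names `dist` and `kmeans` are my own, the stopping criterion is replaced by a fixed number of epochs, and it assumes no cluster ever becomes empty (true for this example).

```python
def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def kmeans(points, seeds, epochs):
    """Run K-means from the given seeds for a fixed number of epochs."""
    centroids = list(seeds)
    for _ in range(epochs):
        # Reassign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda k: dist(centroids[k], p))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster
        # (assumes no cluster is empty).
        centroids = [(sum(x for x, _ in c) / len(c),
                      sum(y for _, y in c) / len(c)) for c in clusters]
    return clusters, centroids

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
clusters, centroids = kmeans(points, seeds=[(2, 10), (5, 8), (1, 2)], epochs=3)
```

Running it with the seeds A1, A4, A7 for three epochs reproduces the clusters and centroids computed above.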
Figure 1: Visualization of the K-means clustering algorithm — 10 × 10 plots of the eight points with the cluster assignments and centroids after each epoch.

Exercise 2. Nearest Neighbor clustering. Use the Nearest Neighbor clustering algorithm and Euclidean distance to cluster the examples from the previous exercise: A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 = (5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9). Suppose that the threshold t is 4.

Solution. A1 is placed in a cluster by itself, so we have K1 = {A1}. We then look at A2 to decide whether it should be added to K1 or placed in a new cluster.
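The procedure started in the solution above is deterministic, so it can also be sketched in Python. This assumes one common formulation of Nearest Neighbor clustering — each point joins the cluster of its nearest already-clustered point when that distance is at most t, and starts a new cluster otherwise; the names `dist` and `nn_clustering` are my own.

```python
def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def nn_clustering(points, t):
    """Nearest Neighbor clustering: join the cluster of the nearest
    already-clustered point if within threshold t, else start a new one."""
    clusters = []  # each cluster is a list of points
    label = {}     # point -> index of its cluster
    for p in points:
        if label:
            q = min(label, key=lambda r: dist(p, r))  # nearest seen point
            if dist(p, q) <= t:
                label[p] = label[q]
                clusters[label[q]].append(p)
                continue
        # First point, or nearest neighbor is farther than t: new cluster.
        label[p] = len(clusters)
        clusters.append([p])
    return clusters

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
clusters = nn_clustering(points, t=4)
```

Under this formulation, A2 starts a new cluster K2 because d(A1, A2) = 5 > 4, and continuing in the same way yields K1 = {A1, A4, A8}, K2 = {A2, A7}, K3 = {A3, A5, A6}.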