ID3 Algorithm - Complete Illustration Consider the dataset D specified by the following table: index X Y Z class 1 1 0 1 Yes 2 1 1 0 Yes 3 1 0 1 Yes 4 1 0 1 Yes 5 0 1 1 No 6 1 0 0 No 7 0 1 0 No 8 0 1 0 No 9 1 1 1 No 10 0 0 1 Yes There are three attributes A = {X, Y, Z} and two classes C = {Yes, No}. Each attribute has possible values 0 and 1. Let us use indices 1 - 10 to denote elements of the dataset D. That is, write D = {1, . . . , 10}. Let me demonstrate the execution of the algorithm ID3 with impurity decrease (Gini) to select the best-classifying attributes in every call of ID3. The algorithm proceeds as follows: ID3(D, A) • At line 2 create the node τ1 (see Image 1). 1 • No “if” condition is satisfied, so we continue on line 10. On line 10, identify the best classifying attribute: – To compute the impurity decrease, we need to compute Gini(D) for D = {1, . . . , 10} as follows: ∗ pYes = pNo = 1/2 ∗ Gini(D) = 1 − p2 Yes − p2 No = 1 − (1/2)2 − (1/2)2 = 0.5 – Consider X ∈ A and compute ImpDec(D, X) as follows: ∗ Consider value 1 of X. Then D1 = {1, 2, 3, 4, 6, 9}. index X Y Z class 1 1 0 1 Yes 2 1 1 0 Yes 3 1 0 1 Yes 4 1 0 1 Yes 6 1 0 0 No 9 1 1 1 No 1Note that the nodes of the tree are numbered sequentially so that each node gets a unique index. The indices do not correspond to the indices assigned by the pseudocode. 1 · Thus pYes = 4/6 and pNo = 2/6 · Gini(D1) = 1 − p2 Yes − p2 No = 1 − (4/6)2 − (2/6)2 = 0.444 ∗ Consider value 0 of X. Then D0 = {5, 7, 8, 10}. index X Y Z class 5 0 1 1 No 7 0 1 0 No 8 0 1 0 No 10 0 0 1 Yes · Thus pYes = 1/4 and pNo = 3/4 · Gini(D0) = 1 − p2 Yes − p2 No = 1 − (1/4)2 − (3/4)2 = 0.375 ImpDec(D, X) = Gini(D) − (|D1|/|D|)Gini(D1) − (|D0|/|D|)Gini(D0) = 0.5 − (6/10) · 0.444 − (4/10) · 0.375 = 0.083 – Consider Y ∈ A and compute ImpDec(D, Y ) as follows: ∗ Consider value 1 of Y . Then D1 = {2, 5, 7, 8, 9}. index X Y Z class 2 1 1 0 Yes 5 0 1 1 No 7 0 1 0 No 8 0 1 0 No 9 1 1 1 No · Thus pYes = 1/5 and pNo = 4/5 · Gini(D1) = 1 − p2 Yes − p2 No = 1 − (1/5)2 − (4/5)2 = 0.320 ∗ Consider value 0 of Y . Then D0 = {1, 3, 4, 6, 10}. index X Y Z class 1 1 0 1 Yes 3 1 0 1 Yes 4 1 0 1 Yes 6 1 0 0 No 10 0 0 1 Yes · Thus pYes = 4/5 and pNo = 1/5 · Gini(D0) = 1 − p2 Yes − p2 No = 1 − (4/5)2 − (1/5)2 = 0.320 ImpDec(D, Y ) = Gini(D) − (|D1|/|D|)Gini(D1) − (|D0|/|D|)Gini(D0) = 0.500 − (5/10) · 0.320 − (5/10) · 0.320 = 0.180 – Consider Z ∈ A and compute ImpDec(D, Z) as follows: ∗ Consider value 1 of Z. Then D1 = {1, 3, 4, 5, 9, 10}. 2 index X Y Z class 1 1 0 1 Yes 3 1 0 1 Yes 4 1 0 1 Yes 5 0 1 1 No 9 1 1 1 No 10 0 0 1 Yes · Thus pYes = 4/6 and pNo = 2/6 · Gini(D1) = 1 − p2 Yes − p2 No = 1 − (4/6)2 − (2/6)2 = 0.444 ∗ Consider value 0 of Y . Then D0 = {2, 6, 7, 8}. index X Y Z class 2 1 1 0 Yes 6 1 0 0 No 7 0 1 0 No 8 0 1 0 No · Thus pYes = 1/4 and pNo = 3/4 · Gini(D0) = 1 − p2 Yes − p2 No = 1 − (1/4)2 − (3/4)2 = 0.375 ImpDec(D, Z) = Gini(D) − (|D1|/|D|)Gini(D1) − (|D0|/|D|)Gini(D0) = 0.500 − (6/10) · 0.444 − (4/10) · 0.375 = 0.083 So, the attribute Y maximizes the decrease in impurity. • Set the decision attribute of τ1 to Y and continue recursively by calling – ID3({2, 5, 7, 8, 9}, {X, Z}) giving a node τ2 – ID3({1, 3, 4, 6, 10}, {X, Z}) giving a node τ3. • Connect τ1 with τ2 by an edge assigned 1, and τ1 with τ3 using an edge assigned 0. Now, let us demonstrate the recursive calls. ID3({2, 5, 7, 8, 9}, {X, Z}) Now D = {2, 5, 7, 8, 9} and A = {X, Z}. index X Z class 2 1 0 Yes 5 0 1 No 7 0 0 No 8 0 0 No 9 1 1 No • At line 2, create the node τ2. 3 • No “if” condition is satisfied, so we continue on line 10. On line 10, identify the best classifying attribute: – We have already computed Gini(D) = 0.320 – Consider X ∈ A and compute ImpDec(D, X) as follows: ∗ Consider value 1 of X. Then D1 = {2, 9}. index X Z class 2 1 0 Yes 9 1 1 No · Thus pYes = 1/2 and pNo = 1/2 · Gini(D1) = 1 − p2 Yes − p2 No = 1 − (1/2)2 − (1/2)2 = 0.500 ∗ Consider value 0 of X. Then D0 = {5, 7, 8}. index X Z class 5 0 1 No 7 0 0 No 8 0 0 No · Thus pYes = 0 and pNo = 1 · Gini(D0) = 1 − p2 Yes − p2 No = 1 − 02 − 12 = 0 ImpDec(D, X) = Gini(D) − (|D1|/|D|)Gini(D1) − (|D0|/|D|)Gini(D0) = 0.320 − (2/5) · 0.500 − (3/5) · 0.000 = 0.120 – Consider Z ∈ A and compute ImpDec(D, Z) as follows: ∗ Consider value 1 of Z. Then D1 = {5, 9}. index X Z class 5 0 1 No 9 1 1 No · Thus pYes = 0 and pNo = 1 · Gini(D1) = 1 − p2 Yes − p2 No = 1 − 02 − 12 = 0 ∗ Consider value 0 of Z. Then D0 = {2, 7, 8}. index X Z class 2 1 0 Yes 7 0 0 No 8 0 0 No · Thus pYes = 1/3 and pNo = 2/3 · Gini(D0) = 1 − p2 Yes − p2 No = 1 − (1/3)2 − (2/3)2 = 0.444 ImpDec(D, X) = Gini(D) − (|D1|/|D|)Gini(D1) − (|D0|/|D|)Gini(D0) = 0.320 − (2/5) · 0.000 − (3/5) · 0.444 = 0.053 So, the attribute X maximizes the decrease in impurity. 4 – Set the decision attribute of τ2 to X and continue recursively by calling ∗ ID3({2, 9}, {Z}) giving a node τ4 ∗ ID3({5, 7, 8}, {Z}) giving a node τ5. – Connect τ2 with τ4 by an edge assigned 1, and τ1 with τ5 using an edge assigned 0. ID3({2, 9}, {Z}) Now D = {2, 9} and A = {Z}. index Z class 2 0 Yes 9 1 No • At line 2, create the node τ4 • No “if” condition is satisfied, so we continue on line 10. On line 10, identify the best classifying attribute: There is only Z, which is automatically selected. • Set the decision attribute of τ2 to Z and continue recursively by calling – ID3({9}, {}) giving a node τ6 – ID3({2}, {}) giving a node τ7. • Connect τ4 with τ6 by an edge assigned 1, and τ4 with τ7 using an edge assigned 0. ID3({9}, {}) Now D = {9} and A = {}. • At line 2, create the node τ6 • The “if” at line 5 is satisfied since all elements of D are of class No. Assign No to τ6 and return τ6. ID3({2}, {}) Now D = {2} and A = {}. • At line 2, create the node τ7 • The “if” at line 5 is satisfied since all elements of D are of class Yes. Assign Yes to τ7 and return τ7. 5 ID3({5, 7, 8}, {Z}) Now D = {5, 7, 8} and A = {X, Z}. • At line 2, create the node τ5 • The “if” at line 5 is satisfied since all elements of D are of class No. Assign No to τ5 and return τ5. ID3({1, 3, 4, 6, 10}, {X, Z}) Now D = {1, 3, 4, 6, 10} and A = {X, Z}. index X Z class 1 1 1 Yes 3 1 1 Yes 4 1 1 Yes 6 1 0 No 10 0 1 Yes • At line 2, create the node τ3. • No “if” condition is satisfied, so we continue on line 10. On line 10, identify the best classifying attribute: – We have already computed Gini(D) = 0.320 – Consider X ∈ A and compute ImpDec(D, X) as follows: ∗ Consider value 1 of X. Then D1 = {1, 3, 4, 6}. index X Z class 1 1 1 Yes 3 1 1 Yes 4 1 1 Yes 6 1 0 No · Thus pYes = 3/4 and pNo = 1/4 · Gini(D1) = 1 − p2 Yes − p2 No = 1 − (3/4)2 − (1/4)2 = 0.375 ∗ Consider value 0 of X. Then D0 = {10}. index X Z class 10 0 1 Yes · Thus pYes = 1 and pNo = 0 · Gini(D0) = 1 − p2 Yes − p2 No = 1 − 12 − 02 = 0 ImpDec(D, X) = Gini(D) − (|D1|/|D|)Gini(D1) − (|D0|/|D|)Gini(D0) = 0.320 − (4/5) · 0.375 − (1/5) · 0.000 = 0.020 6 – Consider Z ∈ A and compute ImpDec(D, Z) as follows: ∗ Consider value 1 of Z. Then D1 = {1, 3, 4, 10}. index X Z class 1 1 1 Yes 3 1 1 Yes 4 1 1 Yes 10 0 1 Yes · Thus pYes = 1 and pNo = 0 · Gini(D1) = 1 − p2 Yes − p2 No = 1 − 12 − 02 = 0 ∗ Consider value 0 of Z. Then D0 = {6}. index X Z class 6 1 0 No · Thus pYes = 0 and pNo = 1 · Gini(D0) = 1 − p2 Yes − p2 No = 1 − 02 − 12 = 0 ImpDec(D, X) = Gini(D) − (|D1|/|D|)Gini(D1) − (|D0|/|D|)Gini(D0) = 0.320 − (4/5) · 0.000 − (1/5) · 0.000 = 0.320 So, the attribute Z maximizes the decrease in impurity. – Set the decision attribute of τ3 to Z and continue recursively by calling ∗ ID3({1, 3, 4, 10}, {X}) giving a node τ8 ∗ ID3({6}, {X}) giving a node τ9. – Connect τ3 with τ8 by an edge assigned 1, and τ3 with τ9 using an edge assigned 0. ID3({1, 3, 4, 10}, {X}) Now D = {1, 3, 4, 10} and A = {X}. • At line 2, create the node τ8 • The “if” at line 5 is satisfied since all elements of D are of class Yes. Assign Yes to τ8 and return τ8. ID3({6}, {X}) Now D = {6} and A = {X}. • At line 2, create the node τ9 • The “if” at line 5 is satisfied since all elements of D are of class No. Assign No to τ9 and return τ9. 7 a ~ ~ ~~ ~ ?-______, C> "-~ ....__,_--..... ~-<:::::) 'S;--~ ~ -----.,..__,I >- ~ \"v-, c,O 1't-- ,---.I') '-,,----., \vLtJ o ~ ~ ~ c:--J \..A.,-- ~ c4 \0 ....J'--"' r<:i// \0 - ~ c?J ---\r y:_ ~ \__n ('-.Jv- rl ___________.... ~_...,,,._, ~ c----! ('N ~ ~ \-,....J'-..0 <:) ~ ::2. ------.cr-- .__,,,._I Figure 1: The tree 8