{"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.6.1"},"colab":{"name":"PB016_lab11_G01.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"A8Cj_X1VnmC0"},"source":["# PB016: Artificial intelligence I, labs 11 - Machine learning\n","\n","Today we'll dig into machine learning, or rather two specific examples of \"classic\" machine learning models. We'll focus namely on:\n","1. __Representation and preparation of input data__\n","2. __Simple algorithm for training a perceptron__\n","3. __Simple algorithm for training a decision tree__\n","4. __The exercises in a few lines of code__"]},{"cell_type":"markdown","metadata":{"id":"MjLOiihWnmC1"},"source":["---\n","\n","## 1. Representation and preparation of input data\n","\n","__Basic facts__\n","- Machine learning typically works with a training set of machine-readable input data in the form of examples, which may (or may not) be labeled by an identifier (or identifiers) of the class (or classes) to which they belong.\n","- Each example is usually assigned a unique identifier and a set of specific property (or also feature) values that depend on the specific domain of the problem.\n","- A suitable representation of such data is a matrix mapping the example identifiers to vectors of their specific feature values, and (for supervised learning) also a vector mapping the example identifiers to the identifiers of the classes to which they belong."]},{"cell_type":"markdown","metadata":{"id":"Tzl_o5pXnmC2"},"source":["### Sample problem - classification of [black metal](https://en.wikipedia.org/wiki/Black_metal) fans\n","\n","![black_metal](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/blackmetal.png)\n","\n","- The problem may not be as simple as it seems (for example, not all black metal fans are Satanists with striking body paint).\n","- Example of a simple dataset:\n","\n","| __ID__ || __HL__ | __FHT__ | __LN__ | __LS__ || __FBM__ |\n","|--------||--------|---------|--------|--------||---------|\n","| Petr || l | br | 9/10 | 1/10 || y |\n","| Marie || s | $-$ | 8/10 | 9/10 || y |\n","| Azazel || l | ms | 1/10 | 10/10 || n |\n","| Pavel || b | br | 7/10 | 2/10 || y |\n","\n","- Meaning of columns and individual feature values (with the \"$ - $\" symbol for \"missing\"):\n"," - __ID__ - example identifier (names of people we classify into \"black metal fan\" and \"others\" classes)\n"," - __HL__ - hair length (one of $ \\{b, s, m, l \\} $ for bald, short, medium and long hair)\n"," - __FHT__ - beard type (one of $ \\{no, gt, ms, sb, br \\} $ for no beard, goatie, moustache, sideburns and beard)\n"," - __LN__ - relationship to nature (on a scale from 0 to 10, the higher the more devout)\n"," - __LS__ - relationship to Satan (on a scale from 0 to 10, the higher the more devout)\n"," - __FBM__ - belonging to the class of black metal fans (one of $ \\{y, n \\} $ for yes and no)"]},{"cell_type":"markdown","metadata":{"id":"Ein7Ntv3nmC3"},"source":["### Simple structures for sample data representation"]},{"cell_type":"code","metadata":{"id":"n5K3FEn8nmC3"},"source":["# vectors (in the form of dictionaries) mapping the symbolic identifiers of the \n","# examples and features to unique numbers\n","\n","example_id_map = {\n"," 
'Petr' : 0,\n"," 'Marie' : 1,\n"," 'Azazel' : 2,\n"," 'Pavel' : 3,\n","}\n","\n","feature_id_map = {\n"," 'HL' : 0,\n"," 'FHT' : 1,\n"," 'LN' : 2,\n"," 'LS' : 3,\n","}\n","\n","# inverse vectors to the previous two (may come in handy for user-friendly \n","# statements)\n","\n","example_id_map_inv = dict([(y,x) for x,y in example_id_map.items()])\n","feature_id_map_inv = dict([(y,x) for x,y in feature_id_map.items()])\n","\n","# vector mapping example identifiers to identifiers of their classes \n","# (0 corresponds to 'n', 1 corresponds to 'y')\n","\n","labels = {0:1, 1:1, 2:0, 3:1}\n","\n","# matrix (in the form of a \"two-dimensional\" dictionary) mapping example \n","# identifiers to the values of their properties (values are currently \n","# initialized to 0)\n","\n","features = {}\n","for i in range(4):\n"," for j in range(4):\n"," features[(i,j)] = 0"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"o7P4XLo3nmC8"},"source":["### __Numerical representation of property values__\n","\n","- Many machine learning algorithms work better with \"nice and clean\" numerical data.\n","- Let's assign such values to the individual fields of the `features` matrix based on the information in the problem definition above.\n","- Word of warning - it's not as trivial as it might seem:\n"," - Numeric values should ideally preserve the meaning of the original ones.\n"," - Missing data items should typically be meaningfully filled in (\"imputed\").\n"," - For many machine learning algorithms, the data should also be normalised so that the ranges of the individual columns are more or less balanced."]},{"cell_type":"markdown","metadata":{"id":"qivN-ufYnmC9"},"source":["#### __Possible numerical representation of property values__"]},{"cell_type":"code","metadata":{"scrolled":false,"id":"nQa17X6InmC-"},"source":["# feature value ranges and their corresponding numerical representations:\n","# HL: {b,s,m,l} ~ {0,1,2,3}\n","# FHT: {no,gt,ms,sb,br} ~ {0,1,2,3,4}\n","# LN: 0..10 ~ 0..10\n","# LS: 0..10 ~ 0..10\n","\n","# numerical representation of property values - training data set\n","# (note: imputation uses domain knowledge - women usually do not have much \n","# facial hair)\n","\n","features = {\n"," (0,0) : 3, (0,1) : 4, (0,2) : 9, (0,3) : 1, # ~ Petr: l, br, 9/10, 1/10\n"," (1,0) : 1, (1,1) : 0, (1,2) : 8, (1,3) : 9, # ~ Marie: s, -, 8/10, 9/10\n"," (2,0) : 3, (2,1) : 2, (2,2) : 1, (2,3) : 10, # ~ Azazel: l, ms, 1/10, 10/10\n"," (3,0) : 0, (3,1) : 4, (3,2) : 7, (3,3) : 2, # ~ Pavel: b, br, 7/10, 2/10\n","}\n","\n","# normalization of property values (by maximum of each column)\n","\n","# initial maximum value\n","max_dict = {0:0, 1:0, 2:0, 3:0}\n","\n","# determining the actual value of the maximum by scanning the matrix of \n","# property values\n","# - NOTE: this implementation finds the maximum from the data, which may not\n","# be completely correct for the relationship to nature, where the \"real\" \n","# maximum (10) is not in the data (the maximum there is 9)\n","for i,j in features:\n"," if features[(i,j)] > max_dict[j]:\n"," max_dict[j] = features[(i,j)]\n","\n","# normalization of values to 0-1 interval by dividing the values by the maximum\n","for i,j in features:\n"," features[(i,j)] /= max_dict[j]\n"," \n","# listing the final matrix\n","rows = []\n","for i in range(4):\n"," columns = []\n"," for j in range(4):\n"," columns.append(round(features[(i,j)],1))\n"," rows.append(' '.join([str(x) for x in columns]))\n","print('Normalized matrix of numeric property 
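values: \\n')\n","print('\\n'.join(rows))"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"knownMaxMd01"},"source":["#### __Optional: normalization by known feature ranges (an illustrative sketch)__\n","\n","- As the note in the previous cell points out, dividing by the observed column maxima slightly distorts the LN column (its observed maximum is 9, although the scale goes up to 10).\n","- The cell below sketches one possible alternative - dividing by the known maxima of the feature ranges ($ \\{3, 4, 10, 10\\} $). The names `known_max` and `features_known` are illustrative only, and the result is kept separate from `features` so that the rest of the notebook is unaffected.\n"]},{"cell_type":"code","metadata":{"id":"knownMaxCode01"},"source":["# illustrative sketch: normalization by the known maxima of the feature ranges\n","# (HL: 0..3, FHT: 0..4, LN: 0..10, LS: 0..10) instead of the observed maxima\n","\n","known_max = {0:3, 1:4, 2:10, 3:10}\n","\n","# the raw (unnormalized) feature values, as in the cell above\n","features_known = {\n"," (0,0) : 3, (0,1) : 4, (0,2) : 9, (0,3) : 1, # ~ Petr\n"," (1,0) : 1, (1,1) : 0, (1,2) : 8, (1,3) : 9, # ~ Marie\n"," (2,0) : 3, (2,1) : 2, (2,2) : 1, (2,3) : 10, # ~ Azazel\n"," (3,0) : 0, (3,1) : 4, (3,2) : 7, (3,3) : 2, # ~ Pavel\n","}\n","\n","# normalization to the 0-1 interval by the known maximum of each column\n","for i,j in features_known:\n"," features_known[(i,j)] /= known_max[j]\n","\n","# listing the resulting matrix in the same format as above\n","rows = []\n","for i in range(4):\n"," columns = []\n"," for j in range(4):\n"," columns.append(round(features_known[(i,j)],1))\n"," rows.append(' '.join([str(x) for x in columns]))\n","print('Matrix normalized by known (rather than observed) maximum feature 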
values: \\n')\n","print('\\n'.join(rows))"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"tihcJlMWnmDC"},"source":["---\n","\n","## 2. A simple algorithm for training a perceptron\n","\n","__Basic facts__\n","- [Perceptron](https://en.wikipedia.org/wiki/Perceptron) is one of the oldest concepts in the theory (and practice) of machine learning (proposed in 1958).\n","- It is a [linear](https://en.wikipedia.org/wiki/Linear_classifier) [binary](https://en.wikipedia.org/wiki/Binary_classification) classifier for supervised learning over data represented by numerical vectors.\n","- It is based on the so-called threshold function $ f $ mapping the input vector $ \\mathbf{x} $ to a binary scalar $ f(\\mathbf{x}) $ as follows:\n"," - $ f(\\mathbf{x}) = 1 $ if $ \\mathbf{w} \\cdot \\mathbf{x} + b > 0 $\n"," - otherwise $ f(\\mathbf{x}) = 0 $\n","- This function has two parameters - a scalar threshold (bias) $ b $, which can be equal to zero, and a weight vector $ \\mathbf{w} $, which is optimized to best discriminate the data based on their membership in individual classes (positive vs. negative examples).\n","- The perceptron has become the basis of modern [deep learning](https://en.wikipedia.org/wiki/Deep_learning), which (very simply put) uses perceptron-like units in many complex interconnected \"hidden\" layers. In such networks, the parameters (i.e., weight vectors) are usually optimized by [backpropagation](https://en.wikipedia.org/wiki/Backpropagation).\n","\n","__General perceptron for illustration__\n","\n","![perceptron](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/perceptron.png)"]},{"cell_type":"markdown","metadata":{"id":"yLeTeD2tnmDD"},"source":["### Rules for incremental updating of the weights when learning the perceptron\n","- For simplicity, assume that the parameter $ b = 0 $.\n","- At the beginning we initialize the vector $ \\mathbf{w} $ to random values.\n","- In each individual learning step (\"epoch\") we randomly select one example $ \\mathbf{x} $ from the training dataset.\n"," - If the example $ \\mathbf{x} $ is positive and $ \\mathbf{w} \\cdot \\mathbf{x} < 0 $, we assign $ \\mathbf{w} $ the value $ \\mathbf{w} + \\mathbf{x} $.\n"," - If the example $ \\mathbf{x} $ is negative and $ \\mathbf{w} \\cdot \\mathbf{x} > 0 $, we assign $ \\mathbf{w} $ the value $ \\mathbf{w} - \\mathbf{x} $.\n","- We repeat this until an approximate convergence is achieved (e.g., until all examples are classified correctly or until a predefined number of epochs is reached)."]},{"cell_type":"markdown","metadata":{"id":"kh28CaMxnmDD"},"source":["### __Exercise 2.1: Training a simple perceptron__\n","\n","- Implement the above algorithm, training on and classifying the black metal data set we worked with in the previous section.\n","- Experiment with convergence as you wish.\n","- A note on the solution - use the predefined auxiliary functions for operations with vectors represented by dictionaries in Python (in the spirit of the previous section)."]},{"cell_type":"code","metadata":{"scrolled":false,"id":"s-3lUnfXnmDE"},"source":["import random\n","\n","# auxiliary vector operation functions\n","\n","def dot_product(u,v):\n"," if len(u) != len(v):\n"," raise ValueError\n"," dp = 0\n"," for i in range(len(u)):\n"," dp += u[i]*v[i]\n"," return dp\n","\n","def plus(u,v):\n"," if len(u) != len(v):\n"," raise ValueError\n"," w = u\n"," for i in range(len(w)):\n"," w[i] += v[i]\n"," return w\n","\n","def minus(u,v):\n"," if len(u) != len(v):\n"," raise ValueError\n"," w = u\n"," for i 
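in range(len(w)):\n"," w[i] -= v[i]\n"," return w\n","\n","# note (added): plus and minus above modify their first argument u in place,\n","# because w = u only aliases the same dictionary; that is harmless in this\n","# exercise, but a non-mutating variant (not used below) can be sketched by\n","# copying u first:\n","\n","def minus_copy(u,v):\n"," if len(u) != len(v):\n"," raise ValueError\n"," w = dict(u)\n"," for i 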
in range(len(w)):\n"," w[i] -= v[i]\n"," return w\n","\n","# learning the weight vector (main part of the algorithm; \"convergence\" \n","# according to a fixed number of epochs)\n","\n","def optimise_weights(n_epochs=20):\n"," # random weight init - TODO - COMPLETE YOURSELF\n","\n"," w = dict([(i,0) for i in range(4)])\n"," \n"," # iterative updates of the weights\n","\n"," for epoch in range(n_epochs):\n"," \n"," # TODO - COMPLETE YOURSELF\n","\n"," pass # TODO - REMOVE IN THE ACTUAL IMPLEMENTATION\n"," \n"," return w\n","\n","# function for verifying the classification of training examples based on the \n","# weight vector\n"," \n","def classify(x_id,w):\n"," # creating the example vector\n"," x = {}\n"," for j in range(4):\n"," x[j] = features[(x_id,j)]\n"," \n"," # multiplying the vector by the weights\n"," if dot_product(x,w) > 0:\n"," return 1\n"," return 0\n"," \n","# testing the implementation\n","w = optimise_weights()\n","print('Learned weight vector:', ' '.join([str(w[i]) for i in range(4)]))\n","print('Classifying the training examples...')\n","for i in range(4):\n"," class_i = classify(i,w)\n"," print('Example:', example_id_map_inv[i])\n"," print(' actual class :', labels[i])\n"," print(' predicted class:', class_i)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"-VEX4SKyp20v"},"source":["#### __Possible way of training a simple perceptron__"]},{"cell_type":"code","metadata":{"scrolled":false,"id":"Putcjn5hqH9l"},"source":["import random\n","\n","# auxiliary vector operation functions\n","\n","def dot_product(u,v):\n"," if len(u) != len(v):\n"," raise ValueError\n"," dp = 0\n"," for i in range(len(u)):\n"," dp += u[i]*v[i]\n"," return dp\n","\n","def plus(u,v):\n"," if len(u) != len(v):\n"," raise ValueError\n"," w = u\n"," for i in range(len(w)):\n"," w[i] += v[i]\n"," return w\n","\n","def minus(u,v):\n"," if len(u) != len(v):\n"," raise ValueError\n"," w = u\n"," for i in range(len(w)):\n"," w[i] -= v[i]\n"," return w\n","\n","# learning the weight vector (main part of the algorithm; \"convergence\" \n","# according to a fixed number of epochs)\n","\n","def optimise_weights(n_epochs=20):\n"," # random weight init - TODO - COMPLETE YOURSELF\n"," w = dict([(i,random.random()) for i in range(4)])\n"," \n"," for epoch in range(n_epochs):\n"," \n"," # random example ID choice\n"," x_id = random.choice([0,1,2,3])\n"," \n"," # creating the example vector from the corresponding feature matrix row\n"," x = {}\n"," for j in range(4):\n"," x[j] = features[(x_id,j)]\n"," \n"," # updating the weights\n"," if labels[x_id] == 1 and dot_product(x,w) < 0:\n"," w = plus(w,x)\n"," if labels[x_id] == 0 and dot_product(x,w) > 0:\n"," w = minus(w,x)\n"," return w\n","\n","# function for verifying the classification of training examples based on the \n","# weight vector\n"," \n","def classify(x_id,w):\n"," # creating the example vector\n"," x = {}\n"," for j in range(4):\n"," x[j] = features[(x_id,j)]\n"," \n"," # multiplying the vector by the weights\n"," if dot_product(x,w) > 0:\n"," return 1\n"," return 0\n"," \n","# testing the implementation\n","w = optimise_weights()\n","print('Learned weight vector:', ' '.join([str(w[i]) for i in range(4)]))\n","print('Classifying the training examples...')\n","for i in range(4):\n"," class_i = classify(i,w)\n"," print('Example:', example_id_map_inv[i])\n"," print(' actual class :', labels[i])\n"," print(' predicted class:', 
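class_i)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"biasSketchMd01"},"source":["#### __Optional: folding a bias term into the perceptron (an illustrative sketch)__\n","\n","- The algorithm above fixes $ b = 0 $; a common alternative is to learn the bias as well by appending a constant feature equal to $ 1 $ to every example, so that the last learned weight plays the role of $ b $.\n","- The cell below is only a sketch of this idea (the names `features_bias`, `optimise_weights_bias` and `classify_bias` are not part of the original assignment); it reuses `dot_product`, `plus` and `minus` from the cells above.\n"]},{"cell_type":"code","metadata":{"id":"biasSketchCode01"},"source":["# illustrative sketch: perceptron with a learned bias term, implemented by\n","# extending every example with a constant feature equal to 1\n","\n","features_bias = {}\n","for i in range(4):\n"," for j in range(4):\n"," features_bias[(i,j)] = features[(i,j)]\n"," features_bias[(i,4)] = 1 # constant feature standing in for the bias\n","\n","def optimise_weights_bias(n_epochs=50):\n"," # random init of 5 weights (the 5th one acts as the bias b)\n"," w = dict([(i,random.random()) for i in range(5)])\n"," for epoch in range(n_epochs):\n"," x_id = random.choice([0,1,2,3])\n"," x = dict([(j,features_bias[(x_id,j)]) for j in range(5)])\n"," if labels[x_id] == 1 and dot_product(x,w) < 0:\n"," w = plus(w,x)\n"," if labels[x_id] == 0 and dot_product(x,w) > 0:\n"," w = minus(w,x)\n"," return w\n","\n","def classify_bias(x_id,w):\n"," x = dict([(j,features_bias[(x_id,j)]) for j in range(5)])\n"," if dot_product(x,w) > 0:\n"," return 1\n"," return 0\n","\n","# testing the sketch in the same way as above\n","w_b = optimise_weights_bias()\n","print('Learned weight vector (incl. bias):', ' '.join([str(round(w_b[i],3)) for i in range(5)]))\n","print('Classifying the training examples...')\n","for i in range(4):\n"," class_i = classify_bias(i,w_b)\n"," print('Example:', example_id_map_inv[i])\n"," print(' actual class :', labels[i])\n"," print(' predicted class:', 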
class_i)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"VJVY_OIsnmDH"},"source":["### __Food for thought:__\n","\n","- Is the chosen data representation suitable for perceptron learning?\n","- What could possibly be done differently?"]},{"cell_type":"markdown","metadata":{"id":"svZsiCRTnmDI"},"source":["---\n","\n","## 3. A simple algorithm for training a decision tree\n","\n","__Basic facts__\n","- [Decision tree learning](https://en.wikipedia.org/wiki/Decision_tree_learning) is one of the most widely used types of algorithms in data analysis.\n","- It is based on the recursive division of the input set of examples into as homogeneous subsets as possible (in terms of belonging to individual classes according to the labeling of training examples).\n","- The division of the \"parent\" takes place on a parameter that maximizes the homogeneity of the \"offspring.\"\n","- This recursive division according to the values ​​of individual features corresponds to the decision (i.e., non-leaf) nodes of the tree.\n","- The leaves, on the other hand, typically contain data that most likely belong to one particular class (note: in this exercise, for simplicity, we only deal with classification trees, not regression trees that associate the individual examples with numerical values ​​on a continuous scale).\n","- The resulting structure is then used to classify data for which we do not know the individual classes.\n","\n","__A sample decision tree:__\n","\n","![decision_tree](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/decision_tree.jpg)"]},{"cell_type":"markdown","metadata":{"id":"KWEmKZ53nmDJ"},"source":["### Rules for dividing the datasets when creating a decision tree\n","- There are a number of metrics for this (see for instance [here](https://en.wikipedia.org/wiki/Decision_tree_learning#Metrics)).\n","- We will deal with a specific one, the so-called [information gain](https://en.wikipedia.org/wiki/Information_gain_in_decision_trees):\n"," - Based on [entropy](https://en.wikipedia.org/wiki/Entropy_%28information_theory%29).\n"," - To divide the \"parent set\", we use the feature that maximizes the difference between the entropy of the parent and the weighted average entropy of the individual \"offspring\" (the weights correspond to the relative number of examples in each offspring)."]},{"cell_type":"markdown","metadata":{"id":"2IEVtzJdnmDJ"},"source":["#### Practical example - division of the root node (i.e., the whole data set) of fans and non-fans of black metal\n","\n","Original dataset:\n","\n","| __ID__ || __HL__ | __FHT__ | __LN__ | __LS__ || __FBM__ |\n","|--------||--------|---------|--------|--------||---------|\n","| Petr || l | br | 9/10 | 1/10 || y |\n","| Marie || s | $-$ | 8/10 | 9/10 || y |\n","| Azazel || l | ms | 1/10 | 10/10 || n |\n","| Pavel || b | br | 7/10 | 2/10 || y |\n","\n","Data set after transformation:\n","\n","| __ID__ || 0 | 1 | 2 | 3 || y |\n","|--------||-----|-----|-----|-----||---|\n","| 0 || 1 | 1 | 1 | 0.1 || 1 |\n","| 1 || 0.3 | 0 | 0.9 | 0.9 || 1 |\n","| 2 || 1 | 0.5 | 0.1 | 1 || 0 |\n","| 3 || 0 | 1 | 0.8 | 0.2 || 1 |\n","\n","Parent set entropy ($P$):\n","- $I_E(P) = I_E([3,1]) = -\\frac{3}{4}\\log_2\\frac{3}{4} - \\frac{1}{4}\\log_2\\frac{1}{4} \\doteq 0.811$\n","\n","Information gain of the feature 0 (hair length):\n","- The values of the features are normalized, so we can simplify the problem to a binary division to \"halves,\" i.e., divide the examples into subsets according to the values of the attribute $ 0 
$ less than $ 0.5 $, or greater than or equal to $ 0.5 $.\n","- Entropy of offspring:\n"," - $I_E(P_{v(0) < 0.5}) = I_E([2,0]) = 0$ (both examples with a normalized hair length value less than 0.5, i.e., Marie and Pavel, are positive, so there is no degree of uncertainty in the set and the entropy is thus zero by definition)\n"," - $I_E(P_{v(0) \\geq 0.5}) = I_E([1,1]) = -\\frac{1}{2}\\log_2\\frac{1}{2} - \\frac{1}{2}\\log_2\\frac{1}{2} = 1$ (Petr is a positive example, Azazel is negative, it's \"one on one\", so the entropy is maximal)\n","- Weighted mean entropy of offspring:\n"," - $wmean(I_E(P_{v(0) < 0.5}), I_E(P_{v(0) \\geq 0.5})) = 0\\cdot\\frac{2}{4} + 1\\cdot\\frac{2}{4} = 0.5$\n","- Information gain:\n"," - $I_E(P) - wmean(I_E(P_{v(0) < 0.5}), I_E(P_{v(0) \\geq 0.5})) \\doteq 0.311$\n"," \n","Similarly for features $ 1,2,3 $ (note: does anyone already know who \"wins\"?) ..."]},{"cell_type":"markdown","metadata":{"id":"rq78sSnGnmDK"},"source":["### __Entropy calculation for training a decision tree__\n","\n","- Let's implement a function for calculating the entropy of a set of examples based on their individual class labels.\n","- Using the data structures defined above for the \"black metal\" dataset."]},{"cell_type":"markdown","metadata":{"id":"UVz53o_JnmDL"},"source":["#### __Possible way of entropy calculation for training a decision tree__"]},{"cell_type":"code","metadata":{"scrolled":false,"id":"ygR-ZLV2nmDM"},"source":["from math import log\n","\n","# function returning the entropy of the proportions of positive and negative \n","# examples in a given data subset\n","\n","def entropy(example_ids):\n"," \n"," # determining the number of positive and negative examples in a set\n","\n"," n_pos, n_neg = 0, 0\n"," for example_id in example_ids:\n"," if labels[example_id] == 1:\n"," n_pos += 1\n"," else:\n"," n_neg += 1\n","\n"," # total number of examples\n"," \n"," n_tot = n_pos+n_neg\n"," \n"," # computing the entropy\n"," \n"," if n_pos == 0 or n_neg == 0:\n"," return 0\n"," return -(n_pos/n_tot)*log(n_pos/n_tot,2) - (n_neg/n_tot)*log(n_neg/n_tot,2)\n","\n","# random implementation test - Petr (0) and Marie (1) belong to the same class, \n","# but Marie and Azazel (2) do not\n","assert entropy([0,1]) == 0\n","assert entropy([1,2]) == 1"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"BwVbFHxynmDQ"},"source":["### __Exercise 3.1: Computing all subsets for optimal dataset distribution__\n","\n","- Use the entropy calculation function to implement a function that computes the information gain of possible distributions of the input data set.\n","- When implementing, use the procedure from the example above (including the simplifications, such as splitting the set to \"halves\" according to the values of the given property)."]},{"cell_type":"code","metadata":{"scrolled":false,"id":"RtpKdbJktzd7"},"source":["# computing subsets for optimal distribution of a set according to the \n","# information gain of individual properties\n","\n","def get_splits(example_ids):\n","\n"," # distributions and their information gain - initialization\n"," \n"," splits = {\n"," 0 : {'splits' : [], 'ig' : 0}, \n"," 1 : {'splits' : [], 'ig' : 0},\n"," 2 : {'splits' : [], 'ig' : 0},\n"," 3 : {'splits' : [], 'ig' : 0}\n"," }\n","\n"," # TODO - COMPLETE THE REST YOURSELF\n"," \n"," return splits\n","\n","# listing the distribution of the entire dataset (tree root node); sorted in a \n","# descending order by information gain\n","splits = get_splits([0,1,2,3])\n","for 
feature_id in sorted(splits,key=lambda x: splits[x]['ig'],reverse=True):\n"," print('Splitting by feature %d with information gain=%f:' % \\\n"," (feature_id,splits[feature_id]['ig']))\n"," print(' ', splits[feature_id]['splits'])"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"1UBYSBLvnmDQ"},"source":["#### __Possible way of computing all subsets for optimal dataset distribution__ "]},{"cell_type":"code","metadata":{"scrolled":false,"id":"Yir1c11wnmDR"},"source":["# computing subsets for optimal distribution of a set according to the \n","# information gain of individual features\n","\n","def get_splits(example_ids):\n","\n"," # distributions and their information gain - initialization\n"," \n"," splits = {\n"," 0 : {'splits' : [], 'ig' : 0}, \n"," 1 : {'splits' : [], 'ig' : 0},\n"," 2 : {'splits' : [], 'ig' : 0},\n"," 3 : {'splits' : [], 'ig' : 0}\n"," }\n","\n"," # parent entropy\n"," \n"," entropy_parent = entropy(example_ids)\n","\n"," # iterating through all features and computing specific information gain \n"," # values\n"," \n"," for feature_id in range(4):\n"," child_1, child_2 = [], []\n"," \n"," # divided into two subsets according to the value of the feature\n"," \n"," for example_id in example_ids:\n"," if features[(example_id,feature_id)] < 0.5: # feature value < 0.5\n"," child_1.append(example_id)\n"," else: # feature value >= 0.5\n"," child_2.append(example_id)\n","\n"," # weighted mean of the subset entropy\n"," \n"," weighted_entropy = (len(child_1)/len(example_ids))*entropy(child_1) + \\\n"," (len(child_2)/len(example_ids))*entropy(child_2)\n"," \n"," # updating possible splits\n"," \n"," splits[feature_id]['splits'] = [child_1, child_2]\n"," splits[feature_id]['ig'] = entropy_parent - weighted_entropy\n"," \n"," return splits\n","\n","# listing the distribution of the entire dataset (tree root node); sorted in a \n","# descending order by information gain\n","splits = get_splits([0,1,2,3])\n","for feature_id in sorted(splits,key=lambda x: splits[x]['ig'],reverse=True):\n"," print('Splitting by feature %d with information gain=%f:' % \\\n"," (feature_id,splits[feature_id]['ig']))\n"," print(' ', splits[feature_id]['splits'])"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3ksfVo02nmDU"},"source":["### __Exercise 3.2: Learning the actual decision tree__\n","\n","- Implement a decision tree learning function that:\n"," - Recursively calls the function for computing the information gain of possible distributions of the input data set.\n"," - Selects the optimal distribution according to the most discriminating features.\n"," - Stores information about the parents of individual nodes as a representation of the resulting tree.\n","- Notes on the solution:\n"," - Feel free to implement the tree simply as a dictionary mapping nodes to their parents.\n"," - The non-leaf nodes can be strings representing what feature is the split being made on at that level.\n"," - Similarly, the leaves can simply be strings saying whether they're positive or negative example nodes, and listing the corresponding example IDs.\n"," - For instance, a tree with a root `split_on_feature_X` and two immediate leaves `pos : [0,1], neg : [2,3]` as its children can be represented as a Python dictionary `{'split_on_feature_X' : None, 'pos : [0,1]' : 'split_on_feature_X', 'neg : [2,3]' : 'split_on_feature_X'}`"]},{"cell_type":"code","metadata":{"scrolled":false,"id":"KBa9vlyOuTxm"},"source":["# checking the homogeneity of a data set (important 
for stopping the recursion)\n","\n","def is_pure(example_ids):\n","\n"," # TODO - COMPLETE YOURSELF\n"," \n"," return False # TODO - POSSIBLE CHANGE IN THE ACTUAL IMPLEMENTATION\n","\n","# recursive splitting of the data set\n","\n","def split_set(example_ids,parent_feature,parent_map,level):\n"," \n"," # TODO - COMPLETE YOURSELF (RETURNS A DICTIONARY MAPPING NODES TO THEIR \n"," # PARENTS)\n"," \n"," return parent_map\n","\n","# learning the tree from the complete input data\n","decision_tree = split_set([0,1,2,3],None,{},0)\n","# printing the learned tree\n","print('Learned decision tree:')\n","print(decision_tree)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ul3iISOUnmDV"},"source":["#### __Possible way of learning the actual decision tree__"]},{"cell_type":"code","metadata":{"scrolled":false,"id":"z57ffddInmDW"},"source":["# checking the homogeneity of a data set (important for stopping the recursion)\n","\n","def is_pure(example_ids):\n"," label_values = set()\n"," for example_id in example_ids:\n"," label_values.add(labels[example_id])\n"," if len(label_values) == 1:\n"," return True\n"," return False\n","\n","# recursive splitting of the data set\n","\n","def split_set(example_ids,parent_feature,parent_map,level):\n"," \n"," # if the current set is homogeneous, it doesn't need to be further divided -\n"," # it's a leaf\n"," \n"," if is_pure(example_ids):\n"," label = labels[example_ids[0]]\n"," leaf_type = None\n"," if label == 1:\n"," leaf_type = 'positive_leaf'\n"," elif label == 0:\n"," leaf_type = 'negative_leaf'\n"," \n"," # creating a unique representation of a leaf node (stores information \n"," # about its type and covered examples)\n"," parent_map[leaf_type+' : '+str(example_ids)] = parent_feature\n"," \n"," return parent_map\n"," \n"," # choosing the optimal split\n"," \n"," ig_max, optimal_split, split_feature = 0, [], None\n"," splits = get_splits(example_ids)\n"," for feature_id in splits:\n"," split = splits[feature_id]['splits']\n"," ig = splits[feature_id]['ig']\n"," if ig > ig_max:\n"," split_feature = feature_id\n"," ig_max = ig\n"," optimal_split = split\n","\n"," # updating the tree and splitting further\n"," split_feature_repr = 'feature-'+str(split_feature)+'_level-'+str(level)\n"," parent_map[split_feature_repr] = parent_feature\n"," for child in optimal_split:\n"," split_set(child,split_feature_repr,parent_map,level+1)\n"," \n"," return parent_map\n","\n","# learning the tree from the complete input data\n","decision_tree = split_set([0,1,2,3],None,{},0)\n","# printing the learned tree\n","print('Learned decision tree:')\n","print(decision_tree)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"hPu6CP-0nmDZ"},"source":["### __Food for thought:__\n","\n","- Is the chosen data representation suitable for learning decision trees?\n","- What might be done differently?\n","- How does the resulting representation of the classifier conceptually differ from the perceptron? For example, is there something it can do much better?"]},{"cell_type":"markdown","metadata":{"id":"lr1fmV_UnmDZ"},"source":["## 4. 
The exercises in a few lines of code\n","\n","- There are a number of libraries and really powerful machine learning tools available in Python.\n","- One example is [scikit-learn](https://scikit-learn.org/) - a general library implementing virtually all \"classical\" machine learning algorithms.\n","- This library can be used for solving the above exercises in a much more systematic manner and with much less code, as shown in the following example.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"qKtkFJ-SeQIs"},"source":["### Example of solving the problems from these labs using scikit-learn"]},{"cell_type":"code","metadata":{"id":"Y6BliUDHeY65"},"source":["# import of modules implementing individual models\n","\n","from sklearn.linear_model import Perceptron\n","from sklearn.tree import DecisionTreeClassifier\n","\n","# classifier initialization\n","\n","clf_decision_tree = DecisionTreeClassifier()\n","clf_perceptron = Perceptron(tol=1e-3, random_state=0)\n","\n","# training data set\n","X = []\n","for i in range(4):\n"," row = []\n"," for j in range(4):\n"," row.append(features[(i,j)])\n"," X.append(row)\n","# example labels\n","Y = [labels[i] for i in range(4)]\n","\n","# training (or \"fitting\") the classifiers\n","\n","clf_perceptron.fit(X,Y)\n","clf_decision_tree.fit(X,Y)\n","\n","# classification of two unknown examples by the trained classifiers\n","\n","unknowns = [\n"," [0, 0.25, 0, 0.9], # a bald guy with a goatee who can do without nature \n"," # and loves Satan\n"," [1, 1, 1, 0.5] # a \"lumberjack\" who loves nature and has sort of mixed \n"," # feelings about Satan\n","]\n","\n","results_perceptron = clf_perceptron.predict(unknowns)\n","results_decision_tree = clf_decision_tree.predict(unknowns)\n","\n","print('Labeling unknown examples with a perceptron :', results_perceptron)\n","print('Labeling unknown examples with a decision tree:', results_decision_tree)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"collapsed":true,"id":"yLjsBHRLnmDa"},"source":["---\n","\n","#### _Final note_ - the materials used in this notebook are original works credited and licensed as follows:\n","- Sources for the picture illustrating the two faces of black metal:\n"," - Retrieved from Wikimedia Commons [here](https://commons.wikimedia.org/wiki/File:Gorgoroth_I.jpg) and [here](https://en.wikipedia.org/wiki/File:Celestial_Lineage_2011.jpg)\n"," - Authors: [Alina Sofia](https://www.flickr.com/people/43529122@N08) and [Alison Scarpulla](https://commons.wikimedia.org/w/index.php?title=User:Kingofcupsqueenofswords&action=edit&redlink=1)\n"," - License: [Creative Commons Attribution-Share Alike 2.0 (CC BY-SA 2.0)](https://creativecommons.org/licenses/by-sa/2.0/deed.en) and [Creative Commons Attribution-Share Alike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/deed.en)\n","- Picture of the general perceptron:\n"," - Retrieved from [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Perceptron.svg)\n"," - Author: [Mat the w](https://en.wikipedia.org/wiki/User:Mat_the_w)\n"," - License: [Creative Commons Attribution-Share Alike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/deed.en)\n","- Picture of the decision tree:\n"," - Retrieved from [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Decision_Tree.jpg)\n"," - Author: [Gilgold](https://commons.wikimedia.org/w/index.php?title=User:Gilgoldm&action=edit&redlink=1)\n"," - License: [Creative Commons Attribution-Share Alike 4.0 
International](https://creativecommons.org/licenses/by-sa/4.0/deed.en)"]}]}