{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare some real-world data: download the data file mda.zip from IS (sources/mda.zip). The data come from an oncology experiment (breast cancer) in which tens of thousands of genes were profiled in search of a biomarker for \"pathologic complete response\" (pCR). Some details at: https://doi.org/10.1186/bcr2468" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "Xtr = np.load('X-train.npy') # 22283 variables, 130 observations\n", "Ytr = np.load('Y-train.npy') # Ytr[:,0] - ER positive; Ytr[:,1] - pCR\n", "Ytr = Ytr.astype('int32') # make sure the labels are INTs\n", "\n", "Xts = np.load('X-test.npy') # 22283 variables, 100 observations\n", "Yts = np.load('Y-test.npy') # Yts[:,0] - ER positive; Yts[:,1] - pCR\n", "Yts = Yts.astype('int32') # make sure the labels are INTs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# AdaBoost and Random Forests classifiers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AdaBoost\n", "Read the docs: http://scikit-learn.org/stable/modules/ensemble.html and have a look at the examples:\n", " * http://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_hastie_10_2.html\n", " * http://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_twoclass.html" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.ensemble import AdaBoostClassifier\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.metrics import zero_one_loss" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "Classical AdaBoost: discrete AdaBoost algorithm. Weak learner: decision stumps." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "T=200\n", "bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=T,\n", " algorithm='SAMME')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AdaBoostClassifier(algorithm='SAMME',\n", " base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n", " splitter='best'),\n", " learning_rate=1.0, n_estimators=200, random_state=None)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# fit the model (as usual)\n", "bdt.fit(Xtr, Ytr[:,0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result of AdaBoost with decision stumps can be analyzed to find the most important variables from the data set. 
The following command gives the indices of the variables with an importance score higher than a threshold (0.01):" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 1, 1262, 1952, 2320, 2584, 5830, 5921, 6915, 7681,\n", " 7746, 8076, 9810, 10643, 12014, 12172, 12820, 12893, 13199,\n", " 14522, 16159, 19896, 22097]),)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.where(bdt.feature_importances_ > 0.01)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the errors after each boosting step:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# - train error\n", "err_tr = np.zeros((T,))\n", "for i, yp in enumerate(bdt.staged_predict(Xtr)):\n", "    err_tr[i] = zero_one_loss(yp, Ytr[:,0])\n", "\n", "# - test error\n", "err_ts = np.zeros((T,))\n", "for i, yp in enumerate(bdt.staged_predict(Xts)):\n", "    err_ts[i] = zero_one_loss(yp, Yts[:,0])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHBxJREFUeJzt3XmYVOWZ9/HvDQjK4hJpAwJi94AaMIliSxzXqBCBLGCc\nJLjEDYeLzAsRvUxiJhONSWai4/iOr0YlEHGNIWqGhEkgmphEHR2RBtGwBEQawyotbjQYoOn7/eM5\nZVU3vVR1H/qcPvl9rquurjr11Om7T1X/6qnnnPOUuTsiIpJdXZIuQERE9i8FvYhIxinoRUQyTkEv\nIpJxCnoRkYxT0IuIZFxRQW9mY8xslZmtMbPrm7j/k2b2rpktjS43xF+qiIi0RbfWGphZV+AuYDSw\nAVhkZvPcfUWjps+6+2f2Q40iItIOxfToRwJr3H2tu+8G5gDj929ZIiISl1Z79MAAYH3B7Q3AJ5po\nd6qZvQJsBK5z9+WNG5jZZGAyQK9evU467rjjSq9YRORv2OLFi99097JSHlNM0BdjCXCUu9ea2Tjg\nF8DQxo3cfSYwE6CystKrqqpi+vUiIn8bzOz1Uh9TzNDNRmBQwe2B0bIPuPt77l4bXZ8PHGBmfUst\nRkRE4ldM0C8ChppZuZl1ByYC8wobmFk/M7Po+shovdviLlZERErX6tCNu9eZ2VTgCaArMNvdl5vZ\nlOj+GcA/AF8xszrgfWCia1pMEZFUsKTyWGP0IiKlM7PF7l5ZymN0ZqyISMYp6EVEMk5BLyKScQp6\nEZGMU9CLiGScgl5EJOMU9CIiGaegFxHJOAW9iEjGKehFRDJOQS8iknEKehGRjFPQi4hknIJeRCTj\nFPQiIhmnoBcRyTgFvYhIxinoRUQyTkEvIpJxCnoRkYxT0IuIZJyCXkQk4xT0IiIZp6AXEck4Bb2I\nSMYp6EVEMk5BLyKScQp6EZGMU9CLiGScgl5EJOMU9CIiGaegFxHJOAW9iEjGKehFRDKuqKA3szFm\ntsrM1pjZ9S20O9nM6szsH+IrUURE2qPVoDezrsBdwFhgGHChmQ1rpt0twJNxFykiIm1XTI9+JLDG\n3de6+25gDjC+iXbTgJ8DW2OsT0RE2qmYoB8ArC+4vSFa9gEzGwCcD9zT0orMbLKZVZlZVU1NTam1\niohIG8S1M/Z24BvuXt9SI3ef6e6V7l5ZVlYW068WEZGWdCuizUZgUMHtgdGyQpXAHDMD6AuMM7M6\nd/9FLFWKiEibFRP0i4ChZlZOCPiJwEWFDdy9PHfdzO4HfqWQFxFJh1aD3t3rzGwq8ATQFZjt7svN\nbEp0/4z9XKOIiLRDMT163H0+ML/RsiYD3t0vb39ZIiISF50ZKyKScQp6EZGMU9CLiGScgl5EJOMU\n9CIiGaegFxHJOAW9iEjGKehFRDJOQS8iknEKehGRjFPQi4hknIJeRCTjFPQiIhmnoBcRyTgFvYhI\nxinoRUQyTkEvIpJxCnoRkYxT0IuIZJyCXkQk4xT0IiIZp6AXEck4Bb2ISMYp6EVEMk5BLyKScQp6\nEZGMU9CLiGScgl5EJOMU9CIiGaegFxHJOAW9iEjGKehFRDJOQS8iknFFBb2ZjTGzVWa2xsyub+L+\n8Wb2ipktNbMqMzs9/lJFRKQturXWwMy6AncBo4ENwCIzm+fuKwqaPQXMc3c3s48BjwLH7Y+CRUSk\nNMX06EcCa9x9rbvvBuYA4wsbuHutu3t0sxfgiIhIKhQT9AOA9QW3N0TLGjCz883sz8CvgSvjKU9E\nRNortp2x7j7X3Y8DJgDfa6qNmU2OxvCrampq4vrVIiLSgmKCfiMwqOD2wGhZk9z9GaDCzPo2cd9M\nd69098qysrKSixURkdIVE/SLgKFmVm5m3YGJwLzCBmY2xMwsuj4C6AFsi7tYEREpXatH3bh7nZlN\nBZ4AugKz3X25mU2J7p8BXABcamZ7gPeBLxXsnBURkQRZUnlcW
VnpVVVVifxuEZHOyswWu3tlKY/R\nmbEiIhmnoBcRyTgFvYhIxinoRUQyTkEvIpJxCnoRkYxT0IuIZJyCXkQk4xT0IiIZp6AXEck4Bb2I\nSMYp6EVEMk5BLyKScQp6EZGMU9CLiGScgl5EJOMU9CIiGaegFxHJOAW9iEjGdY6g/8Uv4LXXkq5C\nRKRT6hxBf/HFcOedSVchItIppT/o6+pg5054552kKxER6ZTSH/S1teHne+8lW4eISCeloBcRybj0\nB/327eGngl5EpE3SH/Tq0YuItEvnCfp33022DhGRTir9Qa+hGxGRdkl/0Od69Dt3hkMtRUSkJOkP\n+lyPvvF1EREpSvqDPtejB43Ti4i0QfqDvrAXr3F6EZGSpT/oC3v0CnoRkZKlP+jVoxcRaZeigt7M\nxpjZKjNbY2bXN3H/xWb2ipn9ycyeN7OPx1ahxuhFRNql1aA3s67AXcBYYBhwoZkNa9SsGjjL3T8K\nfA+YGVuFtbVwxBHhunr0IiIlK6ZHPxJY4+5r3X03MAcYX9jA3Z9397ejmy8AA2OrcPt2OPLIcF1B\nLyJSsmKCfgCwvuD2hmhZcyYBC5q6w8wmm1mVmVXV1NQUV2FtLfTrB2YKehGRNoh1Z6yZnU0I+m80\ndb+7z3T3SnevLCsrK26l27dDnz5w8MEKehGRNuhWRJuNwKCC2wOjZQ2Y2ceAHwNj3X1bPOURevR9\n+sAhh2hnrIhIGxTTo18EDDWzcjPrDkwE5hU2MLOjgP8Cvuzuq2OtcPt26N0736PfuRO2xfc+IiKS\nda0GvbvXAVOBJ4CVwKPuvtzMppjZlKjZDcDhwN1mttTMqmKpzj3fo88F/fTpcM45saxeRORvQTFD\nN7j7fGB+o2UzCq5fBVwVb2nAX/8Ke/fme/TbtsGzz8Krr4aZLLsVVb6IyN+0dJ8ZmztZqnfvMEa/\naROsWhXCf/36lh8rIiJAZwn63NDNxo1hOAegujq5ukREOpF0B31unpvc0E2htAT9m2/CccfBwoVJ\nVyIi0qR0B33jHj3A4YdD167pCfolS8Jw0n/+Z9KViIg0Kd1B31SPvrISBg1KT9Dn6pg7N/TuRURS\nJt1BX9ijP+SQcH3ECKiogLVrk6urUHV1mJ5h9254+OGkqxER2Ue6g76pHv2IEVBenq4e/d/9HYwc\nCTffDJ/7HPz610lXJSLygXQHfeHhlX//9zBhApx7bgj6N94IZ8kmrbo61HPDDTBwIDz1FMyMb5Zm\nEZH2SnfQ53r0ffqEqYrnzoXDDgvBCrBuXWKlfSAX9J/+NFRVwemnw+bNSVclIvKBdAd9bW04wqZH\nj4bLc0Gf9PBNbW3YAZurB6B/fwW9iKRKuoM+N0WxWcPlFRXh50MPwQMPQH19x9cG+TeaXD0Qgn7L\nluRqEhFpJN1Bv3x5+NKRxo44AgYPhp/9DC6/HH71qw4vDcgf+dO4R19XB2+9lUxNIiKNpDfoq6vh\n97+Hiy7a9z4zWL0atm4NbwT33tvx9UG+R9846EHDNyKSGukN+vvuC4F++eVN39+9O5SVwWWXhcMZ\nN23q0PKAEPS9e4ezdXMU9CKSMukM+r17Q9CPGRPOgm3JpEmh/QMPNFy+axfs2BGuuzf8spLNm8Ow\nUO6SO4yzJTt2hGmTc/WtXAnLloXefOE+hNxQk4JeRFIinUH//POwYQNccUXrbYcOhTPPDMM3uZkt\nIbwBnHFGWPbgg6GnvXp16PlXVMDxx+cv48a1/Dvc4ayz4ItfDLf/7d9g2LAwtHTMMQ3bqkcvIimT\nzm/uWLUq/Dz55OLaX3UVXHopPP00fPKTYdkzz4Q56194Ae66C/bsCW8Ghx4aeuY//nE42/YPf4B7\n7gk9++HDm17/iy/C4sVhArO1a8MJUaedBldfHX4W6tUrHCmkoBeRlEhnj37t2nD8/MCBxbW/4IIQ\n2rmdsjU1+S8mue46WLQIe
vaE++8Pbc46K/T4v/AFuOkmOOCAlnfo3nsvHHhguH7ppeHTxjXXhMcf\neeS+7XUsvYikSDqDvro6HD5Z7FcF9uwJF18Mjz8O77wDL70Ulh9zTBgG6t4d7r47HKXz2mvhE0BO\nWRmMHx+Oyd+1a99119bCT38KEyfC6NHw3HPhMZ/9bPP1KOhFJEXSG/SFhywWY9KkMCTzyCP5oM/N\nEX/++eGN4MgjwyyYF1zQ8LFXXRXOcB04MIR04WXw4BD2V12Vf4O49NLw5tGc/RX0d9wBU6a03k5E\npEA6x+irq8MskKUYMQJOOCGMvQ8dGt4oxo6F224Lve9u3WD2bHj/fTjooIaPHT06TEq2ZUvT6x40\nCE49NZwI9b3vwT/+Y8u15ILefd+zetvjoYfglVfgzjvDcJOISBHSF/Q7doQhllJ79GahVz9tGrz6\nKpx3Xlh27bX5Nued1/Rju3QJY/WtOeAA+Jd/ab1dv35hZs3t2/f9CsS22rMnhPzu3bBiBXz84/Gs\nV0QyL31DN7kZKUsNegjDMz16hKGWE0+MtayS5A6xbO4TQlusWBFCHsLRPyIiRUq2Rz9tGrz9dv72\nCSeEL9qGtgX9YYeF8fdHHglDOUkpPJa+8XH2zz0HM2Y0POa/sd694dZbw2GaObn9DmYh6Is5x6Cj\nzZwJQ4bAOee03vbtt+EHP4Bvf7vh3xmH+vrwCe2SS8Iw3uOPh+3WeN9MW82aFc7FOPfchsvffRe+\n/vX8iXpmMHlyOJ+jWAsWwE9+Ej5lXnNN6R2WvXvhO98Jr4+KijAfVPfuYT+VdLwnnwzn8bTXJZeE\nE0jbKNmgf+mlfK+3tja8wKdPD7cLZ4Qsxde+FgL21FPjqbEtcm9Sq1eHQzkLfe1r8PLL+TeDxtzD\n4aXHHw9Tp+aXL1kS3gA++tF86KfJ5s3wT/8Exx4bzhhubd/EPfeEN7MBA8L5CHH63e/gu98N23HW\nrBC2ZvCZz+w75XWptmwJf+eQIeFTVuHfOXt2eLOrqAjLt2wJr4GFC4tbd319eM7ffDN8envrrdIn\n7PvNb+D734eNG+GHPwx/e48e4fsSWjqAQOLnHjqzmzeHiRjbY9So9tbiiVxOOukkb2D9evcuXdx7\n9HDv2dO9vt47rb173Q8+2P0rX2m4fNkyd3C/7baWH3/SSe4f+1jDbXDaae6nn+4+bZp7r17udXXx\n190eP/hB+NvA/fnnW267d697RUVoe/zx8T/XX/hCWPeBB7rffXe+rjlz2r/uW27Jr+9//ie/vL7e\nfdgw9098Ir/s9ttDu5dfLm7dTz0V2j/8sPs//3P4f9iwobT6JkwI6+jZ0/3OO/O1Pv54aeuR9nvm\nmbDt77sv1tUCVV5i3qYn6N3dx40LJQ0fHtc2Sc5ZZzX8p3d3v+Ya9wMOcN+6teXH5sJp0aJwu64u\nhPu0aeFFA+4rV+6Pqtumvt59yBD3k08OdU6a1HL7XKCdfXb4uXBhfLVs3Rq2cW7dBx7oXl7uPniw\n+6hR7Vt3fb37Mce4V1a69+7tfvnl+fv+93/D75s1K7/szTfdu3d3/+pXi1v/RRe5H3qo+86d7mvW\nhPV9//vF17dli3u3bg3/9iFD3AcOdB8zpvj1SDwuu8y9Tx/32tpYV9uWoE/XUTeTJsH8+W0bn0+b\nESPCWHxdXTi0c9eucHjk+PHhhKuWXHhhOFroppvCuPK2bWHcd8SI/L6HJUvC/ozXXw/3DRu273qe\nfTaM8fbu3XD5zp3hbOHGw0oQhl2qqkr7WzdsgDVrwljkH/8Ic+aEqSGaG755+OEwFcWcOeG5vumm\ncJZxHJ57LhyhdMcd8OUvw9KlcOWVYVjkxhvDiXM9e7Zt3Rs3hqGY++4Lv+eRR8L4e5cu8NhjYfqL\nL30p3/7ww8P3HD/8cNj/1NJw1t698POfh3M1DjoofOH82WeHoacBA4qr75lnwuvt7rvD9ly
2LPxP\n7dgB//qv4fWYO8Nb9q/6+vCauOSS8LpIWqnvDHFdmuzR79oVel833hjTe1+CHnww9KqWLQu3H300\n3F6woLjHT5qU/9gN4WP86tXuu3eH3uSVV4Z2I0e6f/jDYXmhFSvC4669dt91f/Ob4b6XXmq4vK7O\nfdCghr+32EtZmfuOHe4vvuhu1nr7XF2TJ7ft97V0OeOMsO577w2fMNavd//LX0IPt73rPvzw0EOr\nqtr375wyZd9t/cc/Fr/uLl0aDvM89ljp9Z1zTnjsj34UXiebNrlXV4dPFnFvZ11avpi5L15c3P97\nCWhDj97C4zpeZWWlVzXVc/zrX8NOoy7pO/KzJMuXhx2qDz4YepZjxoSdd9XVYR6f1uzdm5+vB0Kv\nvG/fcH3SpHA0xW9/m9/pPHdu6D3mXHddOFmsb9/Q487thKyrCyeAbdkSdvzdeWf+MQsWhJk8Z80q\nfefPhz6UP2dg69bwqaE5ZqGGLl32/Tvj0K9f6Lm6h95s7hPNW2/Be++1b92Ff2dNTf4IGwh/U1PP\n7RtvhBP1WtOr176f9jZtyh9WW4z+/cNz7R6eg1xvMo6/XUrT1PMZAzNb7O6VJT2o1HeGuC5N9uiz\nZM8e94MOcp8+3X3duvDufsMN8az7+edDj2Hw4NBTO+II909/On//rl2hh53rnT/6aP6+X/4yLBs0\nyP2ww9zffz9/3+c/7963b3i8iKQSbejRd/Juc4p16xbOXl24MIyZQnzHvp9yCnzkI2F8fsKEMK67\nYEE47HL9+nCYak1N+L2DBsGPfhSWr18frvfrF3rtb78dZvRcvz6M586b1/o8PiLS+ZT6zhDXJfM9\nenf3qVPz43WjR8e77ttuC+t98sn8ERqFlwEDwpj7jTfue983vhEOcSwv3/e+5cvjrVNEYkWnP+om\na7797XCUTH09fOpT8a576tTwRSmjRoUx7yefhL/8JX//ySeH8eLrrgsn8OzZE5Z36xbOkuzSBf77\nv8MXs+QMGND00Tsi0qkVtTPWzMYA/w/oCvzY3W9udP9xwH3ACOBb7v4fra2z2Z2xIiLSrLbsjG21\nR29mXYG7gNHABmCRmc1z9xUFzd4CvgpMaGIVIiKSoGJ2xo4E1rj7WnffDcwBxhc2cPet7r4I2LMf\nahQRkXYoJugHAIUHOm+IlpXMzCabWZWZVdXU1LRlFSIiUqIOPbzS3We6e6W7V5bthxMJRERkX8UE\n/UZgUMHtgdEyERHpBIoJ+kXAUDMrN7PuwERg3v4tS0RE4tLqUTfuXmdmU4EnCIdXznb35WY2Jbp/\nhpn1A6qAg4F6M5sODHN3Ta4hIpKwok6Ycvf5wPxGy2YUXN9CGNIREZGU0Vw3IiIZp6AXEck4Bb2I\nSMYp6EVEMk5BLyKScQp6EZGMU9CLiGScgl5EJOMU9CIiGaegFxHJOAW9iEjGKehFRDJOQS8iknEK\nehGRjFPQi4hknIJeRCTjEg36ykoYOBCOPRaqq5OsREQkuxIN+rPPhtNOg9Wr4YUXkqxERCS7ivoq\nwf3l1lth50549FH16EVE9pfEx+h79oQjjoB165KuREQkmxIPeoDycvXoRUT2l1QE/dFHq0cvIrK/\npCboX38d6uuTrkREJHtSEfTl5bBnD2zalHQlIiLZk4qgP/ro8FPDNyIi8UtF0JeXh5/V1eCebC0i\nIlmTiqA/6qjwc9UqGD4cbr452XpERLIkFUF/4IHQvz/88IewciX8/vdJVyQikh2pCHoIwzfvvhuu\nL1+ebC0iIlmSmqDP7ZAdMyYcffPOO4mWIyKSGYnOdVPoiiugogJOOQV+85vQqz/ttKSrEhHp/FIT\n9KNGhUvuEEsFvYhIPFIzdJNz1FHQq5fG6UVE4lJU0JvZGDNbZWZrzOz6Ju43M7sjuv8VMxvR5oK6\nwLBhCnoRkbi0GvRm1hW4CxgLDAMuNLNhjZqNBYZGl8n
APe0pavhwWLasPWsQEZGcYsboRwJr3H0t\ngJnNAcYDKwrajAcedHcHXjCzQ82sv7tvbktRw4fD/ffDU09B795tWYOISHYMHgz9+rX98cUE/QBg\nfcHtDcAnimgzAGhT0J94Yvg5alRbHi0iki233w5XX932x3foUTdmNpkwtMNRuXkPmnDOOfD007Bj\nR0dVJiKSXh/5SPseX0zQbwQGFdweGC0rtQ3uPhOYCVBZWdns9GVmcOaZRVQmIiKtKuaom0XAUDMr\nN7PuwERgXqM284BLo6NvTgHebev4vIiIxKvVHr2715nZVOAJoCsw292Xm9mU6P4ZwHxgHLAG2Alc\nsf9KFhGRUpgnNAG8mW0HViXyy0vTF3gz6SJa0RlqhM5Rp2qMT2eoszPUCA3rHOzuZaU8OMkpEFa5\ne2WCv78oZlaV9jo7Q43QOepUjfHpDHV2hhqh/XWmbgoEERGJl4JeRCTjkgz6mQn+7lJ0hjo7Q43Q\nOepUjfHpDHV2hhqhnXUmtjNWREQ6hoZuREQyTkEvIpJxiQR9a/PbJ8HMBpnZH8xshZktN7Oro+Xf\nMbONZrY0uoxLQa3rzOxPUT1V0bIPmdlvzezV6OdhCdZ3bMH2Wmpm75nZ9DRsSzObbWZbzWxZwbJm\nt52ZfTN6na4ys/MSrPFWM/tz9H0Pc83s0Gj50Wb2fsE2nZFgjc0+v0lsxxbq/FlBjevMbGm0PKlt\n2Vz2xPe6dPcOvRDOrn0NqAC6Ay8Dwzq6jibq6g+MiK73AVYT5t//DnBd0vU1qnUd0LfRsn8Hro+u\nXw/cknSdBc/3FmBwGrYlcCYwAljW2raLnv+XgR5AefS67ZpQjZ8CukXXbymo8ejCdglvxyaf36S2\nY3N1Nrr/NuCGhLdlc9kT2+syiR79B/Pbu/tuIDe/faLcfbO7L4mubwdWEqZa7izGAw9E1x8AJiRY\nS6Fzgdfc/fWkCwFw92eAtxotbm7bjQfmuPsud68mTPExMoka3f1Jd6+Lbr5AmDgwMc1sx+Yksh2h\n5TrNzIAvAj/tiFqa00L2xPa6TCLom5u7PjXM7GjgRGBhtGha9JF5dpJDIgUc+J2ZLY6mfgb4sOcn\nktsCfDiZ0vYxkYb/SGnbltD8tkvra/VKYEHB7fJoqOFpMzsjqaIiTT2/ad2OZwBvuPurBcsS3ZaN\nsie216V2xjZiZr2BnwPT3f09wtciVgAnEL5I5bYEy8s53d1PIHyF4/8xswaTOnv4fJf4cbMWZjv9\nHPBYtCiN27KBtGy75pjZt4A64CfRos3AUdHr4VrgETM7OKHyUv/8NnIhDTshiW7LJrLnA+19XSYR\n9EXNXZ8EMzuAsKF/4u7/BeDub7j7XnevB2bRQR85W+LuG6OfW4G5hJreMLP+ANHPrclV+IGxwBJ3\nfwPSuS0jzW27VL1Wzexy4DPAxdE/PtHH923R9cWE8dpjkqivhec3VdsRwMy6AZ8HfpZbluS2bCp7\niPF1mUTQFzO/fYeLxuvuBVa6+/8tWN6/oNn5QKJfW25mvcysT+46YSfdMsI2vCxqdhnwy2QqbKBB\njylt27JAc9tuHjDRzHqYWTkwFHgxgfowszHA14HPufvOguVlZtY1ul4R1bg2oRqbe35Tsx0LjAL+\n7O4bcguS2pbNZQ9xvi47eg9z1BEZR9iz/BrwrSRqaKKm0wkfjV4BlkaXccBDwJ+i5fOA/gnXWUHY\n4/4ysDy3/YDDgaeAV4HfAR9KuM5ewDbgkIJliW9LwhvPZmAPYWxzUkvbDvhW9DpdBYxNsMY1hHHZ\n3GtzRtT2guh1sBRYAnw2wRqbfX6T2I7N1Rktvx+Y0qhtUtuyueyJ7XWpKRBERDJOO2NFRDJOQS8i\nknEKehGRjFPQi4hknIJeRCTjFPQiIhmnoBcRybj/D2ottIK3bgmgAAAAAElFTkSuQmCC\n", "text/plain": [ "" 
] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = plt.figure()\n", "ax = plt.subplot(111)\n", "ax.set_ylim(-0.01, 0.5)\n", "ax.set_xlim(0, T+2)\n", "ax.plot(np.arange(T)+1, err_tr, color='blue')\n", "ax.plot(np.arange(T)+1, err_ts, color='red')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TO DO:\n", "Try other base learners: e.g. a decision tree with 2 levels, in the above example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## RealAdaboost" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH4hJREFUeJzt3Xl4VPXZ//H3bRAUK+pTolDAEpWqaK3GKLW4r0hFtJvg\nghvyUOvSuj2gj8tVba3a9rG/FkVasFZtqYoK1rXVVlsVS8SlgqIoYkABxTWKQsz9++OecSbJTDIJ\nE2Zy/Lyua67MnGXOPWdOPnPme77njLk7IiKSXOuVugAREelcCnoRkYRT0IuIJJyCXkQk4RT0IiIJ\np6AXEUm4goLezIaZ2QIzW2hmE3KM39fM3jOzp1O3i4pfqoiIdES3tiYwswpgEnAQsASYY2az3H1+\ns0n/6e6HdUKNIiKyFgrZo98dWOjur7j7amA6MLJzyxIRkWJpc48e6AfUZT1eAgzJMd03zOxZYClw\njrvPaz6BmY0DxgFstNFGu2633Xbtr1hE5HPsySeffMvdK9szTyFBX4i5wJbuXm9mw4E7gUHNJ3L3\nKcAUgJqaGq+trS3S4kVEPh/MbHF75ymk6WYpMCDrcf/UsM+4+/vuXp+6fw+wvpn1bm8xIiJSfIUE\n/RxgkJlVmVl3YBQwK3sCM+tjZpa6v3vqeVcWu1gREWm/Nptu3L3BzE4D7gcqgGnuPs/MxqfGTwa+\nA3zfzBqAVcAo12UxRUTKgpUqj9VGLyLSfmb2pLvXtGcenRkrIpJwCnoRkYRT0IuIJJyCXkQk4RT0\nIiIJp6AXEUk4Bb2ISMIp6EVEEk5BLyKScAp6EZGEU9CLiCScgl5EJOEU9CIiCaegFxFJOAW9iEjC\nKehFRBJOQS8iknAKehGRhFPQi4gknIJeRCThFPQiIgmnoBcRSTgFvYhIwinoRUQSTkEvIpJwCnoR\nkYRT0IuIJJyCXkQk4RT0IiIJp6AXEUk4Bb2ISMIp6EVEEk5BLyKScAp6EZGEKyjozWyYmS0ws4Vm\nNqGV6XYzswYz+07xShQRkbXRZtCbWQUwCTgUGAyMNrPBeaa7Anig2EWKiEjHFbJHvzuw0N1fcffV\nwHRgZI7pTgdmACuKWJ+IiKylQoK+H1CX9XhJathnzKwfcCRwbWtPZGbjzKzWzGrffPPN9tYqIiId\nUKyDsVcD/+Puja1N5O5T3L3G3WsqKyuLtGgREWlNtwKmWQoMyHrcPzUsWw0w3cwAegPDzazB3e8s\nSpUiItJhhQT9HGCQmVURAT8KODp7AnevSt83s98Df1HIi4iUhzaD3t0bzOw04H6gApjm7vPMbHxq\n/OROrlFERNZCIXv0uPs9wD3NhuUMeHc/Ye3LEhGRYt
GZsSIiCaegFxFJOAW9iEjCKehFRBJOQS8i\nknAKehGRhFPQi4gknIJeRCThFPQiIgmnoBcRSTgFvYhIwinoRUQSTkEvIpJwCnoRkYRT0IuIJJyC\nXkQk4RT0IiIJp6AXEUk4Bb2ISMIp6EVEEk5BLyKScAp6EZGEU9CLiCScgl5EJOEU9CIiCaegFxFJ\nOAW9iEjCKehFRBJOQS8iknAKehGRhFPQi4gknIJeRCThFPQiIglXUNCb2TAzW2BmC81sQo7xI83s\nWTN72sxqzWzP4pcqIiId0a2tCcysApgEHAQsAeaY2Sx3n5812YPALHd3M9sJuAXYrjMKFhGR9ilk\nj353YKG7v+Luq4HpwMjsCdy93t099XAjwBERkbJQSND3A+qyHi9JDWvCzI40sxeAu4GTilOeiIis\nraIdjHX3O9x9O+AI4NJc05jZuFQbfu2bb75ZrEWLiEgrCgn6pcCArMf9U8NycvdHgK3MrHeOcVPc\nvcbdayorK9tdrIiItF8hQT8HGGRmVWbWHRgFzMqewMy2MTNL3a8GegAri12siIi0X5u9bty9wcxO\nA+4HKoBp7j7PzManxk8Gvg2MMbM1wCrgqKyDsyIiUkJWqjyuqanx2trakixbRKSrMrMn3b2mPfPo\nzFgRkYRT0IuIJJyCXkQk4RT0IiIJp6AXEUk4Bb2ISMIp6EVEEk5BLyKScAp6EZGEU9CLiCScgl5E\nJOEU9CIiCaegFxFJOAW9iEjCKehFRBJOQS8iknAKehGRhFPQi4gknIJeRCThFPQiIgmnoBcRSTgF\nvYhIwinoRUQSTkEvIpJwCnoRkYRT0IuIJJyCXkQk4RT0IiIJp6AXEUk4Bb2ISMIp6EVEEk5BLyKS\ncF0n6JctgzFj4IMPSl2JiEiX0nWCfuZMuPFGePTRUlciItKlFBT0ZjbMzBaY2UIzm5Bj/DFm9qyZ\n/cfMHjOzrxW90rlz4++iRUV/ahGRJGsz6M2sApgEHAoMBkab2eBmky0C9nH3rwKXAlOKXaiCXkSk\nYwrZo98dWOjur7j7amA6MDJ7And/zN3fST2cDfQvapVr1sCzz8Z9Bb2ISLsUEvT9gLqsx0tSw/I5\nGbg31wgzG2dmtWZW++abbxZe5fz5sHo1rLeegl5EpJ2KejDWzPYjgv5/co139ynuXuPuNZWVlYU/\n8VNPxd9994VXXlnrOkVEPk8KCfqlwICsx/1Tw5ows52A3wEj3X1lccpLmTsXNtoIDj4Y3nkH3nuv\nqE8vIpJkhQT9HGCQmVWZWXdgFDArewIz2xK4HTjO3V8sepVz58LOO8PWW8djNd+IiBSszaB39wbg\nNOB+4HngFnefZ2bjzWx8arKLgC8C15jZ02ZWW7QK6+sj6HfdFaqqYpiCXkSkYN0Kmcjd7wHuaTZs\nctb9scDY4paWcsstsGoVHHUUbLVVDFPQi4gUrKCgL6nf/Q622w722APMYJNNdEBWRKQdyvsSCPPn\nw+OPw9ixEfIQzTfaoxcRKVh5B/2NN0K3bnDccZlhW20F994LG28MN9xQutpERLqI8g76Z5+FHXeE\nzTfPDDv/fDj7bOjZE2bNyj+viIgA5d5Gv2hRtM9n23XXuL32Gvz736WpS0SkCynfPXp3ePXVTE+b\n5qqrY/w77+QeLyIiQDnv0S9fHt0q033nm9tll/j71FPxYySvvw7f//66q09EpIso36BP96xpK+hr\na+E3v4H334fx4zO9c0REBCjnppt0X/l8QV9ZCQMGwK9/DXV1cf0bdbsUEWmhfIM+HdoDB+afproa\nliyBiop4nL7KpYiIfKa8g75PH9hww/zTpJtv/vu/o799+leoRETkM+Ud9Pl63KQdeGBcvviMM2CH\nHRT0IiI5lHfQ52ufTxs6NA7CbrttNOPMnRvdMkVE5DPlGfRr1sQJUW0FPcTPC0I046xYAW+80bm1\niYh0MeUX9G+/Df
/4BzQ2Fhb0adXV8TfdfPPxx/Dhh0UvT0SkqymvoHePPfODD47HX/lK4fN+7Wux\nd/+vf8Xj0aNhr73UlCMin3vlFfRLlkSTzfjxcM890QZfqC98AYYNiyteLloEM2dGd8vZszuvXhGR\nLqC8gj7d7DJmDBx6aPvPch07Ni6FcNxxsSe/4YYwdWrx6xQR6ULKL+jXWw922qlj8x92WFzS+NFH\no+vl0UfD9OlxLRwRkc+p8gv67baLvvEdsf76cPzxcX/sWDj55DggO2NG8WrsTN/7HvzpT6WuQpLg\nvvviEiF9+8Ztyy2jk0Nnu/Za+M534hv1vHnRoSJdQ2u37beHZctyP+eqVbD33nD33S3HPfJIHIvT\nzlyryuuiZk89Bfvtt3bPcc450KsXHHlkBH/v3rGHf8IJRSmx06xYAbfeGpdzGD261NVIV3fVVdDQ\nAIcfHo9nzIBf/hL23bfzltnQAJddFs2n//43/P73Ed5jxrQ937RpcP31MHFiy/EzZsA//xkfHt/8\nZtNxd9wRHTD+/OfYuZPc3L0kt1133dWbWLbMHdx/8QsvqoMOcq+uLu5zdob77ovXP2RIqSuRrm7h\nwtiWLr00M2zCBPeKCvelSztvuXfdFcsF96OPdu/Vy/244wqbd5993Lfe2v3TT3OPSz/v8883Hbf3\n3jH8619f2+q7DKDW25m35dN0k74gWbo/fLFUV8N//gOrVxf3eYstfSBaV+CUtXX99XGsK/tb7Ekn\nwaefdu7vLE+dGsfIjjkG/vjHOGu90L3ssWPh5ZejKSbbSy/Bww/HZU4qKmLPP62xMXKjV6/oXTdv\nXvFeS8KUtunm9NMzvxD14ovxd+edi7uM6uo403bevOij39gYXy+PPhq22Sb3PDNmxEZ1xBHwwgvx\ntfCii3L3Avr4Y7jkEvjRj2CLLaKN8tFHW0530kmw//5Nh33wAfzkJ/E7uOkPuhUr4rhCR49TSGGm\nT49eWSNHdu5yst/jXr3gZz+D556LH7e/8sr4e9llsZ21Zv/9Yxuqq4tt7JJL4kJ+EyfC0qVNp73v\nvuhq3L9/ZtigQbDPPhHGEyYU3qPtww/h3HMjtFvjDnfdFf8HI0fCzTfHMvfaq7DlfPvbcNppccvO\ngJdeig+t886DxYvht7+NpqFDD4Xdd4/1e/nl8f85dWo0T6X94Q/wwAOFLb8U1lsPzjorXu/MmdF0\nm8+xx8Z72kGlDfqnnmp6AGb0aNh00+IuI/uM2V12gYcegosvjgM8l1+ee57zzoP6ehg+PDagW2+N\nts199mk57fTpcMUVcf+ss2LPY7PN4p86bflyeOaZ+LHz7H+wu++OeSsro74ePeCTT2Kvfscdi/Ly\nJYf6ejjllPgwHT48juV0lrvuivd4iy0inCZOjPvLl8NXvxrnilx4YRyQ7Nkz93O8+y7cfnuE4RVX\nwKRJsS1vtll8WPTvH9tOWu/eEc7NjR0bXY8ffrjwtvqbbooPlqqqzOVG8hk8OH7lraoKRo2KXnCF\nfqBsuCFccAFcd13Lc1/OOAP69YvXtGAB3Hsv/OUvcPXVMf6QQ+L/5w9/iP/pHj3iA+DUU+P+ZpsV\nVsO6tnRp5r39wQ9iu+zdO/e0Bx64dstqb1tPsW4t2ug7y6efum+8sfupp8bjo46KNr2DD849/dtv\nZ9oDJ092X3/9uH/ssbmnHzo0xm+xhftPfxr3589vOs1118XwJ55oOvzcc2P4wIHx9/DD4++sWWv3\nmqV106Zl3uM77ujcZZ19dixn8GD3c85x79bNffly9112idvpp7v36OG+cmX+55gzJ57jl79032ST\nzPZ71FHum23mvmpVYbV89FHMn29bzqWmxn2nndwbGwufp7P9/e+xDqqq4v/zk08yx7huuSWm+d3v\n4vGjj5a01FZNmOC+3nruU6ZErTNmFDQbHWijT37Qu8cBmz32cH/rLffu3d3N3Hv3
zr3xPvRQrBYz\n9w02iPv77Rf333mn6bTz52fGQ0zzjW+0fM733nPv2dP9lFOaDj/ggFhOOnRuuCH+/upXxXvt0tLQ\noe5f+Yr7l77k/s1vdu6y9tsv8x5vsIH7t74Vw3/zm8yw0aNbf47GRvevfa3p9mgWIXfGGe2r59RT\nc2/LuTz9dHluj42N7ttsE7WlO1o0NLhvuWVmB26PPdy33768PqCae/HFzDaw+ebxgVWAjgR9eXWv\n7CzV1fGVcOLEOCg7bhxMmRJfnbLbMSHTVn7KKTHNkCHRVa2mJuYfMiQz7d13RzvpTTfF+DfeiL77\nzfXqBd/9bvSR32OPuKzyHnvE181Ro+LrfX19fAXt2TOaburq4MEHW39dm20W3efM4LHHorknu8no\nr3+N11hZ2bJbWlsefBD23LNpk8DaWr486qmuji51M2ZEE9oOO8Buu8XBwoceiq+pxfjt3/p6uPPO\nWFba++/HMZQrr4yvzT/7WWwbW2/d9OvxO+/E+9LYGI979Iimk+7d8y/PPdb5gQdGM4d7bE+jR0cb\n7IcfZraPo4+Gs8+OYzy5tplsZjHNGWdEndOmxW81rFnT9rzNnXwyXHNNy205lzvvjNd7zDHtW0Zn\nM4vjFeefn2maraiAE0+EH/84mrcefxx+/vPy/g3p9HGThx+OYxOtbVtrq72fDMW6rdM9+ttvz+w1\nDx3q/thjcX/mzJbTHnOMe//+7osWuW+4oftNN8VewZAhmefIvo0aFfNdeql7ZaX7Bx/krmH27Mw8\nPXq4z50b96+91v3MM+Orvbv7jjtGE86BB+ZeXvPb3LnR3FRRkWmecs9820jfZs8ufH098kjMc/nl\nhc9TiMMPj282777rfv31mdo23jjW2+TJ+d+XjrjggtzrrGfP6M778svxDS89/MUXM/NOnNhyvl//\nuvXlTZ+e+Wbm7v7KK/H4uuvcTzst9kIbGjLTjx0b73uuLoXNrVzpvumm7v/3f/F45Mj4ptoR+bbl\nXLcxYzq2jM72+uux3dx8c2bY4sXxvwXuG20UTWTl7rbbouYFCwqeBTXdtGLp0gjvVavc6+ujbezi\ni1tOt/327iNGxP36+szwVati/ua31atj/Kefun/4Yes1LF/u/sADsdr3398/a7dvaHD/+OOYZsSI\n+BoH7uedl3uZixa5/+tfMc2UKe4PPhj3N9kk2mHdo224Wzf3xx/P3WzUmjFj4vm23rp4X32XLo0P\no/SH29Ch7ttu637nnTFs2rRoD4bM+l8ba9ZE08xBB7Vcd2+/nZnurbfiw7KiItpM0w48MNqm0/NU\nV7fdVp3+cN5rr3h8223xeM6cqKd5W3quYa2pr88sf/XqzDbTXvm25Vy3NWs6tox14aOPWr4fK1e2\nfI/LXb6dwzwU9O0xeHDLQKmvj7bPXB8AxVRdHau+oiITzGlnnOGfHSNYvDj/czQ2RrCPH+9+1VWZ\nPbAbb4y2vspK9yOPjGmPPz72frI/uPJ59934JjNgQDzf3//e0VfZVPpAdf/+0ZYK7ldeGa9j220z\nwwYMiPXy+utrt7y//MXbc4DLR4xw79Mngq2x0f2LX4w97rRrrsmEdi6LFmXqB/cXXnA///z4sG1P\nmIu0oSNBXz4nTK1r1dXw5JPRFp6+PfRQxGWxT9pqLn0SyeDBLX/8PP1jK4ccEtcnyccs6nzqqbj1\n7x/tt1OmRB/mN9/MLGfs2OhuNnVq09dbV9fyRLI//SnazW+6CTbZJLrW1dVl2rndo529ri7a1NPD\nlixp+dzp22uvRbvy3ntHF7nXXotjG2PGZNqfX3st2sFvvTWe95prYjnusYyGhvzPn+t23XVx8s5h\nhxX+nixbFpfHrquDlSubbgejR8d7NWlS7uVNmhSvJX0OxqRJ8MQTcfxhgw0Kq0Gks7T3k6FYt5Lv\n0V99dWYvuPmtrq5zl/3OO7HXfOKJLcfdfXfU
cNttbT/PWWfFEftttok22/ReM7j365dpD07vNed6\nrcOGNX3O7O50p56ame6oo2L8ZZdlhqXbby+8MP+6zL7dcEN8te7RI9P7xD3ay9dfP9P7JPuU9x//\nOIYdfXRhy8i+nXNOwW+Jr1kTe/SHHx5dLiGavbKlm7Ty3Q45JKY74ojMsJNOKrwGkQLQgT168/Qe\n0zpWU1PjtbW1JVk2ED0ybr89ei5k69dvrc5AK9jcufClL0GfPk2Hf/op/O1v8StbbfUYuPnmOGMO\n4kzJc8+F226L17Tbbk0v9/zCCy3P2P3HP2LPfeHC+DbwzDNxlt6vfhU9PN5+G2bNipNTZs6MMxOH\nDIl11KdPnLiyeHHMs8020eshn5494+qcFRUwZ058W9lii8z42bOjhspKePXV6PUzdWrs6aenHzGi\n8N5DFRVxYbtNNilseoieKFddFa9j2rT4FpR9EtOKFdHTKt0Tp7mDD44rRr7+epydCnGSVN++hdcg\n0gYze9Lda9o1UyGfBsAwYAGwEJiQY/x2wOPAJ8A5hTxnyffokyC7Z01HTrKqq4uD0hdcEI9PPz16\nobz1VtPpnn8+lnHAAfH31lvdn3mm6bDOOMkr3VsqfZCz+YloxZbu17zeeu477NC5yxLpIDrjYCxQ\nAbwMbAV0B54BBjebZnNgN+AnCvp1qKEhetSA+5IlHXuO4cOjd0p9fZxlme4u2tyee8ZyevfOnNix\n224xrG/fzumdsXp1pgdSrhPROkO62ajQqy6KrGMdCfpCDsbuDix091fcfTUwHWhyJSh3X+Huc4A1\nuZ5AOklFRfwoemVlNAN1xMknR1PDoEFxklC+E3DSw8eMyZzYkR52wglxcLXYsn9Ipr0nBnVUejm7\n7LJulieyDhTy39kPqMt6vARo45S63MxsHDAOYMvWepRI4S6+OHrYdPQMwBEjoj1+xYroudP8Cptp\no0bFFUB/9KPMsGOPjasLnnlmx5ZdiLPOisapdfVjLN/9brxO/fiLJEibB2PN7DvAMHcfm3p8HDDE\n3U/LMe0lQL27/7ytBZf8YKyISBfUkYOxhTTdLAUGZD3unxomIiJdQCFBPwcYZGZVZtYdGAXM6tyy\nRESkWNpso3f3BjM7Dbif6IEzzd3nmdn41PjJZtYHqAV6AY1m9kOiZ04bP0sjIiKdraCuEu5+D3BP\ns2GTs+4vI5p0RESkzHx+r3UjIvI5oaAXEUk4Bb2ISMIp6EVEEk5BLyKScAp6EZGEU9CLiCScgl5E\nJOEU9CIiCaegFxFJOAW9iEjCKehFRBJOQS8iknAKehGRhFPQi4gknIJeRCThShr0NTXQvz9suy0s\nWlTKSkREkqukQb/ffjB0KLz4IsyeXcpKRESSq6CfEuwsV10FH30Et9yiPXoRkc5S8jb6nj1h883h\n1VdLXYmISDKVPOgBqqoU9CIinaUsgn7gQDXdiIh0lrIJ+sWLobGx1JWIiCRPWQR9VRWsWQNvvFHq\nSkREkqcsgn7gwPir5hsRkeIri6Cvqoq/OiArIlJ8ZRH0W24ZfxX0IiLFVxZBv8EG0Levmm5ERDpD\nWQQ9qC+9iEhnKZugHzgQFi4E91JXIiKSLGUT9PvuC6+9Bn/9a6krERFJlrIJ+uOPj4OyF12kvXoR\nkWIqm6Dv3h0uuACeeALuu6/U1YiIJEdBQW9mw8xsgZktNLMJOcabmf2/1Phnzay6I8WccEK01Wuv\nXkSkeNoMejOrACYBhwKDgdFmNrjZZIcCg1K3ccC1HSmme3f43/+F2lq4++6OPIOIiDRXyA+P7A4s\ndPdXAMxsOjASmJ81zUjgD+7uwGwz29TM+rp7u69eM2YM/PSnsVdfWdneuUVEkufLX4Y+fTo+fyFB\n3w+oy3q8BBhSwDT9gHYH/frrw4UXwoknwte/3t65RUSS5+qr4cwzOz7/Ov0pQTMbRzTtsGX6ugc5\nHH98tNWv
WrWOChMRKWPbb7928xcS9EuBAVmP+6eGtXca3H0KMAWgpqYm7+FWs+hXLyIia6+QXjdz\ngEFmVmVm3YFRwKxm08wCxqR633wdeK8j7fMiIlJ8be7Ru3uDmZ0G3A9UANPcfZ6ZjU+NnwzcAwwH\nFgIfASd2XskiItIe5iXqsG5mHwALSrLw9ukNvFXqItrQFWqErlGnaiyerlBnV6gRmtb5ZXdvV5/E\ndXowtpkF7l5TwuUXxMxqy73OrlAjdI06VWPxdIU6u0KNsPZ1ls0lEEREpHMo6EVEEq6UQT+lhMtu\nj65QZ1eoEbpGnaqxeLpCnV2hRljLOkt2MFZERNYNNd2IiCScgl5EJOFKEvRtXd++FMxsgJn93czm\nm9k8MzszNfwSM1tqZk+nbsPLoNZXzew/qXpqU8P+y8z+amYvpf5uVsL6ts1aX0+b2ftm9sNyWJdm\nNs3MVpjZc1nD8q47M5uY2k4XmNkhJazxKjN7IfV7D3eY2aap4QPNbFXWOp1cwhrzvr+lWI+t1Pnn\nrBpfNbOnU8NLtS7zZU/xtkt3X6c34uzal4GtgO7AM8DgdV1Hjrr6AtWp+xsDLxLX378EOKfU9TWr\n9VWgd7NhVwITUvcnAFeUus6s93sZ8OVyWJfA3kA18Fxb6y71/j8D9ACqUtttRYlqPBjolrp/RVaN\nA7OnK/F6zPn+lmo95quz2fhfABeVeF3my56ibZel2KP/7Pr27r4aSF/fvqTc/Q13n5u6/wHwPHGp\n5a5iJHBD6v4NwBElrCXbAcDL7r641IUAuPsjwNvNBudbdyOB6e7+ibsvIi7xsXspanT3B9y9IfVw\nNnHhwJLJsx7zKcl6hNbrNDMDvgf8aV3Ukk8r2VO07bIUQZ/v2vVlw8wGArsAT6QGnZ76yjytlE0i\nWRz4m5k9mbr0M8AWnrmQ3DJgi9KU1sIomv4jldu6hPzrrly31ZOAe7MeV6WaGh42s71KVVRKrve3\nXNfjXsByd38pa1hJ12Wz7CnadqmDsc2Y2ReAGcAP3f194mcRtwJ2Jn5I5RclLC9tT3ffmfgJxx+Y\n2d7ZIz2+35W836zF1U4PB25NDSrHddlEuay7fMzsAqABuDk16A1gy9T2cBbwRzPrVaLyyv79bWY0\nTXdCSrouc2TPZ9Z2uyxF0Bd07fpSMLP1iRV9s7vfDuDuy939U3dvBH7LOvrK2Rp3X5r6uwK4g6hp\nuZn1BUj9XVG6Cj9zKDDX3ZdDea7LlHzrrqy2VTM7ATgMOCb1j0/q6/vK1P0nifbar5Sivlbe37Ja\njwBm1g34FvDn9LBSrstc2UMRt8tSBH0h17df51LtdVOB5939l1nD+2ZNdiTwXPN51yUz28jMNk7f\nJw7SPUesw+NTkx0PzCxNhU002WMqt3WZJd+6mwWMMrMeZlYFDAL+XYL6MLNhwHnA4e7+UdbwSjOr\nSN3fKlXjKyWqMd/7WzbrMcuBwAvuviQ9oFTrMl/2UMztcl0fYU7tiAwnjiy/DFxQihpy1LQn8dXo\nWeDp1G04cCPwn9TwWUDfEte5FXHE/RlgXnr9AV8EHgReAv4G/FeJ69wIWAlskjWs5OuS+OB5A1hD\ntG2e3Nq6Ay5IbacLgENLWONCol02vW1OTk377dR28DQwFxhRwhrzvr+lWI/56kwN/z0wvtm0pVqX\n+bKnaNulLoEgIpJwOhgrIpJwCnoRkYRT0IuIJJyCXkQk4RT0IiIJp6AXEUk4Bb2ISML9f+oAwVzi\nEJcuAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## Weak learner: decision stumps\n", "\n", "T=200\n", "bdt = 
AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=T,\n", "                         algorithm='SAMME.R')\n", "\n", "# fit the model (as usual)\n", "bdt.fit(Xtr, Ytr[:,0])\n", "\n", "# The result of AdaBoost with decision stumps can be analyzed\n", "# to find the most important variables in the data set.\n", "# Print the indices of the variables with an importance score\n", "# higher than a threshold (0.01):\n", "print(np.where(bdt.feature_importances_ > 0.01))\n", "\n", "# Get the errors after each boosting step:\n", "# - train error\n", "err_tr = np.zeros((T,))\n", "for i, yp in enumerate(bdt.staged_predict(Xtr)):\n", "    err_tr[i] = zero_one_loss(yp, Ytr[:,0])\n", "\n", "# - test error\n", "err_ts = np.zeros((T,))\n", "for i, yp in enumerate(bdt.staged_predict(Xts)):\n", "    err_ts[i] = zero_one_loss(yp, Yts[:,0])\n", "\n", "fig = plt.figure()\n", "ax = plt.subplot(111)\n", "ax.set_ylim(-0.01, 0.5)\n", "ax.set_xlim(0, T+2)\n", "ax.plot(np.arange(T)+1, err_tr, color='blue')\n", "ax.plot(np.arange(T)+1, err_ts, color='red')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random Forest" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestClassifier" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf = RandomForestClassifier(n_estimators=10)\n", "clf.fit(Xtr, Ytr[:,0])" ] }, { "cell_type": "code", "execution_count": 20, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zero_one_loss(clf.predict(Xtr), Ytr[:,0]) # train error" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.20999999999999996" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zero_one_loss(clf.predict(Xts), Yts[:,0]) # test error" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=2, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Other parameters\n", "clf = RandomForestClassifier(n_estimators=10, max_depth=2)\n", "clf.fit(Xtr, Ytr[:,0])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train error 0.02308\tTest error: 0.16000\n" ] } ], "source": [ "print \"Train error %1.5f\\tTest error: %1.5f\" % (zero_one_loss(clf.predict(Xtr), Ytr[:,0]), zero_one_loss(clf.predict(Xts), Yts[:,0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TO DO:\n", "* What can you say about error rate on the test set in the 2nd example (with respect to 1st example)?\n", "* Try other parameter combinations..." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 1 }