scikit-learn & Bokeh
David Dobrovolný

scikit-learn
• https://scikit-learn.org
• built on NumPy, SciPy and matplotlib
• open source
• ongoing development (6 updates in 2020; 2 updates in 2021 as of May 4th, 2021)
• used by e.g. JPMorgan, Spotify, Telecom ParisTech, Booking.com

scikit-learn
• classification
• regression
• clustering
• dimensionality reduction
• model selection
• preprocessing

Naive Bayes
    from sklearn.naive_bayes import GaussianNB
    gnb = GaussianNB()
    model = gnb.fit(X_train, y_train)
    model.predict(X_test)
• Gaussian
• Multinomial
• Bernoulli

Decision Trees
    from sklearn import tree
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)
    clf.predict(test)
    clf.predict_proba(test)
• ID3, C4.5, C5.0 and CART (scikit-learn uses an optimised version of CART)
• classification and regression
• Maximum depth, minimum leaf samples, impurity decrease, ...
• Can export to Graphviz format (next slide)

Decision Trees
[Figure: Graphviz export of a decision tree fitted to the Iris data set. The root node splits on petal length (cm) ≤ 2.45 (gini = 0.6667, samples = 150, value = [50, 50, 50]). The True branch is a pure setosa leaf (gini = 0.0, samples = 50). The other branch splits on petal width (cm) ≤ 1.75 (gini = 0.5, samples = 100, value = [0, 50, 50]) and then on petal length, petal width and sepal length until the versicolor and virginica leaves are pure.]

Support Vector Machines
    from sklearn import svm
    clf = svm.SVC()
    clf.fit(X, y)
    clf.predict(test)
• classification and regression
• Different kernel functions
    • Linear: $\langle x, x' \rangle$
    • Polynomial: $(\gamma \langle x, x' \rangle + r)^d$
    • RBF: $\exp(-\gamma \|x - x'\|^2)$
    • Sigmoid: $\tanh(\gamma \langle x, x' \rangle + r)$
    • custom

Nearest Neighbours
• unsupervised
    • Ball Tree
    • KDTree
    • brute force
    • auto (algorithm determines the best approach)
• classification
    • uniform weights
    • weights based on distance (next slide)
• regression

Nearest Neighbours
[Figure: two scatter plots over sepal length (cm), titled "3-Class classification (k = 15, weights = 'uniform')" and "3-Class classification (k = 15, weights = 'distance')"; legend: setosa, versicolor, virginica.]

• Clustering (next slide)
• Ensemble methods
    • Random Forests
    • AdaBoost
• Semi-supervised learning
    • Self Training
• Neural Networks

Clustering
[Figure: grid comparing clustering algorithms on several data sets, one column per algorithm (MiniBatch KMeans, Affinity Propagation, MeanShift, Spectral Clustering, Ward, Agglomerative Clustering, DBSCAN, OPTICS, BIRCH, Gaussian Mixture), with the run time in seconds printed in each panel.]

Bokeh
• https://bokeh.org
• Interactive web browser visualizations.
• Server App: allows more interactive manipulation
• Notebook
• Standalone: limited interactivity, produces an HTML file
• Examples: https://docs.bokeh.org/en/latest/docs/gallery.html
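The classifier snippets on the scikit-learn slides all follow the same fit/predict pattern. The sketch below is one way to try them side by side; it is not taken from the slides. It assumes the Iris data set (the one shown in the decision tree figure), a 70/30 train/test split, and illustrative hyperparameters (`max_depth=3`, `n_neighbors=15` as in the nearest-neighbours figure).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Assumption: Iris is used here only because the slides' figures use it.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# One estimator per slide; the hyperparameters are illustrative choices.
models = {
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "k-NN (distance weights)": KNeighborsClassifier(n_neighbors=15, weights="distance"),
}

# Every estimator exposes the same fit / score interface.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: {scores[name]:.3f}")

# Export the fitted tree as Graphviz DOT source (returned as a string when out_file=None).
dot_source = export_graphviz(
    models["decision tree"],
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
)
```

The DOT string can be rendered with the Graphviz command-line tools to obtain a tree diagram like the one on the decision tree slide.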
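The kernel formulas on the Support Vector Machines slide can be checked numerically. The sketch below writes each formula in NumPy and compares it with the corresponding helper in `sklearn.metrics.pairwise`. The random inputs and the values of `gamma`, `r` and `d` are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

# Arbitrary illustrative inputs and kernel parameters (assumptions, not from the slides).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(4, 3))
gamma, r, d = 0.5, 1.0, 3

inner = X @ Y.T  # matrix of inner products <x, x'>
sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # ||x - x'||^2

# The four built-in kernels, written exactly as the formulas read.
by_hand = {
    "linear": inner,
    "polynomial": (gamma * inner + r) ** d,
    "rbf": np.exp(-gamma * sq_dist),
    "sigmoid": np.tanh(gamma * inner + r),
}

# scikit-learn's pairwise helpers for the same kernels.
reference = {
    "linear": linear_kernel(X, Y),
    "polynomial": polynomial_kernel(X, Y, degree=d, gamma=gamma, coef0=r),
    "rbf": rbf_kernel(X, Y, gamma=gamma),
    "sigmoid": sigmoid_kernel(X, Y, gamma=gamma, coef0=r),
}

matches = {name: bool(np.allclose(by_hand[name], reference[name])) for name in by_hand}
print(matches)
```

In `SVC`, the same parameters are passed as `gamma`, `coef0` (for r) and `degree` (for d); a custom kernel can be supplied as a callable that returns such a matrix.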
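For the standalone mode listed on the Bokeh slide, a minimal sketch might look like the following. The data is synthetic and the title is a placeholder; both are assumptions made for illustration. `file_html` returns the page as a string, which keeps the example free of file writes.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Synthetic illustrative data (not from the slides).
x = list(range(10))
y = [value ** 2 for value in x]

# Placeholder title, assumed for this example.
page_title = "Standalone Bokeh example"

# A figure comes with default interactive tools such as pan, zoom and reset.
plot = figure(title=page_title, x_axis_label="x", y_axis_label="x squared")
plot.scatter(x, y, size=8)

# Render a self-contained HTML document that loads BokehJS from the CDN.
html = file_html(plot, resources=CDN, title=page_title)
print(len(html), "characters of HTML")
```

Writing `html` to a file gives a page that opens in any browser without a running Python process, which is why interactivity is limited to what BokehJS can do on the client. `bokeh.plotting.output_file` together with `save` writes such a file directly.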