NumPy + Matplotlib - solutions

January 6, 2021

1 NumPy

• the de facto standard for numerical computing in Python
• many other modules are built on top of NumPy (SciPy, scikit-learn, pandas, …)

[1]: import numpy as np

1.1 NumPy arrays

• analogous to Python's list type
• the basic object that NumPy works with
• all elements must have the same type
• fixed size

[5]: a = np.array([1, 2, 3])
     a

[5]: array([1, 2, 3])

[6]: a.dtype

[6]: dtype('int64')

[7]: np.array([1, 'ahoj', False])  # mixed types are coerced to one common type

[7]: array(['1', 'ahoj', 'False'], dtype='…

[…]

[84]: plt.grid()
      plt.plot(xs, np.sin(xs), '-o', color='red')

[90]: plt.grid()
      plt.xlim(-1, 11)
      plt.ylim(-2, 2)
      plt.title('Trigonometric functions')
      # raw strings keep the LaTeX backslashes from being read as escape sequences
      plt.plot(xs, np.sin(xs), label=r'$y = \sin{x}$')
      plt.plot(xs, np.cos(xs), label=r'$y = \cos{x}$')
      plt.legend()
      plt.savefig('image.png')

1.7.1 Bar plot

[93]: x = np.random.randint(10, size=10)
      x

[93]: array([5, 1, 9, 4, 9, 5, 5, 8, 2, 1])

[96]: plt.title('Bar chart')
      plt.bar(np.arange(10) + 1, x, color='red')

1.7.2 Histogram

[104]: plt.hist(np.random.sample(100), bins=20)

[104]: (array([10.,  1.,  4.,  5.,  4.,  5.,  4.,  7.,  7.,  7.,  4.,  7.,  8.,
                6.,  4.,  3.,  2.,  4.,  5.,  3.]),
        array([0.01256698, 0.06128679, 0.1100066 , 0.15872641, 0.20744622,
               0.25616603, 0.30488584, 0.35360565, 0.40232546, 0.45104528,
               0.49976509, 0.5484849 , 0.59720471, 0.64592452, 0.69464433,
               0.74336414, 0.79208395, 0.84080376, 0.88952357, 0.93824339,
               0.9869632 ]),
        …)

1.7.3 Scatter plot

[113]: xs = np.random.sample(50)
       ys = np.random.sample(50)
       sizes = np.random.randint(200, size=50)
       colors = np.random.randint(3, size=50)

[109]: colors

[109]: array([1, 2, 0, 2, 0, 0, 0, 2, 1, 1, 0, 2, 0, 1, 2, 0, 2, 0, 1, 1, 1, 2,
              1, 1, 2, 0, 0, 2, 1, 1, 2, 2, 0, 2, 0, 1, 0, 0, 1, 0, 2, 1, 0, 2,
              0, 0, 2, 2, 2, 1])

[116]: plt.scatter(xs, ys, c=colors, s=sizes)

1.8 Question: Are these examples equivalent?
    xs, ys = np.random.sample(10), np.random.sample(10)

1)  for x, y in zip(xs, ys):
        plt.scatter(x, y)

2)  plt.scatter(xs, ys)

[119]: xs, ys = np.random.sample(10), np.random.sample(10)

[123]: for x, y in zip(xs, ys):
           plt.scatter(x, y, color='red')

[124]: plt.scatter(xs, ys, color='red')

Not exactly: the loop makes ten separate plt.scatter calls, so without an explicit color each point gets the next color from the color cycle, whereas the single call draws all points in one color. With color='red' in both versions, the resulting plots look the same.

1.9 NumPy - masks

[125]: a = np.random.randint(100, size=16).reshape(4, 4)
       a

[125]: array([[41, 75,  9, 46],
              [46, 45, 62, 37],
              [13, 70, 51, 15],
              [95, 85, 97, 82]])

[127]: mask = a > 20
       mask

[127]: array([[ True,  True, False,  True],
              [ True,  True,  True,  True],
              [False,  True,  True, False],
              [ True,  True,  True,  True]])

[129]: a[mask]

[129]: array([41, 75, 46, 46, 45, 62, 37, 70, 51, 95, 85, 97, 82])

[130]: a[a > 20]

[130]: array([41, 75, 46, 46, 45, 62, 37, 70, 51, 95, 85, 97, 82])

[131]: a[~mask]

[131]: array([ 9, 13, 15])

[132]: xs = np.arange(100)
       ys = np.random.sample(100)
       plt.scatter(xs, ys)

[135]: mask = xs < 50
       plt.scatter(xs[mask], ys[mask])
       plt.scatter(xs[~mask], ys[~mask])

[136]: for threshold in np.linspace(0, 1, 6):
           mask = (ys > threshold) & (ys < threshold + 0.2)
           plt.scatter(xs[mask], ys[mask])

1.9.1 Example - randomized computation of π

[154]: xs = np.random.sample(100000) * 2 - 1
       ys = np.random.sample(100000) * 2 - 1

[144]: plt.figure(figsize=(5, 5))
       plt.scatter(xs, ys)

The unit disk is the set of points for which $x^2 + y^2 \le 1$.

[155]: mask = xs ** 2 + ys ** 2 <= 1

[156]: plt.figure(figsize=(5, 5))
       plt.scatter(xs[mask], ys[mask], s=1)
       plt.scatter(xs[~mask], ys[~mask], s=1)

The points are uniformly distributed over a square with side $2r$, so the ratio of the point counts approximates the ratio of the areas:

$$\frac{\text{number of points}}{\text{number of points in the disk}} \approx \frac{(2r)^2}{\pi r^2} = \frac{4r^2}{\pi r^2} = \frac{4}{\pi}$$

$$\pi \approx 4 \cdot \frac{\text{number of points in the disk}}{\text{number of points}}$$

[157]: 4 * np.sum(mask) / len(mask)

[157]: 3.14304

1.10 NumPy input and output

• text
  – np.savetxt and np.loadtxt
  – works with plain-text delimited files such as CSV
  – the format for saving and loading has to be specified (e.g. fmt, delimiter, dtype)
• binary
  – np.save and np.load
  – faster, smaller file size (?)
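The two options in the list above can be compared side by side. The following is only a minimal sketch: the sample array, the temporary directory, and the file names `example.csv` and `example.npy` are assumptions made for illustration and do not come from the notebook.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data, not taken from the notebook.
data = np.arange(12).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "example.csv")
    npy_path = os.path.join(tmp, "example.npy")

    # Text: the number format and the delimiter are chosen explicitly,
    # and the same delimiter and a dtype are passed again when loading.
    np.savetxt(csv_path, data, fmt="%d", delimiter=",")
    from_csv = np.loadtxt(csv_path, dtype=int, delimiter=",")

    # Binary: dtype and shape are stored in the file itself.
    np.save(npy_path, data)
    from_npy = np.load(npy_path)

    csv_size = os.path.getsize(csv_path)
    npy_size = os.path.getsize(npy_path)

print(np.array_equal(data, from_csv), np.array_equal(data, from_npy))
print("CSV bytes:", csv_size, "NPY bytes:", npy_size)
```

Which file ends up smaller depends on the dtype and on the chosen text format, so the printed sizes are worth checking for your own data before relying on the rule of thumb.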
[158]: a = np.random.randint(100, size=50).reshape(5, 10)
       a

[158]: array([[91, 83,  1, 10, 20, 73, 48, 84, 57, 63],
              [ 5, 22, 26,  3, 55, 31, 32, 31, 36, 85],
              [ 6, 85, 85, 74, 35, 67, 19, 91, 16, 85],
              [92, 92, 17, 57, 30, 39, 85, 26, 74, 84],
              [80, 52, 60, 95, 48, 78, 96, 21, 59, 99]])

[163]: np.savetxt('data.csv', a, fmt='%03d')

[164]: b = np.loadtxt('data.csv', dtype=int)
       b

[164]: array([[91, 83,  1, 10, 20, 73, 48, 84, 57, 63],
              [ 5, 22, 26,  3, 55, 31, 32, 31, 36, 85],
              [ 6, 85, 85, 74, 35, 67, 19, 91, 16, 85],
              [92, 92, 17, 57, 30, 39, 85, 26, 74, 84],
              [80, 52, 60, 95, 48, 78, 96, 21, 59, 99]])

[165]: np.save('data.npy', a)

[167]: b = np.load('data.npy')
       b

[167]: array([[91, 83,  1, 10, 20, 73, 48, 84, 57, 63],
              [ 5, 22, 26,  3, 55, 31, 32, 31, 36, 85],
              [ 6, 85, 85, 74, 35, 67, 19, 91, 16, 85],
              [92, 92, 17, 57, 30, 39, 85, 26, 74, 84],
              [80, 52, 60, 95, 48, 78, 96, 21, 59, 99]])

1.11 NumPy - speed

[168]: %%timeit
       [x ** 2 for x in range(1000)]

235 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[169]: %%timeit
       np.arange(1000) ** 2

2.31 µs ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

1.11.1 Sum of two lists/arrays

[170]: a_np = np.random.randint(100, size=10 ** 6)
       b_np = np.random.randint(100, size=10 ** 6)
       a_py = list(a_np)
       b_py = list(b_np)

Python - three ways:

[171]: %%timeit
       c_py = []
       for i in range(len(a_py)):
           c_py.append(a_py[i] + b_py[i])

191 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

[172]: %%timeit
       c_py = []
       for x, y in zip(a_py, b_py):
           c_py.append(x + y)

143 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

[174]: %%timeit
       c_py = [x + y for x, y in zip(a_py, b_py)]

117 ms ± 763 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

NumPy

[175]: %%timeit
       c_np = a_np + b_np

920 µs ± 5.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
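The `%%timeit` cells above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain Python script with the standard `timeit` module. The helper names `add_lists` and `add_arrays`, the fixed seed, the smaller array size, and the use of `.tolist()` (which yields native Python ints, unlike `list(a_np)`, which keeps NumPy scalars) are assumptions of this sketch and differ from the notebook's setup.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
a_np = rng.integers(100, size=10**5)
b_np = rng.integers(100, size=10**5)

# .tolist() converts the elements to native Python ints.
a_py = a_np.tolist()
b_py = b_np.tolist()


def add_lists():
    """Element-wise sum with a list comprehension (hypothetical helper)."""
    return [x + y for x, y in zip(a_py, b_py)]


def add_arrays():
    """Element-wise sum with NumPy (hypothetical helper)."""
    return a_np + b_np


# Both approaches must produce the same values.
assert add_lists() == add_arrays().tolist()

# Best of 3 repeats, 10 calls per repeat, reported per call.
t_list = min(timeit.repeat(add_lists, number=10, repeat=3)) / 10
t_numpy = min(timeit.repeat(add_arrays, number=10, repeat=3)) / 10

print(f"list comprehension: {t_list * 1e3:.3f} ms per call")
print(f"NumPy:              {t_numpy * 1e3:.3f} ms per call")
```

The absolute numbers depend on the machine and the Python and NumPy versions, so they will not match the timings recorded above; only the relative difference is of interest.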