PV248 Python Petr Ročkai PV248 Python 1/76 November 27, 2017 Disclaimer • I am not a Python programmer • please don’t ask sneaky language-lawyer questions Goals • let’s learn to use Python in practical situations • have a look at existing packages and what they can do for us • code up some cool stuff & have fun Organisation • I’m in India next Monday, Mr. Kaplan will come instead • starting 9th of Oct, we can start at 8:30 (let’s have a vote) PV248 Python 2/76 November 27, 2017 Stuff We Could Try • working with text, regular expressions • using the pdb debugger • plotting stuff with bokeh (https://bokeh.pydata.org) • talking to SQL databases • talking to HTTP servers • being an HTTP server • implementing a JSON-based REST API • parsing YAML and/or JSON data • … (suggestions welcome) PV248 Python 3/76 November 27, 2017 Some Resources • https://docs.python.org/3/ (obviously) • https://github.com/VerosK/python-pv248 • https://msivak.fedorapeople.org/python/ • study materials in IS • … PV248 Python 4/76 November 27, 2017 Part 1: Text & Regular Expressions PV248 Python 5/76 November 27, 2017 Reading Input • opening iles: open('scorelib.txt', 'r') • iles can be iterated f = open( 'scorelib.txt', 'r' ) for line in f: print line PV248 Python 6/76 November 27, 2017 Regular Expressions • compiling: r = re.compile( r"Composer: (.*)" ) • matching: m = r.match( "Composer: Bach, J. S." ) • extracting captures: print m.group(1) − prints Bach, J. S. • substitutions: s2 = re.sub( r"\s*$", '', s1 ) − strips all trailing whitespace in s1 Other String Operations • better whitespace stripping: s2 = s1.strip() • splitting: str.split(';') PV248 Python 7/76 November 27, 2017 Dictionaries • associative arrays: map (e.g.) strings to numbers • nice syntax: dict = { 'foo': 1, 'bar': 3 } • nice & easy to work with • can be iterated: for k, v in dict.items() Counters • from collections import Counter • like a dictionary, but the default value is 0 • ctr = Counter() • compare ctr['baz'] += 1 with dict PV248 Python 8/76 November 27, 2017 Exercise 1: Input • get yourself a git/mercurial/darcs repository • grab input data (scorelib.txt) from study materials • read and process the text ile • use regular expressions to extract data • use dictionaries to collect stats • beware! hand-written, somewhat irregular data PV248 Python 9/76 November 27, 2017 Exercise 1: Output • print some interesting statistics − how many pieces by each composer? − how many pieces composed in a given century? − how many in the key of c minor? • bonus if you are bored: searching − list all pieces in a given key − list pieces featuring a given instrument (say, bassoon) PV248 Python 10/76 November 27, 2017 Exercise 1: Example Output • Telemann, G. P.: 68 • Bach, J. S.: 79 • Bach, J. C.: 6 • … For centuries: • 16th century: 10 • 17th century: 33 • 18th century: 4 PV248 Python 11/76 November 27, 2017 Cheat Sheet for line in open('file', 'r') read lines dict = {} an empty dictionary dict[key] = value set a value in a dictionary r = re.compile(r"(.*):") compile a regexp m = r.match("foo: bar") match a string if m is None: continue match failed, loop again print m.group(1) extract a capture for k, v in dict.items() iterate a dictionary print "%d, %d" % (12, 1337) print some numbers PV248 Python 12/76 November 27, 2017 Part 2: Databases & SQL PV248 Python 13/76 November 27, 2017 SQLite • lightweight in-process SQL engine • the entire database is in a single ile • convenient python module, sqlite3 • stepping stone for a “real” database Other Databases • postgresql (psycopg2, …) • mysql / mariadb (mysql-python, mysql-connector, …) • big & expensive: Oracle (cx_oracle), DB2 (pyDB2) PV248 Python 14/76 November 27, 2017 More Resources & Stuff to Look Up • SQL: https://www.w3schools.com/sql/ • https://docs.python.org/3/library/sqlite3.html • Python Database API: PEP 249 • Object-Relational Mapping • SQLAlchemy: constructing portable SQL • SQL Injection PV248 Python 15/76 November 27, 2017 Database Structure • de ined in scorelib.sql (see study materials) • import with: sqlite3 scorelib.dat < scorelib.sql • you can rm scorelib.dat any time to start over • consult comments in scorelib.sql • do not store duplicate rows PV248 Python 16/76 November 27, 2017 Python Objects • class Foo, with inheritance: class Bar( Foo ) • initialisation: __init__( self, ... ) • calling super-class methods: super().method( param ) • you can use super() to call parent’s __init__ • object variables are created in __init__, not in class PV248 Python 17/76 November 27, 2017 Python Objects (cont’d) • don’t forget self! • self.variable = 3 sets the object variable • different from variable = 3 • set up your variables in __init__ • methods take self as an explicit argument PV248 Python 18/76 November 27, 2017 Exercise 2 • create an empty scorelib.dat from scorelib.sql • fetch scorelib-import.py as a starting point • part 1: import composers & editors into the database − use the pre-made class Person for this − inish the implementation of its __init__ − use regular expressions (cf. Exercise 1) − string.split() may come in handy PV248 Python 19/76 November 27, 2017 Exercise 2 (cont’d) • part 2: import scores − implement a Score class similar to Person − authors should be stored as a list of Person objects − also ill in the score_author table − think about how would you de-duplicate score rows • part 3: the rest of the import − details in scorelib.sql − inish at home (you might need this later) PV248 Python 20/76 November 27, 2017 SQL Cheat Sheet • INSERT INTO table (c1, c2) VALUES (v1, v2) • SELECT (c1, c2) FROM table WHERE c1 = "foo" sqlite3 Cheats • conn = sqlite3.connect( "scorelib.dat" ) • cur = conn.cursor() • cur.execute( "... values (?, ?)", (foo, bar) ) • conn.commit() (don’t forget to do this) PV248 Python 21/76 November 27, 2017 Part 3: SQL Redux & JSON PV248 Python 22/76 November 27, 2017 JSON • structured data format • atoms: integers, strings, booleans • objects (dictionaries), arrays (lists) • widely used around the web &c. • simple (compared to XML or YAML) PV248 Python 23/76 November 27, 2017 JSON: Example { "composer": [ "Bach, Johann Sebastian" ], "key": "g", "voices": { "1": "oboe", "2": "bassoon" } } PV248 Python 24/76 November 27, 2017 JSON: Writing • printing JSON seems straightforward enough • but: double quotes in strings • strings must be properly \-escaped during output • also pesky commas • keeping track of indentation for human readability • better use an existing library: import json PV248 Python 25/76 November 27, 2017 JSON in Python • json.dumps = short for dump to string • python dict/list/str/… data comes in • a string with valid JSON comes out Work low • just convert everything to dict’s and lists • run json.dumps or json.dump( data, file ) PV248 Python 26/76 November 27, 2017 Python Example d = {} d["composer"] = ["Bach, Johann Sebastian"] d["key"] = "g" d["voices"] = { 1: "oboe", 2: "bassoon" } json.dump( d, sys.stdout, indent=4 ) Beware: keys are always strings in JSON PV248 Python 27/76 November 27, 2017 Exercise 3: Preliminaries • pull data from scorelib.dat using SQL • print the results as (nicely formatted) JSON • get input from sys.argv (you need to import sys) − note that sys.argv[0] is the program name • run as, for instance: python search.py Bach PV248 Python 28/76 November 27, 2017 Exercise 3: Part 1 • write a script getprint.py • the input is a print number • the output is a list of composers • you will need to use SQL joins • select ... from person join score_authors on person.id = score_author.composer ... where print.id = ? • hint: the result of cursor.execute is iterable PV248 Python 29/76 November 27, 2017 Exercise 3: Part 2 • write a script search.py • the input is a composer name (substring) • the output is a list of all matching composers • along with all their scores in the database • optionally also with print numbers • ... where person.name like "%Bach%" PV248 Python 30/76 November 27, 2017 Part 4: Plotting with Bokeh PV248 Python 31/76 November 27, 2017 Preliminaries: Parsing JSON • import json • json.load is the counterpart to json.dump from last time − de-serialise data from an open ile − builds lists, dictionaries, etc. • json.loads corresponds to json.dumps PV248 Python 32/76 November 27, 2017 Bokeh • a library for plotting data in python • not included in the default python install • (in shell) $ pip3 install --user bokeh • from bokeh.plotting import figure, show PV248 Python 33/76 November 27, 2017 A Simple Bar Plot from bokeh.plotting import figure, show p = figure( x_range = (-1,10) ) p.vbar( x = [0, 1], top = [25, 50], width = 0.7 ) show( p ) PV248 Python 34/76 November 27, 2017 A Simple Pie Chart from bokeh.plotting import figure, show from numpy import pi p = figure( x_range = (-10,10) ) p.wedge( x = 0, y = 0, radius = 5, start_angle = [ 1/4 * pi, 6/4 * pi ], end_angle = [ 6/4 * pi, 1/4 * pi ], color = ["purple", "darkblue"] ) show( p ) PV248 Python 35/76 November 27, 2017 Creating Data Sources from bokeh.models import ColumnDataSource src = ColumnDataSource( data = { 'start': [ 1/4 * pi, 6/4 * pi ], 'end': [ 6/4 * pi, 1/4 * pi ], 'color': [ "purple", "darkblue" ], 'label': [ "mlem", "purr" ] } ) Notice the label column – this will become the legend. PV248 Python 36/76 November 27, 2017 Using Data Sources from bokeh.plotting import figure p = figure() p.wedge( x = 0, y = 0, radius = 5, start_angle = 'start', end_angle = 'end', color = 'color', legend = 'label', source = src ) PV248 Python 37/76 November 27, 2017 Exercise 4 • grab election.json from study materials • part 1: load the data and create a bar plot − bigger parties have a 'color' key in the JSON − they also have a 'short' key for the acronym − set up fallbacks for both (either may be missing) PV248 Python 38/76 November 27, 2017 Exercise 4 (cont’d) • part 2: summarise the below-one-percent parties − only create a single bar for those − add a legend using the short names • part 3: make a pie chart with the results • optional: count the share of those who abstained − include them as a separate slice in the pie chart PV248 Python 39/76 November 27, 2017 Part 5: Serving HTTP PV248 Python 40/76 November 27, 2017 Hyper-Text Transfer Protocol • originally a simple text-based, stateless protocol • however − SSL/TLS, cryptography (https) − pipelining (somewhat stateful) − cookies (somewhat stateful in a different way) • typically between client (browser) and a front-end server • but also as a back-end protocol (web server to app server) PV248 Python 41/76 November 27, 2017 Request Anatomy • request type (see below) • header (text-based, like e-mail) • content Request Types • GET – asks the server to send a resource • HEAD – like GET but only send back headers • POST – send data to the server PV248 Python 42/76 November 27, 2017 Python and HTTP • both client and server functionality − import http.client − import http.server • TLS/SSL wrappers are also available − import ssl • synchronous by default • async available (next time) PV248 Python 43/76 November 27, 2017 Serving Requests • derive from BaseHTTPRequestHandler • implement a do_GET method • this gets called whenever the client does a GET • also available: do_HEAD, do_POST, etc. • pass the class (not an instance) to HTTPServer PV248 Python 44/76 November 27, 2017 Serving Requests (cont’d) • HTTPServer creates a new instance of your Handler • the BaseHTTPRequestHandler machinery runs • it calls your do_GET etc. method • request data is available in instance variables − self.path, self.headers PV248 Python 45/76 November 27, 2017 Talking to the Client • HTTP responses start with a response code − self.send_response( 200, 'OK' ) • the headers follow (set at least Content-Type) − self.send_header( 'Connection', 'close' ) • headers and the content need to be separated − self.end_headers() • inally, send the content by writing to self.wfile PV248 Python 46/76 November 27, 2017 Sending Content • self.wfile is an open ile • it has a write() method which you can use • sockets only accept byte sequences, not str • use the bytes built-in function to convert str to bytes PV248 Python 47/76 November 27, 2017 Exercise 5 • implement a simple HTTP server − listen on a high port (e.g. 8000) − point your browser to http://localhost:8000/ • part 1: serve some static text (or HTML) • part 2: get & print back some data from the URL − e.g. when serving http://localhost:8000/file.txt − return “you asked for file.txt” Reminder: https://docs.python.org/3/library PV248 Python 48/76 November 27, 2017 Part 6: A Simple Web Interface PV248 Python 49/76 November 27, 2017 Parsing URLs • import urllib.parse • urlparse – parses the URL − e.g. url = "/foo?bar=1" − urlparse( url ).query gives bar=1 • parse_qs – parses the query string − loads up the key-value pairs into a dict PV248 Python 50/76 November 27, 2017 Building Strings • str() – turns an object into a string • the % operator: printf-style formatting − "