# Implement class ‹Merkle› which provides the following methods: # # • ‹__init__( conn )› sets up the object, using ‹conn› as the # database connection (you can assume that this is an ‹sqlite3› # connection), # • ‹store( path )› stores the tree corresponding to the directory # ‹path› from the filesystem into the database (see below about # format) and returns its hash, # • ‹diff_path( hash_old, path_new )› computes a recursive diff # between the directory given by the ‹hash_old› stored in the # database and the directory given by ‹path_new› (in the # filesystem), # • ‹diff( hash_old, hash_new )› computes a recursive diff between # two directories stored in the database, # • ‹fetch( hash, path )› creates directory ‹path› in the # filesystem (it is an error if it already exists, or if anything # else is in the way) and makes a copy of the tree with root # directory given by ‹hash› (from the database into the # filesystem), returning ‹True› on success and ‹False› on error, # • ‹find( root_hash, node_path )› returns the hash of a node that # is reached by following ‹node_path› starting from the directory # given by ‹root_hash›, or ‹None› if there is no such node. # # The format of the trees is as follows: # # • a regular file corresponds to a leaf node, and its hash is # simply the hash of its content, # • a directory node is a list of (item hash, item name) tuples; to # compute its hash, sort the tuples lexicographically by name, # separate the item hash from the name by a single space, and put # each tuple on a separate line (each line ended by a newline # character). # # These are the only node types. The same node (two nodes are the # same if they have the same hash) must never be stored in the # database twice. The ‹find› operation must be «fast» even for very # large directories (i.e. do not scan directories sequentially). # Paths are given as strings, components separated by a single ‹/› # (forward slash) character. # # The recursive diff should be returned as a ‹dict› instance with # file paths as its keys, where: # # • a path appears in the dictionary if it appears in either of # the trees being compared (except if it is in both, and the # content of the associated files is the same), # • the values are ‹Diff› objects, with the following methods: # ◦ predicates ‹is_new›, ‹is_removed› and ‹is_changed›, # ◦ ‹old_content›, ‹new_content› which return a ‹bytes› object # with the content of the respective file (if applicable), # ◦ ‹unified› which returns a ‹str› instance with a # ‹difflib›-formatted unified diff (it is the responsibility of # the caller to make sure the files are utf8-encoded text # files). # # For instance, doing ‹diff( foo, foo )› should return an empty # ‹dict›. You are encouraged to fetch the file content lazily. # Diffing trees with a few hundred files each, where most files are # 100MiB, should be very fast and use very little memory if we only # actually read the content diff for a single small file. # # The hashes are SHA-2 256 and in the API, they are always passed # around as a ‹bytes› object (which contains the raw hash, 32 bytes # long). When hashing directories, the hashes are written out in hex # (base 16).