Templates Complexity STL containers Cache miss Homework 6. Templates and complexity Jan Dugäcek September 12, 2017 □ r5> Jan Dugäcek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework o o o Templates a Why? • Main usage • Other usage • Exercises Complexity • Linked list • Big O notation • Exercises STL containers • std::vector • std::list 9 std::map 9 std::unordered_map • Exercises Cache miss • A bit of physics • Cache • Avoiding cache misses • Data structures and cache misses Homework < □ ► 1= <\(y Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Why? Main usage Other usage Exercises struct poi nterGuard { i n t * poi nter_ ; poi nterG ua rd ( i n t * pointer) poi nter _ ( po i n t e r) { } ~poi nterG ua rd () { delete pointer_; } }; • The code above is not particularly useful, because it has to be defined for each type it might contain • Any pointer can be converted to void*, but the delete needs to know the exact type so that it would call proper destructors (freeQ would work, but it would not call a destructor) Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Why? Main usage Other usage Exercises template struct poi nterGuard { T poi nter_ ; poi nterG ua rd (T pointer) poi n ter _ ( po i n t e r) { } ~ poi nterG ua rd () { delete pointer_; } }; void doStuff { poi nterG ua rd guard(new int(6)); poi nterG ua rd guard2(new d o u b I e (3 .14 1 5 2 6)); } 9 pointerGuard is a template class, each instance is a different structure with different functions as methods, created at compile time • There is no performance impact • If the template argument is not a pointer, the destructor will fail to compile Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Why? Main usage Other usage Exercises • Although the template arguments can be of the typename type, there are more specific types of template arguments, like int that requires it to be an integer (arithmetic can be done on it at compile time) or class that requires it to be a class template struct array { T contents[si ze]; T& operator[] (int at) { if (at>=0 && at arr; • In this example, size is not a variable, but a constant used at compile time Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Why? Main usage Other usage Exercises • Templates produce long and hard to comprehend error messages when used wrongly • Templates can be used to do much more complex things, to the extent that any algorithm can be programmed with templates to be executed at compile time o This leads to all kinds of weirdness, i.e. the following erroneous is a broken loop at compile time: tern plate struct fibonacci { static constexpr int value = f i bo n a cci :: va I u e + f i bo n a cci :: va I u e ; }; // Constants for 1st and 2nd values are not given //••• int result = f i bo n a cc i <40 >:: va I u e ; Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Why? Main usage Other usage Exercises O Write a sharedPointer class that holds a pointer of any type, keeps track of its copies and deletes the pointer when the last copy is destroyed O Write a expandableArray class that can dynamically reallocates an array of objects of given type if it is filled, needs operator [] and an appendO method struct something { something (const somethings a) { /*copying*/} void o pe r a to r =( co n st somethings a) { /* a ssi gn m en t */} Advanced: O Write class deletablePointer that holds a pointer of any type, keeps track of its copies and all its copies' target is set to nullptr when the target is deleted Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Linked list Big O notation Exercises struct }; struct section { section * previous ; section * next ; i n t value; linkedList { section * first; section* last; If we insert an element into the middle of an array, a large part of the array is copied If we save the array in a linked list, where each element is stored elsewhere but contains pointers to next and previous elements, this operation's duration does not depend on its size Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Linked list Big 0 notation Exercises I 1 3 3 1 • Inserting into the middle of an array of size n may require up to c • n operations (where the whole array is copied), where c is some constant; we mark this O(n) and call it linear complexity 9 Accessing an element in an array of size n requires c operations, where c is some other constant; we mark this 0(1) and call it constant complexity • Inserting into the middle of a linked list requires c operations, where c is some constant (different than the previous one); this is again 0(1) o Accessing /7-th element of a linked list requires c • n operations (it needs to run through all previous ones to find it), where c is some constant; this is again O(n) • The constants are clearly different, linked list is clearly slower, but if the amount of data is huge, the one with better complexity is better (so if we need a lot of insertions, we have to use the linked list) Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Linked list Big O notation Exercises d An algorithm that sorts an array of size n by running n times through it and swapping couples of elements in the wrong order (called bubblesort) needs c • n2 operations to sort the array, so its complexity is (D(n2), we call it quadratic complexity o An algorithm that finds a number in a sorted array of n numbers can do it by checking if the middle element is greater and lesser than the number sought, focus on the half where the element is, compares it with the middle of this half, selects the quarter where it is and so on does c • In(r?) operations to find it, so its complexity is 0(\n(n)), we call it logarithmic complexity • An algorithm that learns the prime divisors of number of size n bits by trying to divide it by all numbers up to needs (if unlucky) Ci • 2^C2'n operations to find it, so its complexity is G(en) and we call it exponential complexity Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework Linked list Big O notation Exercises When using the big O notation, we care only about how fast the value increases when n is sufficiently large, regardless of constants or slower increasing parts The time needed by any two algorithms' marked as O(n) to work through data of size n differs by a non-zero, non-infinity factor if n is large enough Therefore O(n) = 0(n + 10000) (the constant does not matter if n is large enough), O(n) = O(10000n) ( lim ^®®®n _ constant, but we would usually prefer the first algorithm), O(n) = 0(n + In n) (if n is large enough, the linear part is too small because lim 77 ^n 77 — If the time depends on more values, for example n and m, both are in the notation, for example G(m + In n) and its writing is not reduced (in this case, n may be way larger than m) In reality, sizes can be small, so time 2n + 5 may be better than time 200, although first is O(n) and the other is 0(1) <[fj]^ 1: -r)<\(y Jan Dugacek 6. Templates and complexity Templates Complexity Linked list STL containers Big 0 notation Cache miss Exercises Homework O Find a way to sort an array with complexity only G(n • log/?) (just find a way, writing the code would take long) O Find a way to store data in a way that accessing a given element would have complexity O(\ogn) and adding an element would also have O(\ogn) complexity (just think of a way) Advanced: O Find a way to sort an array of elements numbered from 0 to n (not necessarily all are present) with complexity O(n) O Prove that the complexity of an operations that appends at the end of an array, reallocating it to a larger one (copying all previous elements) when it's full, is 0(1) in an average case (it's called ammortised complexity, one addition may need c • n operations, but if we average enough of them, it's different) Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework std::vector std::list std::map std::unordered_map Exercises • std: : vector is a class that maintains an expanding array • Its elements are accessed with operator [] and placed at the end of the array with its push_back() method • Do not get pointers to its elements, the array may be reallocated, rendering all pointers invalid std :: vector vec ; for (int i = 1; i < 10; i++) vec . push_back(42 / i); vec [3] = 0; unsigned int size = vec.size(); • You need to include vector to use this • This data structure is called vector in computer science Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework std::vector std::list std::map std::unordered_map Exercises • Data is stored in a linked list, allows adding or removing elements anywhere in constant time, but accessing an element at given index needs linear time • Elements are accessed using an iterator, which holds information about the element • You can access iterator's elemens using the dereference operator std : : I i s t list; for (int i = 1; i < 10; i++) list. push_back(42 / i ); for (std : : I ist :: itera tor it = Iist . begin (); it != list, end () ; ) if (*it = 3) it = I i s t . e r a se ( i t ); else ++i t ; I i s t . i n se rt ( I i s t . begi n () , 3); • You need to include list to use this Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework std::vector std::list std::map std::unordered_map Exercises • Data is stored in a red-black tree, allows accessing elements in logarithmic time, adding or removing elements anywhere also in logarithmic time • Elements and indexes can be read using an iterator (they are sorted by their indexes), but the main usage is accessing via operator [] (can be used to construct new elements as well) • The index does not have to be an integer, it can be anything that can be compared std ::map array; array["hi"] = 3; a r ra y [ 1 za phod1 ] = 4 ; array[ ] = a r r a y [ 1 za phod 1 ] — 1; for (std ::map:: iterator it = array, begin (); it != array. end(); -H-it) std::cout« it —>first « "="« it->second « std::endl; • You need to include map to use this • std: :multimap can contain more elements with the same index, but access is not so simple Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework std::vector std::list std::map std::unordered_map Exercises • Data is stored in a hash table, allows accessing, adding and removing elements in constant time • Elements are accessed and created via operator [], elements and indexes can be read using an iterator (they are not stored in any specific order) • The indexes must be convertible to integers with as little relation to them as possible using a std: :hash function, so it's a bit trickier to get working with custom types • This is called dictionary in many other programming languages std : : unordered_map hashtable ; hashtable [" Breivik"] = 70; hashtable ["Osama"] = 3000 + h a s h t a b I e [ " B r e i v i k " ] ; hashtable [" Stalin "] = h a s h t a b I e [ " Osama " ] * 6666; if ( hashtable . find ("Dugi") = h a s h t a b I e . end ()) std : : cout « "Dugi not a murderer!" « std : : end I ; • You need to include unordered_map to use this 9 std: :unordered_multimap can contain more elements with the same index, but access is not so simple Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework std::vector std::list std::map std::unordered_map Exercises Exercises O Rewrite an existing exercise that uses an expanding array to use std::vector O Write a program that sorts a certain number of randomly generated elements using bubblesort (array is passed n times, always swapping adjacent out-of-order elements) and using std::map to sort them, compare the time it takes #i n c I u d e //■■■ time_t before = time(O); for (int i = 0; i < 10000; i++) {} std : : cout « "Took " « (time(0) - before) « std : : endl Advanced: O Write a filter() function that reads through a std: :list, erasing all elements lesser than a number given as another argument Q Study the order in which are numbers stored in a std: :unordered_map indexed by numbers, is it a good way to order them? Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework A bit of physics Cache Avoiding cache misses Data structures and cache misses • The computer's memory is large enough for many purposes (4 GiB means a milliard of integers), but its distance from the processor needs a bit of thought • The distance between CPU and RAM is roughly 10 cm, so the time needed to travel there is roughly at least 3.10_8s (assuming the speed of light, which is not the case) • If the processor is running at 2 GHz, its cycle takes 2.10_9s (and it does in average 7 operations per cycle!) • In practice, it can be assumed that it takes as much as time as 150 cycles Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework A bit of physics Cache Avoiding cache misses Data structures and cache misses E i 1 • To deal with this issue, the processor has its own small memory called cache that contains all recently accessed data along with their surroundings • Cache has usually several levels, each larger and with greater access time, the smallest is about 64 KiB, the largest 128 MiB • Because this is still imperfect, most processors have a feature called hyper-threading, a CPU-like structure that does other stuff while it is waiting for data to arrive into cache Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework A bit of physics Cache Avoiding cache misses Data structures and cache misses 9 Even with hyper-threading, the processor would mostly wait for data rather than do work if the program had a cache miss every dozen operations • There is no way to tell the CPU to cache some area and in most cases, it would not be known ahead anyway • The program should therefore have the data it needs at similar time stored at similar location • Automatic local variables are stored close to each other and some of them are always accessed (current function's arguments), so they are fast • Accessing a dynamically stored array may cause a cache miss, but once it's accessed, all the elements around the accessed element are cached and can be quickly accessed • Dynamically allocated pieces of data spread on many locations (depending on what was free at allocation time) often incur a cache miss per access • Locations accessed through many consecutive pointer jumps can cause several cache misses per access • The main benefit of using char and short int instead of int is that they take less space and more of them can be cached simultaneously Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework A bit of physics Cache Avoiding cache misses Data structures and cache misses Da ta St ru d ti u res ai id ca d ie i nr ■ II sses • A regular array is the fastest • std: : vector is the best solution for most cases, because its contents are coherent (and thus can be the best even if you need to insert into its middle using an iterator) • If std: : vector is not good, usually the best solution is std: :unordered_map because it can access similarly to vector after calculating its address • std: :list often causes a cache miss per iteration, so it's rather slow and should be used only in cases where the lower complexities are absolutely necessary due to the sizes of arrays (O(n) vs. (9(200)) • std: :map often causes several cache misses per each element access and iteration, so it should be used only if absolutely necessary Note that objects dynamically allocated at similar time (for example one after another) will usually be given addresses one after another, so cache misses will not always happen as much as expected • More info here: http://ithare.com/ c-performance-common-wisdoms-and-common-wisdoms/ Jan Dugacek 6. Templates and complexity Templates Complexity STL containers Cache miss Homework • Write a class that works like an XML tag, contains several named attributes, other tags or text, with methods to add new attributes and tags inside, access their contents and remove them and also has a method to write the tag along with the tags and attributes it contains into a document; use efficient STL containers 9 Challenge: Give it also a constructor that can be used to parse an XML document • About XML: https://en.wikipedia.org/wiki/XML Advanced homework: • Same as the regular one, but give it also a constructor that can be used to parse an XML document and deal with the few characters that need to be escaped Jan Dugacek 6. Templates and complexity