Implementing an Interpreter in C++ Petr Ročkai Implementing an Interpreter in C++ 1/106 May 4, 2017 Organisation • theory: ~50 minutes every two weeks • coding: all the remaining time Assignments • 2 weeks → 1 topic → 1 assignment • you should get most of the work done during the seminar • assignments include writing tests! • on my desk (in email, git, …) by 8am on even Wednesdays Grading • you pass if you implement a game of tic-tac-toe running in the interpreter you implemented Implementing an Interpreter in C++ 2/106 May 4, 2017 Your Own Programming Language (in 6 easy steps) • Lexers and Parsers • Symbol Tables • Evaluating Expressions • Type Checking (*) • Memory Management (*) • Talking to the Outside World Organisation (cont’d) • seminar attendance is optional • you may skip starred topics if you have trouble keeping up • but only if you attended 5 seminars per topic skipped Implementing an Interpreter in C++ 3/106 May 4, 2017 What You Need To Know • we will use C++ 11 (or better) • version control of your choice • UNIX strongly recommended • write automated tests (eg. shell scripts) Implementing an Interpreter in C++ 4/106 May 4, 2017 Part 1: Lexers and Parsers Implementing an Interpreter in C++ 5/106 May 4, 2017 Lexical Structure • the source code is ASCII (Unicode) text • working one character at a time is not fun • lexer converts text into a stream of tokens Token Categories • keywords • identifiers • literals: strings and numbers • operators • brackets Implementing an Interpreter in C++ 6/106 May 4, 2017 Lexer is a Finite State Automaton • the token structure is regular • example: an identifier is [a-zA-Z][a-zA-Z0-9]* • another one: a number is [0-9][0-9]* • needs to deal with whitespace between tokens, too Lexer • reads characters from the input file • outputs tokens for future processing Implementing an Interpreter in C++ 7/106 May 4, 2017 Token • represented by a data type • remembers the text and category • also where it came from struct Token { std::string text; int lineno; enum Cat { If, Else, Endif, Identifier, ParenOpen, ParenClose, LitString, LitNumber } cat; }; Implementing an Interpreter in C++ 8/106 May 4, 2017 struct Lexer { Lexer( const char *filename ); Token next(); /* main interface */ protected: std::ifstream in; std::string buf; /* state machine */ Token identifier(); Token literal(); /* ... */ }; Implementing an Interpreter in C++ 9/106 May 4, 2017 The next function reads and returns the next token Token Lexer::next() { whitespace(); buf += ( c = in.get() ); if ( std::isalpha( c ) ) return identifier(); switch ( c ) { case '=': // ... } } Implementing an Interpreter in C++ 10/106 May 4, 2017 The state machine could be implemented by using one method for each automaton state: Token Lexer::identifer() { char c; while ( std::isalnum( c = in.get() ) ) buf += c; in.unget(); if ( is_keyword( buf ) ) return // ... return Token( Token::Identifier, buf, ... ); } next() from previous slide is the initial state Implementing an Interpreter in C++ 11/106 May 4, 2017 Parsers • typically a context-free language • terminals (symbols) are the tokens • a stack (or recursion) is required for parsing • selection of algorithms (LL, LR, GLR, monadic, rec. descent) • different trade-offs Context • parser reads tokens from the lexer • creates an Abstract Syntax Tree (AST for short) Implementing an Interpreter in C++ 12/106 May 4, 2017 Expressions: Prefix (eg. LISP) • easy to parse, hard-ish to read, annoying to write • variadic operators • (+ 1 2 (* 3 4) 5) Postfix (eg. PostScript) • very easy to parse, hard to read, easy-ish to write • unambiguous even without parens • 3 4 * 1 + 2 + 5 + Infix (eg. everybody else) • hard to parse, easy to read, easy-ish to write • 1 + 2 + 3 * 4 + 5 Implementing an Interpreter in C++ 13/106 May 4, 2017 Abstract Syntax Tree • internal representation of the source code • tree with different node types if ( x + 1 = 5 ) print "hello" if = + 5 𝑥 print "hello" 1 • reflects the structure of the (context-free) grammar Implementing an Interpreter in C++ 14/106 May 4, 2017 AST in C++ • use std::shared_ptr for children • don’t overdo it (many things can be kept as values) template< typename T > using Ptr = std::shared_ptr< T >; struct Expression { /* ... */ }; struct Statement { /* ... */ }; struct If : Node { Ptr< Expression > condition; Ptr< Statement > body; }; Implementing an Interpreter in C++ 15/106 May 4, 2017 AST: Representing Alternatives • you can use std::variant (since C++17) • or use Union from brick-types (see study materials in IS) • or use enums and write out switches by hand (eww) struct Expression; using Atom = Union< Identifier, Literal >; struct Binary { Ptr< Expression > left, right; }; using ExpBase = Union< Atom, Binary, Unary > {}; struct Expression : ExpBase { using ExpBase::Union; }; using Statement = Union< Expression, Block, If >; Implementing an Interpreter in C++ 16/106 May 4, 2017 Parsing: Context-Free Languages • grammars with one-to-some rules • alternatively: stack machines • the goal is reconstructing a grammar derivation • grammars are often ambiguous • subsets of CFLs: LL(1), LR(1), LALR(1), … • can be parsed more efficiently • eg. limited lookahead, no or limited backtracking Implementing an Interpreter in C++ 17/106 May 4, 2017 Parsing: Recursive Descent • parse LL(k) languages in linear time • easy to write in direct C or C++ • no fancy generators needed (ie. no yacc nor bison) Method • look at one token and the grammar rules • find which rules could have produced this token • if there’s only one, you know which rule to pursue • otherwise the grammar is not LL(1) • you can try looking at two tokens instead: LL(2) Implementing an Interpreter in C++ 18/106 May 4, 2017 Recursive Descent in C++ struct Parser { Lexer lexer; Token tok; Toplevel toplevel(); Call call(); Identifier identifier(); Ptr< Expression > expression(); }; • each non-terminal gets a function (more or less) • each function returns the corresponding AST node Implementing an Interpreter in C++ 19/106 May 4, 2017 Ptr< Expression > Parser::expression() { if ( tok.cat == Token::Identifier ) return make_expr( identifier() ); /* ... */ if ( tok.cat != ParenOpen ) fail( "opening paren" ); shift(); if ( tok.cat == Token::Identifier ) return make_expr( call() ); } • looks a bit like the lexer • shift() grabs the next token into tok Implementing an Interpreter in C++ 20/106 May 4, 2017 Parsing: Reporting Errors • LL(1) parsers can easily give nice error messages • what you found vs what you expected to find • the Token remembers where it came from Example: • parse error at [LitString "bar" at line 3], • expected an operator, identifier, if, while or let Implementing an Interpreter in C++ 21/106 May 4, 2017 Assignment (weeks 1 & 2) • come up with decent syntax (could be LISP-like) • conditionals, loops, expressions, variables & functions • create corresponding AST for your language • write a lexer and a parser to generate the AST • write a pretty-printer for the AST • write a dozen or so small example programs • add a script to check that parse + prettyprint is idempotent Due 8th of March, 8am! Implementing an Interpreter in C++ 22/106 May 4, 2017 Assignment Hints • use prefix expressions (saves a lot of time) • straight LISP-like syntax is LL(1) • think about the grammar before writing too much code • think about what you need in a programming language • don’t forget about local variables • use C++ facilities (vectors, maps, sets) whenever useful • don’t lose much sleep over parsing speed • you can find inspiration in ex-parser.tar.gz in the IS Implementing an Interpreter in C++ 23/106 May 4, 2017 Part 2: Symbol Tables Implementing an Interpreter in C++ 24/106 May 4, 2017 Lexical Scoping • this is the contemporary norm • alternative: dynamic scope (shell, elisp) • alternative: no local variables Symbol Tables • keep track of what is in scope • offer efficient lookup of definitions • possibly also keep track of values Implementing an Interpreter in C++ 25/106 May 4, 2017 From Identifiers to Integers • string comparison is slow • the set of identifiers in a program is static • we can assign a unique number to each identifier For example: • put all identifiers in a hashset or a search tree • assign numbers in iteration order • build a number → identifier (string) map Implementing an Interpreter in C++ 26/106 May 4, 2017 Lexical Scopes • the global scope is shared by everything • scopes can be nested int global; void foo() { int local; if ( int z = local + global ) printf( "z is not zero: %d\n", z ); /* z no longer defined here */ } /* 'local' is no longer defined here */ • scope nesting is rigid and does not change at runtime Implementing an Interpreter in C++ 27/106 May 4, 2017 Lexical Scoping: Implementation • every lexical scope gets a (static) symbol table • symbol tables get references to their parents • if a symbol is not found, the table asks its parent scope int Scope::lookup( int id ) { if ( idmap.find( id ) == idmap.end() ) return parent.lookup( id ); /* ... */ } Implementing an Interpreter in C++ 28/106 May 4, 2017 Static Checks • correct syntax does not mean the program is well-formed • variables must be defined before they are used • functions must be defined before they are called • (we will deal with type checking later) • symbol tables are how these checks are done Implementing an Interpreter in C++ 29/106 May 4, 2017 Execution Stack • functions call other functions (or themselves) • the interpreter needs to keep track of this • may consist of pointers to AST nodes • if variables are mutable, keeps track of their values void g( int x ) { g( x + 1 ); } void f() { int y; g( 3 ); } int main() { f(); } f(), y = 5 g(), x = 3 g(), x = 4 main() Implementing an Interpreter in C++ 30/106 May 4, 2017 Mutable Variables • each activation record needs a copy of the value − activation record = stack frame • option one: index stack frames by identifiers − less efficient, easier to implement • option two: pre-compute a fixed layout for frames − store variable offsets in the static symbol table − more efficient but more work to implement Implementing an Interpreter in C++ 31/106 May 4, 2017 Dynamic Scope • in lexical scoping, the parent is the enclosing block • if the scope parent is the caller, you get dynamic scoping • the scope lookup proceeds along the execution stack • sometimes quite powerful, usually very confusing Examples • shell variables • perl optionally (only some variables) • old LISPs (including emacs lisp) • Common Lisp optionally Implementing an Interpreter in C++ 32/106 May 4, 2017 Lexical Closures • you may want to allow local function definitions • a bit like C++ lambda expressions • capture the lexical scope at the point of definition • carry the scope (symbol table) around void f( std::vector< int > &vec ) { std::for_each( vec.begin(), vec.end(), [&]( int x ) { std::cout << vec.front() - x << std::endl; } ); } Implementing an Interpreter in C++ 33/106 May 4, 2017 Lexical Closures: Lifetime • C++ lambdas capture by name or by reference • if a reference-captured value goes out of scope, SIGSEGV • in “dynamic” languages, this is usually different − reference-captured values live as long as needed − even if their original scope is gone − you need a garbage collector to do this • capture by reference is usually more useful − in imperative languages, that is Implementing an Interpreter in C++ 34/106 May 4, 2017 Walking the AST • use recursion to visit children of a node • use type-based matching from Union where appropriate expr.match( [&]( IfLike &stmt ) { recurse( stmt.condition ); }, [&]( DefLike &stmt ) { recurse( stmt.body ); }, /* ... */ ); • first pass builds the symbol tables • second pass checks that all identifier uses are correct Implementing an Interpreter in C++ 35/106 May 4, 2017 Symbol Tables: Summary • static table for each lexical unit (function, block) − ensure functions and variables are in scope when used − possibly store auxiliary data (frame offsets) • values are stored somewhere else (execution stack) − can use std::map from identifiers to values Implementing an Interpreter in C++ 36/106 May 4, 2017 Assignment (weeks 3 & 4) • design and implement a symbol table data structure • implement string → integer key mapping for identifiers • write code to build all symbol tables from the AST • check that all variables are in scope when used − print an error message otherwise • figure out how to store values (at least integers and strings) • write tests for everything above Implementing an Interpreter in C++ 37/106 May 4, 2017 Assignment Hints • don’t forget to use std::map and/or std::unordered_map • take advantage of pattern matching in Union • you can print symbol tables and use text comparison again • try attaching local symbol tables to AST nodes • ideally, a symbol table applies to one node + all its children • sorry, no code hints this time, you did too well on parsers :-) Implementing an Interpreter in C++ 38/106 May 4, 2017 Part 3: Evaluating Expressions Implementing an Interpreter in C++ 39/106 May 4, 2017 Overview • values and variables • evaluation order • recursive evaluators • RPN evaluators Implementing an Interpreter in C++ 40/106 May 4, 2017 Evaluator • an expression evaluator is the heart of an interpreter Roles • arithmetic and other elementary operations • variable lookup • function calls and argument passing • control flow Implementing an Interpreter in C++ 41/106 May 4, 2017 Representing Values • easy if all you have is integers • otherwise, disjoint unions could work • also useful for run-time type checking Alternative (advanced) • raw data (C unions) with type erasure • needs a solid static type system Alternative • objects (as in OOP) Implementing an Interpreter in C++ 42/106 May 4, 2017 From Symbols to Values • expressions without variables are boring • symbol tables to the rescue L-values and R-values • ordinary variable use is R-value use • a variable reference is replaced by its current value • does not work for assignment (mutable variables) • L-value stands for the address of a variable Implementing an Interpreter in C++ 43/106 May 4, 2017 Evaluation Order • relevant for function application (calls) • also for built-ins (control flow) Normal • expand the body first • substitute un-evaluated arguments • also known/implemented as: call by name, lazy Strict • compute argument values first • also known/implemented as: call by value, eager Implementing an Interpreter in C++ 44/106 May 4, 2017 (Mostly) Imperative Programming • call by value • call by name (thunks) • call by reference (pointers) • call by object reference (call by sharing) • call by value result (by value return) • call by need (lazy) • call by macro expansion (text-based) • call by future (concurrent) Implementing an Interpreter in C++ 45/106 May 4, 2017 Thunks int f() { std::cerr << "!"; return 3; } int strict( int value ) { return value + value; } int normal( std::function< int() > value ) { return value() + value(); } int main() { std::cerr << strict( 3 + f() ); std::cerr << normal( []{ return 3 + f(); } ); } Implementing an Interpreter in C++ 46/106 May 4, 2017 Evaluation Order in C++ • almost all expressions are eager • logical operators are lazy / “short circuiting” • statements (if) are “lazy” • promise/futures for lazy evaluation • std::future / std::async with std::launch::deferred • basically an explicit, type-safe thunk Implementing an Interpreter in C++ 47/106 May 4, 2017 Flexibility in Evaluation Order • lazy values in a strict language → usually easy • very easy with first-class functions • including infinite data structures (co-data) On the Other Hand • strict values in a lazy language → usually hard • typically needs language support • often very far from intuitive • compare normal forms: beta, beta-eta, head, weak head • Haskell: seq, deepseq, NFData, $!, BangPatterns Implementing an Interpreter in C++ 48/106 May 4, 2017 Implementation: Recursive Evaluation • the simplest method • works directly on the AST • may not need an explicit execution stack • also the slowest Value eval( If &e_if ) { if ( eval( e_if.condition ) ) return eval( e_if._then ); else return eval( e_if._else ); } Implementing an Interpreter in C++ 49/106 May 4, 2017 Reverse Polish Notation (RPN) • faster than recursive • only useful with eagerly evaluated constructs • good for arithmetic-heavy programs • recall postfix syntax from part 1 • (5 + 3) * x written as 5 3 + x * • trivial evaluation on an explicit stack Implementing an Interpreter in C++ 50/106 May 4, 2017 RPN: Implementation void eval() { if ( size() == 1 ) return; Value a = pop(), b = pop(); Op op = pop(); if ( op == Add ) push( a + b ); // ... } • the result is the only value left on the stack Implementing an Interpreter in C++ 51/106 May 4, 2017 RPN: Control Flow • control flow in an RPN evaluator is a bit tricky • normally every “operator” is strict However • lazy semantics in a strict language? thunks • push thunks for then/else branches onto the RPN stack • profit Function calls? Implementing an Interpreter in C++ 52/106 May 4, 2017 Three-Address Code • might be faster than RPN • control flow is straightforward • ~halfway to a compiler • data stored in arrays (not stacks) • a lot more complicated than RPN • quite some room for optimisation Implementing an Interpreter in C++ 53/106 May 4, 2017 Trampolines • execute continuation-passing-style programs • converts CPS into standard call/return semantics • more of a compiler technique Implementing an Interpreter in C++ 54/106 May 4, 2017 Keeping Track of Calls void g( int x ) { g( x + 1 ); } void f() { int y; g( 3 ); } int main() { f(); } f(), y = 5 g(), x = 3 g(), x = 4 main() Implementation Strategies • meta-circular (in a recursive evaluator) • re-use the explicit RPN evaluation stack • explicit “evaluation context” stack Implementing an Interpreter in C++ 55/106 May 4, 2017 Assignment (weeks 5 & 6) • write an evaluator for your language • arithmetic, conditionals, loops, variables and function calls • mutable variables and an assignment operator • write arithmetic- and recursion-based tests • lexical closures are optional Due 5th of April, 8am! Implementing an Interpreter in C++ 56/106 May 4, 2017 Assignment Hints • a recursive evaluator is the simplest to implement • strict evaluation order is the simplest • you can keep variable values in an std::unordered_map • RPN evaluation is also nice (don’t forget about thunks) • hybrids are viable (recursion only for calls & control flow) Implementing an Interpreter in C++ 57/106 May 4, 2017 Part 4: Type Checking Implementing an Interpreter in C++ 58/106 May 4, 2017 Overview • what is a type • sub-typing • dynamic types (run-time checking) • static types (ahead of time) • classes and objects Implementing an Interpreter in C++ 59/106 May 4, 2017 Why Types? • same reason as syntax checkers • programmers (= people) make mistakes • type mismatch is, usually, a mistake • types = high-powered version of dimension analysis • you don’t want to add seconds to meters by mistake • hence, type discipline and enforcement Implementing an Interpreter in C++ 60/106 May 4, 2017 What is a Type? • first approximation: a set of values • set of integers, set of strings, etc. • every value belongs to a (single) type • both values and variables have types Function Types? • eg. a set of maps from integers to integers • maps are still sets, so this (almost) works out Implementing an Interpreter in C++ 61/106 May 4, 2017 Well-Typed Programs • all type constraints are satisfied • in particular, on function (operator) applications • let 𝑓 :: 𝑇 → 𝑇, 𝑥 :: 𝑇 and 𝑦 :: 𝑈 • f x is well-typed, f y is not • also: assignment and initialisers, pattern matching std::string x = 0.5; /* not well typed */ Implementing an Interpreter in C++ 62/106 May 4, 2017 Products and Sums • cartesian product of two types is again a type • so is a sum (union, or maybe a disjoint union) • unions + products form the basis of algebraic data types • function type is a special subset of the product type Multi-parameter functions • (𝑇 × 𝑇) → 𝑇 is what C/C++ use • 𝑇 → (𝑇 → 𝑇) is what Haskell uses • the two are isomorphic (think curry/uncurry) Implementing an Interpreter in C++ 63/106 May 4, 2017 Product Types: Aggregates • C struct is a typical product type • a more “obvious” example: std::pair and std::tuple • products with named fields are usually very important • (also known as records) • they form the backbone of user-defined types • (classes are based on product types) Implementing an Interpreter in C++ 64/106 May 4, 2017 Subtyping • maybe there’s a user type shape • every circle is clearly also a shape • subtypes correspond to subsets • induces a (pre)order relation on types Contravariance • let 𝑇 be a type and 𝑆 its subtype • whenever a 𝑇 is expected, 𝑆 can be provided • this usual behaviour is called covariant • however! 𝑇 → 𝑆 is a subtype of 𝑆 → 𝑆 • function arguments are contravariant wrt. subtyping Implementing an Interpreter in C++ 65/106 May 4, 2017 Polymorphism • monomorphic function types are quite constraining • eg: plain C functions • think int min( int a, int b ) … how about float? • counter-eg.: C++ function templates • “types = sets” is no longer good enough Approaches • parametric: eg. Hindley-Milner • ad-hoc: like parametric but dirtier (think C++ templates) • subtyping + optionally late binding Implementing an Interpreter in C++ 66/106 May 4, 2017 Parametric Polymorphism • one implementation, multiple (parametric) types • ML, Haskell, etc. (based on Hindley-Milner) • adds type variables • id :: a -> a is good for any type a • type checking is only a little harder than monomorphic • C++ templates (w/o specialisation) are an approximation • can be extended with type classes (Haskell) • min :: (Ord a) => a -> a -> a Implementing an Interpreter in C++ 67/106 May 4, 2017 Algebraic Data Types (revisited) • products and sums are nice but relatively weak • how about recursive (infinite) data types? • allows encoding lists, trees and other inductive types • may also allow encoding co-data types • data List = Nil | Cons Int List • values must contain pointers Parametric ADTs • also: much more powerful with type variables • data List a = Nil | Cons a List Implementing an Interpreter in C++ 68/106 May 4, 2017 Static Type Checking • all type enforcement is done at compile/load time • type information can be erased (more efficient execution) • may require explicit type annotation (as in C, C++98) • or be partially inferred (modern Haskell, C++11 and later) • or be completely inferred (“classical” Haskell) • type errors show up early • may allow static (fast) type-based dispatch Implementing an Interpreter in C++ 69/106 May 4, 2017 Dynamic Type Checking • type enforcement is (mostly) done at runtime • values carry along their types encoded as data • function application also runs the type checker • RTTI could be as little as a couple of bits (LISP) • or as much as a full machine pointer (OOP) Implementing an Interpreter in C++ 70/106 May 4, 2017 Classes and Objects • subtyping naturally leads to OOP • extends types with methods and encapsulation • optionally with late binding • one signature, multiple types, multiple implementations • primarily a problem decomposition tool • also neatly solves namespace problems • works with static (C++) and dynamic (Python) type checkers Implementing an Interpreter in C++ 71/106 May 4, 2017 Late Binding • supertype methods can be overridden in subtypes • different implementations for different types • form of run-time, type-based dispatch • incompatible with (completely) static types • in C++ realised through vtable pointers Implementing an Interpreter in C++ 72/106 May 4, 2017 Type Casting and Coercion • sometimes you know you are right • even though the types don’t match • casts convert from one type to another • coercion simply re-interprets the value • both more-or-less break type safety • C has some arcane implicit casting rules Implementing an Interpreter in C++ 73/106 May 4, 2017 Assignment (weeks 7 & 8): Static Variant • allow user-defined product types with named fields • implement monomorphic function types • add type annotations to the parser & AST • type-check each function application at load time Assignment: Dynamic/OOP Variant • allow user-defined classes (with attributes and methods) • pass values in the evaluator as references to objects • implement late binding (type-based dispatch) • detect failing method lookups at runtime Due 19th of April, 8am! (optional) Implementing an Interpreter in C++ 74/106 May 4, 2017 Assignment Hints • both variants need parser extensions • dynamic types are easier to work with (from user POV) • static types are safer and get you faster code • static type checker builds on the semantic checker • dynamic type checker builds on the evaluator • you can mix & match aspects of both (like C++) • it’s OK to put types and variables in a single namespace Implementing an Interpreter in C++ 75/106 May 4, 2017 Part 5: Memory Management Implementing an Interpreter in C++ 76/106 May 4, 2017 Overview • what lives in memory • reference counting • mark and sweep • copying collectors (compacting) • Cheney on the M.T.A. • generational collection • latency and concurrency Implementing an Interpreter in C++ 77/106 May 4, 2017 What is in program’s memory? • scalar data and arrays of scalars • data structures with pointers in them Pointers: Good and Bad • pointer dereferences are expensive • allows encoding all sorts of structure • lists, trees, graphs • very useful for building abstractions Implementing an Interpreter in C++ 78/106 May 4, 2017 From Flat Memory to Objects • imagine a node in a linked list • it lives somewhere • how do you decide where to put one? • enter malloc and free Semi-Automatic Memory Management • malloc finds a good place to put data • free marks a bit of memory for re-use Implementing an Interpreter in C++ 79/106 May 4, 2017 Building the Abstraction Tower • malloc/free give us abstract-ish objects • but we still need to track lifetime manually • and worse, place free calls statically • this is tedious and not always possible Automatic Garbage Collection • figure out which objects are alive (and which dead) • we no longer need to call free • free is dynamically performed by the GC Implementing an Interpreter in C++ 80/106 May 4, 2017 Basic Idea: Reachability along Pointers • pick a root set of live objects • could be the C stack + registers • or the active (executing) frame • live = reachable from the root set • dead = everything else Implementing an Interpreter in C++ 81/106 May 4, 2017 First Approximation: Reference Counting • along with each object, keep a counter • when a pointer is created/copied, increase the counter • when a pointer is lost, decrease the counter • when the counter hits zero, free the object Problems • expensive to take/copy pointers (memory write) • fails to free object cycles Advantages • low/predictable latency • reasonable memory overhead Implementing an Interpreter in C++ 82/106 May 4, 2017 Garbage Collectors • add a collector procedure • the rest of the program is called the mutator • run the collector at convenient times (not too often) Dealing with Loops: Mark & Sweep • the collector executes reachability along pointers • marking every reachable object • then iterating over all objects • calling free on the unmarked ones (sweeping) Implementing an Interpreter in C++ 83/106 May 4, 2017 Challenges • the collector procedure may need to allocate memory • all mutator threads may need to stop while the collector runs • the collector needs to know which words are pointers • → problems with foreign function interfaces (C calls) • performance under memory pressure Implementing an Interpreter in C++ 84/106 May 4, 2017 Mark & Sweep: Advantages • comparatively easy to implement • low memory overhead • can re-use existing malloc/free • approximate (conservative) collection is possible Disadvantages • high/messy latency (bad for interactive programs) • more memory used = slower collection • interacts badly with concurrency Implementing an Interpreter in C++ 85/106 May 4, 2017 A Copying Collector • split memory into 2 halfspaces • one is the working set, other is dormant • bump allocation of new memory • collect when the live halfspace fills up Collection • copy live objects to the other halfspace • updating all pointers along the way Implementing an Interpreter in C++ 86/106 May 4, 2017 Cheney’s Algorithm • look at a from-space object • copy it over to the to-space • replace the from-space copy with a forwarding pointer • recurse/update pointers in the to-space copy • (not actually implemented recursively) Implementing an Interpreter in C++ 87/106 May 4, 2017 Copying Collectors: Advantages • fast memory allocation • no time spent dealing with garbage • keeps data physically close together • possibly improving cache utilisation Disadvantages • needs exact information about pointers • poor memory utilisation (always 1 empty halfspace) Implementing an Interpreter in C++ 88/106 May 4, 2017 Cheney on the M.T.A. • all allocation is done on the C stack • when the stack is about to fill up: • make a new stack and “Cheney” data from the old one • the program is compiled into C • the compiled functions never return • easy integration with C calls Disadvantage: same as “normal” copying collector Implementing an Interpreter in C++ 89/106 May 4, 2017 Compromises: Generational Collectors • observation: many objects only live for a short while • split memory into a hatchery and a mature space • use a different collector for each • typical: mark & copy for the hatchery (minor collection) • mark & sweep for the mature space (major collection) • the hatchery is traced much more often Implementing an Interpreter in C++ 90/106 May 4, 2017 Generational Collectors: Advantages • short-lived objects are quickly eliminated • hot data is kept together (good for CPU caches) • minor collection is fast & predictable (wrt. latency) • foreign objects can live in the (non-moving) mature space Disadvantages • more complicated • does not fix all the problems Implementing an Interpreter in C++ 91/106 May 4, 2017 A Note on Latency: Incremental Collection • latency in interactive applications is bad • even more so in real-time systems • also in distributed computations • interleave the mutator and the collector • incremental collector can be made real-time • (by imposing a deadline on the increment) • tricky, but easier than concurrent collection Implementing an Interpreter in C++ 92/106 May 4, 2017 Concurrent Collectors • concurrent data structures are hard • not freeing dead objects is not a big problem • (they will be picked up by a later cycle) • freeing live objects is a big problem • needs cooperation from mutator threads • easy-ish with reference counting Eg. http://www.aicas.com/papers/ismm02f-siebert.pdf Implementing an Interpreter in C++ 93/106 May 4, 2017 Assignment (weeks 9 & 10): • implement a garbage collector • (optional, deadline May the 3rd, 8am) Implementing an Interpreter in C++ 94/106 May 4, 2017 Part 6: Talking to The Outside World Implementing an Interpreter in C++ 95/106 May 4, 2017 Overview • foreign function interface (FFI) • constructing calls • dealing with memory & outputs • aggregate arguments and return values • a simple runtime-only solution Implementing an Interpreter in C++ 96/106 May 4, 2017 Foreign Function Interface • a mechanism for calling procedures • defined in a language different from our own • a typical target language is C • crucial for re-use of existing code Implementing an Interpreter in C++ 97/106 May 4, 2017 Constructing Calls • problem: we need to call a function • but we don’t know what its arguments are Some options: • ad-hoc: hard-code some argument combinations • template metaprogramming • automatic code generation • re-implement the C calling convention Implementing an Interpreter in C++ 98/106 May 4, 2017 An Ad-Hoc Approach • good enough to cover most syscalls • not good to talk to libraries void ccall( void (*f)(), ArgT argt, ArgV v ) { if ( argt == ArgT{ Int } ) return f( v[ 0 ].asInt() ); if ( argt == ArgT{ Int, Int } ) return f( v[ 0 ].asInt(), v[ 1 ].asInt() ); /* ... */ } Implementing an Interpreter in C++ 99/106 May 4, 2017 Template Metaprogramming • use variadic/recursive function templates • automates data conversion (the asInt() bit) • required instances need to be known at compile time • does not really fix the problem int f( int a, int b, const char *c ); auto tup = std::make_tuple( 3, 7, "foo" ); brick::tuple::pass( f, tup ); // call f( 3, 7, "foo" ) Implementing an Interpreter in C++ 100/106 May 4, 2017 Automatic Code Generation • uses specific, per-function wrappers • the wrappers are generated as C (C++) code • may use the template approach to simplify generated code Value wrap_write( std::vector< Value > args ) { int rv = write( args[ 0 ].asInt(), args[ 1 ].asString(), args[ 2 ].asInt() ); return Value( rv ); } Implementing an Interpreter in C++ 101/106 May 4, 2017 The C Calling Convention • architecture-specific (x86 is simple, amd64 is complex) • the generic ccall needs to be written in assembly • most compact but least portable x86/cdecl • arguments go onto the stack (right-to-left) • scalar return values either in eax or st0 amd64: a 10 page spec on what goes where Implementing an Interpreter in C++ 102/106 May 4, 2017 Dealing with Outputs • some functions return variable-sized data • like the read function • using output arguments (represented by pointers) • such arguments must be treated differently • the output of read should give us an in-language string char buffer[32]; read( 0, buffer, 32 ); // buffer contains the data we want Implementing an Interpreter in C++ 103/106 May 4, 2017 Aggregate Values • C supports passing structures as arguments • and returning them as values • will not work with the ad-hoc approach • too many different sizes struct foo { int x, y, z; double bar; }; foo update_foo( foo x ) { ... } Implementing an Interpreter in C++ 104/106 May 4, 2017 A Simple Approach • usable with ad-hoc call construction • does not need to invoke a compiler • looks up functions by using dlsym int main() { int (*w)() = dlsym( NULL, "write" ); w( 1, "hello world\n", 12 ); } Implementing an Interpreter in C++ 105/106 May 4, 2017 How to Construct Wrappers • option 1: reconstruct from calls • will not work for variadic functions (printf) • option 2: special syntax for declaring C functions • you rely on the user to translate the prototypes • option 3: parse C headers (hard) (define fun (foreign-lambda c-string "fun" c-string int)) Implementing an Interpreter in C++ 106/106 May 4, 2017 Assignment • implement a simple C-based FFI for your interpreter • must be able to call integer-argument functions w/ up to 4 args • it must be able to deal with read or similar • use the FFI to get I/O capabilities • implement an interactive game of tic-tac-toe There is no deadline.