Implementing an Interpreter in C++ Petr Ročkai Organisation • theory: ~50 minutes every two weeks • coding: all the remaining time Assignments • 2 weeks -> 1 topic -»1 assignment (6 total) • you should get most of the work done during the seminar • assignments include writing tests! • you have 14 days to hand in your assignment Implementing an Interpreter in C++ 2/109 May 17, 2018 Grading • each assignment is worth 2 points • implementing a game of tic-tac-toe is worth lOpt — it has to run in the interpreter you implemented • attendance is optional: showing up is worth .2pt • missing a deadline on an assignment is worth -.5pt • you need to collect 20 points Implementing an Interpreter in C++ 3/109 May 17, 2018 Your Own Programming Language (in 6 easy steps) • Lexers and Parsers • Symbol Tables • Evaluating Expressions • Type Checking • Memory Management • Talking to the Outside World Implementing an Interpreter in C++ 4/109 May 17, 2018 What You Need To Know • we will use C++14 • version control of your choice • UNIX strongly recommended • write automated tests (eg. shell scripts) Implementing an Interpreter in C++ 5/109 May 17, 2018 Parti: Lexers and Parsers Lexical Structure • the source code is ASCII (Unicode] text • working one character at a time is not fun • lexer converts text into a stream of tokens Token Categories • keywords • identifiers • literals: strings and numbers • operators • brackets Implementing an Interpreter in C++ 7/109 May 17, 2018 Lexer is a Finite State Automaton • the token structure is regular • example: an identifier is [ a - z A - Z ] [ a - z A - Z0 - 9 ] * • another one: a number is [0-9] [0-9]* • needs to deal with whitespace between tokens, too Lexer • reads characters from the input file • outputs tokens for future processing Implementing an Interpreter in C++ 8/109 May 17, 2018 Token • represented by a data type • remembers the text and category • also where it came from struct Token { std::string text; int lineno; enum Cat { If, Else, Endif, Identifier, ParenOpen, ParenClose, LitString, LitNumber } }; cat; Implementing an Interpreter in C++ 9/109 May 17, 2018 struct Lexer { Lexer( const char *filename ); Token next(); /* main interface */ protected: std::ifstream in; std::string buf; /* state machine */ Token identifier(); Token literal(); /* ... */ }; Implementing an Interpreter in C++ 10/109 May 17,2018 The next function reads and returns the next to-ken Token Lexer::next() { whitespace(); but += ( c = in.get() ); if ( std::isalpha( c ) ) return identifier(); switch ( c ) { case '=': // ... } } Implementing an Interpreter in C++ 11/109 May 17,2018 The state machine could be implemente one method for each automaton state: Token Lexer::identifer() { char c; while ( std::isalnum( c = in.get() ) ) but += c; in.unget(); if ( is_keyword( buf ) ) return // ... return Token( Token::Identifier, buf, .. next () from previous slide is the initial state Implementing an Interpreter in C++ 12/109 May 17,2018 Parsers • typically a context-free language • terminals (symbols) are the tokens • a stack (or recursion) is required for parsing • selection of algorithms (LL, LR, GLR, monadic, rec. descent) • different trade-offs Context • parser reads tokens from the lexer • creates an Abstract Syntax Tree (AST for short) Implementing an Interpreter in C++ 13/109 May 17, 2018 Expressions: Prefix (eg. LISP) • easy to parse, hard-ish to read, annoying to write • variadic operators • (+ 1 2 (* 3 4) 5) Postfix (eg. PostScript) • very easy to parse, hard to read, easy-ish to write • unambiguous even without parens • 34*1 + 2 + 5 + Infix (eg. everybody else) • hard to parse, easy to read, easy-ish to write • 1 + 2 + 3*4 + 5 Implementing an Interpreter in C++ 14/109 May 17, 2018 Abstract Syntax Tree • internal representation of the source code • tree with different node types if ( x + 1 = 5 ) print "hello" pickup pencircle scaled .4mm ; circleit.n_if ( btex \tt\orange if\strut etex ) ; circleit.n_eq ( btex \orange $=$\strut etex ) ■ circleit.n_plus ( btex \orange $+$\strut etex ) circleit n_x ( circleit none ( Implementing an Interpreter in C++ btex $x$\strut etex ) ; btex \darkqreen $l$\strut etex 15/109 May 17, 2018 AST in C++ • use std : : sha red_pt r for children • don't overdo it (many things can be kept as values] template< typename T > using Ptr = std::shared_ptr< T >; struct Expression { /* ... */ }; struct Statement { /* ... */ }; struct If : Node { Ptr< Expression > condition; Ptr< Statement > body; }; Implementing an Interpreter in C++ 16/109 May 17, 2018 AST: Representing Alternatives • you can use std: : variant (since C++17) • or use Union from brick-types (see study materials in IS) • or use enums and write out switches by hand (eww) struct Expression; using Atom = Union< Identifier, Literal >; struct Binary { Ptr< Expression > left, right; }; using ExpBase = Union< Atom, Binary, Unary > {}; struct Expression : ExpBase { using ExpBase::Unio }; using Statement = Union< Expression, Block, If >; Implementing an Interpreter in C++ 17/109 May 17, 2018 Parsing: Context-Free Languages • grammars with one-to-some rules • alternatively: stack machines • the goal is reconstructing a grammar derivation • grammars are often ambiguous • subsets of CFLs: LL(1), LR(1), LALR(l), ... • can be parsed more efficiently • eg. limited lookahead, no or limited backtracking Implementing an Interpreter in C++ 18/109 May 17, 2018 Parsing: Recursive Descent • parse LL(k) languages in linear time • easy to write in direct C or C++ • no fancy generators needed (ie. no yacc nor bison) Method • look at one token and the grammar rules • find which rules could have produced this token • if there's only one, you know which rule to pursue • otherwise the grammar is not LL(1) • you can try looking at two tokens instead: LL(2) Implementing an Interpreter in C++ 19/109 May 17, 2018 Recursive Descent in C++ struct Parser { Lexer lexer; Token tok; Toplevel toplevelO; Call call(); Identifier identifier(); Ptr< Expression > expression!); }; • each non-terminal gets a function (more or less) • each function returns the corresponding AST node Implementing an Interpreter in C++ 20/109 May 17,2018 Ptr< Expression > Parser::expression() { if ( tok.cat == Token::Identifier ) return make_expr( identified) ); /* ... */ if ( tok.cat != ParenOpen ) fail( "opening paren" ); shift(); if ( tok.cat == Token::Identifier ) return make_expr( call() ); } • looks a bit like the lexer • shif t () grabs the next token into tok Implementing an Interpreter in C++ 21/109 May 17,2018 Parsing: Reporting Errors • LL(1) parsers can easily give nice error messages • what you found vs what you expected to find • the Token remembers where it came from Example: • parse error at [LitSt ring "bar" at line 3], • expected an operator, identifier, if, while or let Implementing an Interpreter in C++ 22/109 May 17, 2018 Assignment 1 • come up with decent syntax (could be LISP-like] • conditionals, loops, expressions, variables & functions • create corresponding AST for your language • write a lexer and a parser to generate the AST • write a pretty-printer for the AST • write a dozen or so small example programs • add a script to check that parse + prettyprint is idempo-tent Deadline 9th of March. Implementing an Interpreter in C++ 23/109 May 17, 2018 Assignment 1: Hints • use prefix expressions (saves a lot of time) • straight LISP-like syntax is LL(1) • think about the grammar before writing too much code • think about what you need in a programming language • don't forget about local variables • use C++ facilities (vectors, maps, sets) whenever useful • don't lose much sleep over parsing speed • you can find inspiration in ex-parser, tar.gz in the IS Implementing an Interpreter in C++ 24/109 May 17, 2018 Part 2: Symbol Tables Lexical Scoping • this is the contemporary norm • alternative: dynamic scope (shell, elisp) • alternative: no local variables Symbol Tables • keep track of what is in scope • offer efficient lookup of definitions • possibly also keep track of values Implementing an Interpreter in C++ 26/109 May 17, 2018 From Identifiers to Integers • string comparison is slow • the set of identifiers in a program is static • we can assign a unique number to each identifier For example: • put all identifiers in a hashset or a search tree • assign numbers in iteration order • build a number -> identifier (string) map Implementing an Interpreter in C++ 27/109 May 17, 2018 Lexical Scopes • the global scope is shared by everything • scopes can be nested int global; void foo() { int local; if ( int z = local + global ) printf( "z is not zero: %d\n", z ); /* z no longer defined here */ } /* 'local' is no longer defined here */ • scope nesting is rigid and does not change at runtime Implementing an Interpreters C++ ° 28/109 b May 17,2018 Lexical Scoping: Implementation • every lexical scope gets a (static] symbol table • symbol tables get references to their parents • if a symbol is not found, the table asks its parent scope int Scope::lookup( int id ) { if ( idmap.find( id ) == idmap.endO ) return parent.lookup( id ); /* ... */ } Implementing an Interpreter in C++ 29/109 May 17, 2018 Static Checks • correct syntax does not mean the program is well-formed • variables must be defined before they are used • functions must be defined before they are called • (we will deal with type checking later) • symbol tables are how these checks are done Implementing an Interpreter in C++ 30/109 May 17, 2018 Execution Stac • functions call other functions (or themselves) • the interpreter needs to keep track of this • may consist of pointers to AST nodes • if variables are mutable, keeps track of their values void g( int x ) { g( x + 1 ); } void f0 { int y; g( 3 ); } int main() { f(); } g(), x = 4 Implementing an Interpreter in C++ 31/TL09 May 17, 2018 Mutable Variables • each activation record needs a copy of the value — activation record = stack frame • option one: index stack frames by identifiers — less efficient, easier to implement • option two: pre-compute a fixed layout for frames — store variable offsets in the static symbol table — more efficient but more work to implement Implementing an Interpreter in C++ 32/109 May 17, 2018 Dynamic Scope • in lexical scoping, the parent is the enclosing block • if the scope parent is the caller, you get dynamic scoping • the scope lookup proceeds along the execution stack • sometimes quite powerful, usually very confusing Examples • shell variables • pe rl optionally (only some variables) • old LISPs (including emacs lisp] • Common Lisp optionally Implementing an Interpreter in C++ 33/109 May 17, 2018 Lexical Closures • you may want to allow local function definitions • a bit like C++ lambda expressions • capture the lexical scope at the point of definition • carry the scope (symbol table] around void f( std::vector< int > &vec ) { std: :for_each( vec.beginO, vec.endO, [&] ( int x ) { std::cout « vec.front() - x « std::endl; } ); } Implementing an Interpreter in C++ 34/109 May 17, 2018 Lexical Closures: Lifetime • C++ lambdas capture by name or by reference • if a reference-captured value goes out of scope, SIGSEGV • in "dynamic" languages, this is usually different — reference-captured values live as long as needed — even if their original scope is gone — you need a garbage collector to do this • capture by reference is usually more useful — in imperative languages, that is Implementing an Interpreter in C++ 35/109 May 17, 2018 Walking the AST • use recursion to visit children of a node • use type-based matching from Union where appropriate expr.match( [&] ( If Like &stmt ) { recurse( stmt. conditio ); }, [&]( DefLike &stmt ) { recurse( stmt body ); /*...*/); • first pass builds the symbol tables • second pass checks that all identifier uses are correct Implementing an Interpreter in C++ 36/109 May 17, 2018 Symbol Tables: Summary • static table for each lexical unit (function, block) — ensure functions and variables are in scope when used — possibly store auxiliary data (frame offsets] • values are stored somewhere else (execution stack) — can use std: : map from identifiers to values Implementing an Interpreter in C++ 37/109 May 17, 2018 Assignment 2 • design and implement a symbol table data structure • implement string -»integer key mapping for identifiers • write code to build all symbol tables from the AST • check that all variables are in scope when used — print an error message otherwise • figure out how to store values (at least integers and strings) • write tests for everything above Deadline 23rd of March. Implementing an Interpreter in C++ 38/109 May 17, 2018 Assignment 2: Hints • don't forget to use std: : map and/or std: : unordered_map • take advantage of pattern matching in Union • you can print symbol tables and use text comparison again • try attaching local symbol tables to AST nodes • ideally, a symbol table applies to one node + all its children • sorry, no code hints this time, you did too well on parsers Implementing an Interpreter in C++ 39/109 May 17, 2018 Part 3: Evaluating Expressions Overview • values and variables • evaluation order • recursive evaluators • RPN evaluators Implementing an Interpreter in C++ 41/109 May 17, 2018 Evaluator • an expression evaluator is the heart of an interpreter Roles • arithmetic and other elementary operations • variable lookup • function calls and argument passing • control flow Implementing an Interpreter in C++ 42/109 May 17, 2018 Representing Values • easy if all you have is integers • otherwise, disjoint unions could work • also useful for run-time type checking Alternative (advanced) • raw data (C unions) with type erasure • needs a solid static type system Alternative • objects (as in OOP) Implementing an Interpreter in C++ 43/109 May 17, 2018 From Symbols to Values • expressions without variables are boring • symbol tables to the rescue L-values and R-values • ordinary variable use is R-value use • a variable reference is replaced by its current value • does not work for assignment (mutable variables) • L-value stands for the address of a variable Implementing an Interpreter in C++ 44/109 May 17, 2018 Evaluation Order • relevant for function application (calls) • also for built-ins (control flow) Normal • expand the body first • substitute un-evaluated arguments • also known/implemented as: call by name, lazy Strict • compute argument values first • also known/implemented as: call by value, eager Implementing an Interpreter in C++ 45/109 May 17, 2018 (Mostly) Imperative Programming • call by value • call by name (thunks) • call by reference (pointers) • call by object reference (call by sharing) • call by value result (by value return) • call by need (lazy) • call by macro expansion (text-based) • call by future (concurrent) Implementing an Interpreter in C++ 46/109 May 17, 2018 Thunks int f() { std::cerr « "!"; return 3; } int strict( int value ) { return value + value; } int normal( std::function< int() > value ) { return value() + value(); } int main() { std::cerr « strict( 3 + f() ); std::cerr « normal( []{ return 3 + f(); } ); Implementing an Interpreter in C++ 47/109 May 17,2018 Evaluation Order in C++ • almost all expressions are eager • logical operators are lazy / "short circuiting" • statements (if) are "lazy" • promise/futures for lazy evaluation • std: :future/std: :asyncwithstd: :launch: :deferred • basically an explicit, type-safe thunk Implementing an Interpreter in C++ 48/109 May 17, 2018 Flexibility in Evaluation Order • lazy values in a strict language -» usually easy • very easy with first-class functions • including infinite data structures (co-data) On the Other Hand • strict values in a lazy language -» usually hard • typically needs language support • often very far from intuitive • compare normal forms: beta, beta-eta, head, weak head • Haskell: seq, deepseq, NFData, $!, BangPatterns Implementing an Interpreter in C++ 49/109 May 17, 2018 Implementation: Recursive Evaluation • the simplest method • works directly on the AST • may not need an explicit execution stack • also the slowest Value eval( If &e_if ) { if ( eval( e_if.condition ) ) return eval( e_if._then ); else return eval( e_if._else ); } Implementing an Interpreter in C++ 50/109 May 17, 2018 Reverse Polish Notation (RPN) • faster than recursive • only useful with eagerly evaluated constructs • good for arithmetic-heavy programs • recall postfix syntax from part 1 • (5 + 3) * x written as 5 3 + x * • trivial evaluation on an explicit stack Implementing an Interpreter in C++ 51/109 May 17, 2018 RPN: Implementation void eval() { if ( sizeO == 1 ) return; Value a = pop(), b = pop(); Op op = pop(); if ( 0p == Add ) push( a + b ); // ... } • the result is the only value left on the stack Implementing an Interpreter in C++ 52/109 May 17,2018 RPN: Control Flow • control flow in an RPN evaluator is a bit tricky • normally every "operator" is strict However • lazy semantics in a strict language? thunks • push thunks for then/else branches onto the RPN stack • profit Function calls? Implementing an Interpreter in C++ 53/109 May 17, 2018 Three-Address Code • might be faster than RPN • control flow is straightforward • —halfway to a compiler • data stored in arrays (not stacks] • a lot more complicated than RPN • quite some room for optimisation Implementing an Interpreter in C++ 54/109 May 17, 2018 Trampolines • execute continuation-passing-style programs • converts CPS into standard call/return semantics • more of a compiler technique Implementing an Interpreter in C++ 55/109 May 17, 2018 Keeping Track of Calls void g( int x ) { g( x + 1 ); } void f0 { int y; g( 3 ); } int main() { f(); } g()/X f g()/X f f(),y Implementing an Interpreter in C++ 56^109 May 17,2018 Assignment 3 • write an evaluator for your language • arithmetic, conditionals, loops, variables and function calls • mutable variables and an assignment operator • write arithmetic- and recursion-based tests • lexical closures are optional Deadline 6th of April. Implementing an Interpreter in C++ 57/109 May 17, 2018 Assignment 3: Hints • a recursive evaluator is the simplest to implement • strict evaluation order is the simplest • you can keep variable values in an std: : unordered_map • RPN evaluation is also nice (don't forget about thunks) • hybrids are viable (recursion only for calls & control flow) Implementing an Interpreter in C++ 58/109 May 17, 2018 Part 4: Type Checking Overview • what is a type • sub-typing • dynamic types (run-time checking) • static types (ahead of time) • classes and objects Implementing an Interpreter in C++ 60/109 May 17, 2018 Why Types? • same reason as syntax checkers • programmers (= people) make mistakes • type mismatch is, usually, a mistake • types = high-powered version of dimension analysis • you don't want to add seconds to meters by mistake • hence, type discipline and enforcement Implementing an Interpreter in C++ 61/109 May 17, 2018 What is a Type? • first approximation: a set of values • set of integers, set of strings, etc. • every value belongs to a (single] type • both values and variables have types Function Types? • eg. a set of maps from integers to integers • maps are still sets, so this (almost) works out Implementing an Interpreter in C++ 62/109 May 17, 2018 Well-Typed Programs • all type constraints are satisfied • in particular, on function (operator] applications • let/ :: T -» T,x :: Tandy :: U • f x is well-typed, f y is not • also: assignment and initialisers, pattern matching std::string x = 0.5; /* not well typed */ Implementing an Interpreter in C++ 63/109 May 17, 2018 Products and Sums • cartesian product of two types is again a type • so is a sum (union, or maybe a disjoint union) • unions + products form the basis of algebraic data types • function type is a special subset of the product type Multi-parameter functions • (T x T) -> T is what C/C++ use • T -» (T -» T) is what Haskell uses • the two are isomorphic (think curry/uncurry) Implementing an Interpreter in C++ 64/109 May 17, 2018 Product Types: Aggregates • Cstructisa typical product type • a more "obvious" example: std: : pair and std: : tuple • products with named fields are usually very important • (also known as records) • they form the backbone of user-defined types • (classes are based on product types] Implementing an Interpreter in C++ 65/109 May 17, 2018 Subtyping • maybe there's a user type shape • every circle is clearly also a shape • subtypes correspond to subsets • induces a [pre] order relation on types Contravariance • let T be a type and S its subtype • whenever a T is expected, 5 can be provided • this usual behaviour is called covariant • however! T -> S is a subtype of 5 -» S • function arguments are contravariant wrt. subtyping Implementing an Interpreter in C++ 66/109 May 17, 2018 Polymorphism • monomorphic function types are quite constraining • eg: plain C functions • think int min( int a, int b ) ... how about float? • counter-eg.: C++ function templates • "types = sets" is no longer good enough Approaches • parametric: eg. Hindley-Milner • ad-hoc: like parametric but dirtier (think C++ templates) • subtyping + optionally late binding Implementing an Interpreter in C++ 67/109 May 17, 2018 Parametric Polymorphism • one implementation, multiple (parametric) types • ML, Haskell, etc. (based on Hindley-Milner) • adds type variables • id : : a -> a is good for any type a • type checking is only a little harder than monomorphic • C++ templates (w/o specialisation) are an approximation • can be extended with type classes (Haskell) • min :: (Ord a) => a -> a -> a Implementing an Interpreter in C++ 68/109 May 17, 2018 Algebraic Data Types (revisited) • products and sums are nice but relatively weak • how about recursive (infinite) data types? • allows encoding lists, trees and other inductive types • may also allow encoding co-data types • data List = Nil | Cons Int List • values must contain pointers Parametric ADTs • also: much more powerful with type variables • data List a = Nil I Cons a List Implementing an Interpreter in C++ 69/109 May 17, 2018 Static Type Checking • all type enforcement is done at compile/load time • type information can be erased (more efficient execution) • may require explicit type annotation (as in C, C++98) • or be partially inferred (modern Haskell, C++11 and later) • or be completely inferred ("classical" Haskell) • type errors show up early • may allow static (fast) type-based dispatch Implementing an Interpreter in C++ 70/109 May 17, 2018 Dynamic Type Checking • type enforcement is (mostly) done at runtime • values carry along their types encoded as data • function application also runs the type checker • RTTI could be as little as a couple of bits (LISP) • or as much as a full machine pointer (OOP) Implementing an Interpreter in C++ 71/109 May 17, 2018 Classes and Objects • subtyping naturally leads to OOP • extends types with methods and encapsulation • optionally with late binding • one signature, multiple types, multiple implementations • primarily a problem decomposition tool • also neatly solves namespace problems • works with static (C++) and dynamic (Python) types Implementing an Interpreter in C++ 72/109 May 17, 2018 Late Binding • supertype methods can be overridden in subtypes • different implementations for different types • form of run-time, type-based dispatch • incompatible with (completely] static types • in C++ realised through vtable pointers Implementing an Interpreter in C++ 73/109 May 17, 2018 Type Casting and Coercion • sometimes you know you are right • even though the types don't match • casts convert from one type to another • coercion simply re-interprets the value • both more-or-less break type safety • C has some arcane implicit casting rules Implementing an Interpreter in C++ 74/109 May 17, 2018 Assignment 4: Static Variant • allow user-defined product types with named fields • implement monomorphic function types • add type annotations to the parser & AST • type-check each function application at load time Assignment 4: Dynamic/OOP Variant • allow user-defined classes (with attributes and methods) • pass values in the evaluator as references to objects • implement late binding (type-based dispatch) • detect failing method lookups at runtime Deadline 20th of April. Implementing an Interpreter in C++ 75/109 May 17, 2018 Assignment 4: Hints • both variants need parser extensions • dynamic types are easier to work with (from user POV) • static types are safer and get you faster code • static type checker builds on the semantic checker • dynamic type checker builds on the evaluator • you can mix & match aspects of both (like C++) • it's OK to put types and variables in a single namespace Implementing an Interpreter in C++ 76/109 May 17, 2018 Part 5: Memory Management Overview • what lives in memory • reference counting • mark and sweep • copying collectors (compacting) • Cheney on the M.T.A. • generational collection • latency and concurrency Implementing an Interpreter in C++ 78/109 May 17, 2018 What is in program's memory? • scalar data and arrays of scalars • data structures with pointers in them Pointers: Good and Bad • pointer dereferences are expensive • allows encoding all sorts of structure • lists, trees, graphs • very useful for building abstractions Implementing an Interpreter in C++ 79/109 May 17, 2018 From Flat Memory to Objects • imagine a node in a linked list • it lives somewhere • how do you decide where to put one? • enter malloc and f ree Semi-Automatic Memory Management • malloc finds a good place to put data • f ree marks a bit of memory for re-use Implementing an Interpreter in C++ 80/109 May 17, 2018 Building the Abstraction Tower • malloc/f ree give us abstract-ish objects • but we still need to track lifetime manually • and worse, place f ree calls statically • this is tedious and sometimes impossible Automatic Garbage Collection • figure out which objects are alive (and which dead) • we no longer need to call f ree • f ree is dynamically performed by the GC Implementing an Interpreter in C++ 81/109 May 17, 2018 Basic Idea: Reachability along Pointers • pick a root set of live objects — could be the C stack + registers — or the active (executing) frame • live = reachable from the root set • dead = everything else Implementing an Interpreter in C++ 82/109 May 17, 2018 First Approximation: Reference Counting • along with each object, keep a counter • when a pointer is created/copied, increase the counter • when a pointer is lost, decrease the counter • when the counter hits zero, f ree the object Problems • expensive to take/copy pointers (memory write) • fails to free object cycles Advantages • low/predictable latency • reasonable memory overhead Implementing an Interpreter in C++ 83/109 May 17, 2018 Garbage Collectors • add a collector procedure • the rest of the program is called the mutator • run the collector at convenient times (not too often) Dealing with Loops: Mark & Sweep • the collector executes reachability along pointers • marking every reachable object • then iterating over all objects • calling f ree on the unmarked ones (sweeping) Implementing an Interpreter in C++ 84/109 May 17, 2018 Challenges • the collector procedure may need to allocate memory • mutator threads may need to stop while the collector runs • the collector needs to know which words are pointers • -» problems with foreign function interfaces (C calls) • performance under memory pressure Implementing an Interpreter in C++ 85/109 May 17, 2018 Mark & Sweep: Advantages • comparatively easy to implement • low memory overhead • can re-use existing mall oc/free • approximate (conservative] collection is possible Disadvantages • high/messy latency (bad for interactive programs) • more memory used = slower collection • interacts badly with concurrency Implementing an Interpreter in C++ 86/109 May 17, 2018 A Copying Collector • split memory into 2 halfspaces • one is the working set, other is dormant • bump allocation of new memory • collect when the live halfspace fills up Collection • copy live objects to the other halfspace • updating all pointers along the way Implementing an Interpreter in C++ 87/109 May 17, 2018 Cheney's Algorithm • look at a from-space object • copy it over to the to-space • replace the from-space copy with a forwarding pointer • recurse/update pointers in the to-space copy • (not actually implemented recursively) Implementing an Interpreter in C++ 88/109 May 17, 2018 Copying Collectors: Advantages • fast memory allocation • no time spent dealing with garbage • keeps data physically close together • possibly improving cache utilisation Disadvantages • needs exact information about pointers • poor memory utilisation (always 1 empty halfspace) Implementing an Interpreter in C++ 89/109 May 17, 2018 Cheney on the M.T.A. • all allocation is done on the C stack • when the stack is about to fill up: • make a new stack and "Cheney" data from the old one • the program is compiled into C • the compiled functions never return • easy integration with C calls Disadvantage: same as "normal" copying collector Implementing an Interpreter in C++ 90/109 May 17, 2018 Compromises: Generational Collectors • observation: many objects only live for a short while • split memory into a hatchery and a mature space • use a different collector for each • typical: mark & copy for the hatchery (minor collection) • mark & sweep for the mature space (major collection) • the hatchery is traced much more often Implementing an Interpreter in C++ 91/109 May 17, 2018 Generational Collectors: Advantages • short-lived objects are quickly eliminated • hot data is kept together (good for CPU caches] • minor collection is fast & predictable (wrt. latency) • foreign objects can live in the (non-moving) mature space Disadvantages • more complicated • does not fix all the problems Implementing an Interpreter in C++ 92/109 May 17, 2018 A Note on Latency: Incremental Collection • latency in interactive applications is bad • even more so in real-time systems • also in distributed computations • interleave the mutator and the collector • incremental collector can be made real-time • (by imposing a deadline on the increment) • tricky, but easier than concurrent collection Implementing an Interpreter in C++ 93/109 May 17, 2018 Concurrent Collectors • concurrent data structures are hard • not freeing dead objects is not a big problem • (they will be picked up by a later cycle] • freeing live objects is a big problem • needs cooperation from mutator threads • easy-ish with reference counting Eg. http://www.aicas.com/papers/ismm02f-siebert.pdf Implementing an Interpreter in C++ 94/109 May 17, 2018 Assignment 5: • implement a garbage collector Deadline 24th of May. Implementing an Interpreter in C++ 95/109 May 17, 2018 Part 6: Talking to The Outside World Overview • foreign function interface (FFI) • constructing calls • dealing with memory & outputs • aggregate arguments and return values • a simple runtime-only solution Implementing an Interpreter in C++ 97/109 May 17, 2018 Foreign Function Interface • a mechanism for calling procedures • defined in a language different from our own • a typical target language is C • crucial for re-use of existing code Implementing an Interpreter in C++ 98/109 May 17, 2018 Constructing Calls • problem: we need to call a function • but we don't know what its arguments are Some options: • ad-hoc: hard-code some argument combinations • template metaprogramming • automatic code generation • re-implement the C calling convention Implementing an Interpreter in C++ 99/109 May 17, 2018 An Ad-Hoc Approach • good enough to cover most syscalls • not good to talk to libraries void ccall( void (*f)(), ArgT argt, ArgV v ) { if ( argt == ArgT{ Int } ) return f( v[ 0 ].aslnt() ); if ( argt == ArgT{ Int, Int } ) return f( v[ 0 ].aslnt(), v[ 1 ].aslnt() ); /* ... */ } Implementing an Interpreter in C++ 100/109 May 17, 2018 Template Metaprogramming • use variadic/recursive function templates • automates data conversion (the aslnt () bit] • required instances need to be known at compile time • does not really fix the problem int f( int a, int b, const char *c ); auto tup = std::make_tuple( 3, 7, "too" ); // call f( 3, 7, "too" ) brick::tuple::pass( f, tup ); Implementing an Interpreter in C++ 101/109 May 17, 2018 Automatic Code Generation • uses specific, per-function wrappers • the wrappers are generated as C [C++] code • may use the template approach to simplify generated code Value wrap_write( std::vector< Value > args ) { int rv = write( args[ 0 ].aslnt(), args[ 1 ].asString(), args[ 2 ].aslnt () ); return Value( rv ); } Implementing an Interpreter in C++ 102/109 May 17, 2018 The C Calling Convention • architecture-specific (x86 is simple, amd64 is complex) • the generic ccall needs to be written in assembly • most compact but least portable x86/cdecl • arguments go onto the stack (right-to-left) • scalar return values either in eax or stO amd64: a 10 page spec on what goes where Implementing an Interpreter in C++ 103/109 May 17, 2018 Dealing with Outputs • some functions return variable-sized data • like the read function • using output arguments (represented by pointers) • such arguments must be treated differently • the output of read should give us an in-language string char buffer[32]; read( 0, buffer, 32 ); // buffer contains the data we want Implementing an Interpreter in C++ 104/109 May 17, 2018 Aggregate Values • C supports passing structures as arguments • and returning them as values • will not work with the ad-hoc approach • too many different sizes struct too { int x, y, z; double bar; }; too update_foo( too x ) { ... } Implementing an Interpreter in C++ 105/109 May 17, 2018 A Simple Approach • usable with ad-hoc call construction • does not need to invoke a compiler • looks up functions by using dlsym int main() { int (*w)() = dlsym( NULL, "write" ); w( 1, "hello world\n", 12 ); } Implementing an Interpreter in C++ 106/109 May 17, 2018 How to Construct Wrappers • option 1: reconstruct from calls • will not work for variadic functions (printf] • option 2: special syntax for declaring C functions • you rely on the user to translate the prototypes • option 3: parse C headers (hard] (define fun (foreign-lambda c-string "fun" c-string int)) Implementing an Interpreter in C++ 107/109 May 17, 2018 Assignment 6 • implement a simple C-based FFI for your interpreter • must be able to call functions w/ up to 4 args — only consider integer/pointer arguments • it must be able to deal with read or similar • use the FFI to get I/0 capabilities Deadline 31st of May. Implementing an Interpreter in C++ 108/109 May 17, 2018 Final Project • implement an interactive game of tic-tac-toe • there is no deadline Implementing an Interpreter in C++ 109/109 May 17, 2018