: title : Binary Analysis and Disassembly : authors : Petr Ročkai and Lukáš Korenčik : doctype : slides Organisation • ‹theory:› 30-50 minutes every week • remaining time: ‹coding› and ‹discussions› • there will be 6 bi-weekly assignments ◦ together, they will form a project ◦ each assignment is a milestone Grading • you need 7 points to pass the subject • each ‹assignment› is worth ‹1 point› • ‹showing up› 10 times is worth ‹1 point› • up to ‹2 points› for writing ‹code reviews› • up to ‹2 points› for meeting ‹deadlines› Deadlines and Feedback • the deadline for each assignment is 14 days • beating the deadline gives you 1/3 of a point ◦ the solution must be of sufficient quality • feedback will be given on the off weeks ◦ i.e. 7 days after the deadline Programming Language • C or C++ is up to you • you can use up to C11 and up to C++17 • only the standard library and POSIX • no boost, no 「libelf」 or 「BFD」 Repositories • make a repository for your homeworks ◦ 「git」, 「hg」 or whatever works for you ◦ make it public and email me the URL • write a simple 「Makefile」 (inc. dependencies) ◦ you will only have a few source files ◦ 「cmake」 is acceptable but discouraged Assignment Submission • tag your repository with 「hw1」, etc. ◦ use 「hw1.1」 etc. for resubmissions • tag dates are what counts for deadlines • we will ‹not› look at master head ◦ you can break stuff freely there Semester Plan (part 1) │ │ date │ ├┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┼┄┄┄┄┄┄▶┤ │ 1. introduction & preliminaries │ 19.2. │ │ 2. instruction sets │ 26.2. │ │ 3. static control flow │ 5.3. │ │ 4. dynamic control flow │ 12.3. │ │ 5. executable files, ELF │ 19.3. │ │ 6. dynamic linking │ 26.3. │ Semester Plan (part 2) │ │ date │ ├┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┼┄┄┄┄┄┄▶┤ │ 7. debug info │ 2.4. │ │ 8. DWARF │ 9.4. │ │ 9. function calls and frames │ 16.4. │ │ 10. advanced instructions │ 23.4. │ │ 11. debugger basics │ 30.4. │ │ 12. decompilation basics │ 7.5. │ Assignment Schedule │ │ given │ due │ ├┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┼┄┄┄┄┄┄▶┤┄┄┄┄┄┄▶┤ │ 1. decoding instructions │ 26.2. │ 12.3. │ │ 2. basic blocks & branching │ 12.3. │ 26.3. │ │ 3. making sense of ELF │ 26.3. │ 9.4. │ │ 4. parsing symbol tables │ 9.4. │ 23.4. │ │ 5. reconstructing functions │ 23.4. │ 7.5. │ │ 6. a complete disassembler │ 7.5. │ 21.5. │ # Preliminaries Machine Code • consists of individual ‹instructions› • ‹encoded› in a tightly-packed binary format ◦ may be fixed or variable length • stored program architecture ◦ instructions live in addressable memory Assembly • symbolic language one level above machine code ◦ abstracts away from numeric addresses ◦ replaces them with symbolic ‹labels› • instructions are encoded in a text format • designed for humans (but rarely used nowadays) C and Compilers • another layer of abstraction over assembly • abstracts away the specifics of hardware architecture ◦ registers, stack management, opcodes ◦ provides structured control flow • still a low-level language, mostly OS-level programs Compiled High-Level Languages • another abstraction rung above C ◦ algebraic or class-based type systems ◦ abstract data structures ◦ extensive standard libraries ◦ late dispatch, lexical closures, ... • e.g. C++, Rust, ML, Haskell, (Java) Interpreters • typically the highest rung of the abstraction tower ◦ dynamic types, garbage collectors ◦ powerful, high-level libraries or APIs • often realized as JIT compilers / virtual machines ◦ usually implemented in C or C++ • e.g. JavaScript, Python, Ruby, 「bash」, R Disassembly • going from machine code to assembly ◦ decode instruction ◦ recover control flow structure • print the program in human-readable format • re-assembling should give identical machine code Decompilation • attempt to reconstruct high-level code ◦ recovery of structured control flow (「if」, 「while」) ◦ identification of local variables ◦ recovery of addresses • decompile → compile is not idempotent Exercise 1.1: 「objdump」 • read the 「objdump」 manpage • try 「objdump -x」 on some binaries ◦ 「/usr/bin/gzip」 ◦ your own test program (hello world style) ◦ try 「-static」, 「-fPIC」 &c. ◦ try both the 「.o」 file and the executable • also try 「objdump -x --disassemble」 Exercise 1.2: 「gdb」 • compile your test program with 「-g」 • 「gdb [--args] ./a.out」 • 「start」 • 「stepi」, 「disassemble」, 「print $rax」 • 「break」 • for more user friendliness: 「layout」 Exercise 1.3: reading binary data $ printf "\x03\x12\x01\x00\x00\x00" > file.bin # shell read the above data into the following structure struct __attribute__((packed)) d /* C++ */ { short x; int y; } expected result: 「x = 4611, y = 1」 # Instruction Sets Instruction Anatomy • opcode: ‹what› to do • operands ◦ ‹immediate› values (part of instruction) ◦ ‹register› references ◦ ‹memory› references (immediate or via register) • modifiers (e.g. lock) Opcode Classes • control flow • integer arithmetic • bit operations • memory access • floating point arithmetic • special instructions Register Classes • GPR: General Purpose Register ◦ hold a single word: integers, addresses • SIMD (vector) and/or floating point registers • pointers: stack, instruction, frame (base) • machine control registers Control Flow • conditional & unconditional ‹jumps› ◦ ‹direct› (fixed address) ◦ ‹indirect› unconditional (computed address) ◦ ‹conditional› on results of arithmetic • subroutine ‹calls› and ‹returns› ◦ use the ‹stack› for return addresses Arithmetic • addition, subtraction • signed+unsigned division, multiplication • integer comparison (signed/unsigned) • standard instructions up to word size (64b) ◦ 128b operations are available too Bit Operations • bitwise and, or, xor, negate • shifts and rotations • bit field packing/unpacking • bit counting, endianity conversion Memory Access • ‹load› from and ‹store› into ‹memory› • various ‹address computation› modes ◦ part of the access instruction ◦ special-purpose arithmetic (「lea」) ◦ general-purpose arithmetic Addressing Modes • ‹scalars›: base register + offset ◦ especially useful for stack variables ◦ also globals (relative to program counter) • ‹arrays›: base register + immediate * index register • ‘far’ addressing for segmented memory (obsolete) Floating Point Arithmetic • separate ‹instruction› set • separate ‹registers› (distinct from GPRs) • variable precision (usually 32b, 64b, 80b) • governed by IEEE 754 Specials: Synchronisation • ‹atomic› memory access ◦ read-modify-write (add, sub, xor, ...) ◦ compare + exchange • memory ‹fences› / barriers • on amd64 encoded using the 「lock」 prefix Specials: Vector Instructions • SIMD: single instruction (opcode), multiple data • integer and floating-point ‹arithmetic› • 4-8 values packed in 128b or 256b register • speeds up ‹number crunching› considerably • ‹on top› of usual ‹superscalar› execution Specials: User Mode • checksums (e.g. 「crc32」) • symmetric crypto (「aes-ni」) • random numbers (「rdrand」, 「rdseed」) • processor capabilities (「cpuid」) • timers (「rdtsc」) Specials: Privileged Mode • CPU management opcodes and registers • interrupt handling • system calls • cache control • debugging and monitoring • virtualisation Exercise 2.1 • learn a bit more about assembly • use 「gcc -S」 to produce ‹examples› ◦ you can also try 「-fverbose-asm」 • write a recursive 「factorial」 (in C) ◦ use 「gdb」 ‹instruction› stepping ◦ try an ‹iterative› version too Exercise 2.2 • ‹write› a simple ‹assembly› program • borrow the prologue/epilogue from 「gcc」 • sum an arithmetic/geometric sequence ◦ use formulas first (just arithmetic) ◦ try using a summing loop Instruction Encoding • how to encode opcodes and operands into bytes • fixed-length or variable-length • fixed: e.g. VLIW (very long instruction word) ◦ often employs instruction ‹combining› ◦ variant: fixed opcodes, trailing immediate operands Variable-Length Coding • ‹saves space› compared to fixed-width coding • often much ‹harder to decode› • usually decoded from left to right • first byte affects what second byte means, &c. • already-decoded prefix tells you whether to continue Encoding on AMD64 • programmer's manual in study materials • ‹variable› length (even opcodes) • very long history of extensions • different meaning in different CPU modes • not a very clean encoding of a messy instruction set Assignment 1 • write an instruction decoder for amd64 • have 「make decode」 build the binary • invocation 「./decode 74 1a」 ◦ prints: 「je 0x1a(%rip)」 • we will only decode a small subset of instructions • print 「unknown instruction」 if that is the case Assignment 1: Required • branching: 「jmp」, 「je」, 「jne」, 「jb」 ◦ operands: rel8off, rel32off • stack: 「push」, 「pop」 (64b only) • calls: near 「call」 (rel32off) and 「ret」 • 「mov」 in 64b mem/reg versions (details later) • a few arithmetic and bitwise ops, 「nop」, 「int3」 Assignment 1: Arithmetic & Bitwise • 「xor eax」 imm32 and 「xor rax」 imm32 • 「add eax」 imm32 and 「add rax」 imm32 • 「mul」 with 2 64b registers (「rax」 -- 「rdx」) • 「cmp eax」 imm32 and 「cmp rax」 imm32 • 「cmp」 with 2 64b registers (「rax」 -- 「rdx」) Assignment 1: 「mov」 • only the 「89」 and 「8B」 opcodes ◦ with 2 64b registers (「rax」 -- 「rdx」) ◦ from memory to a 64b register ◦ from a 64b register to memory • memory: address in 「rax」 or 「rbx」 ◦ 「rip」 and 「rbp」 + 32b displacement Assignment 1: Not Required • anything in the VEX maps • memory operands other than ◦ 「mov」 with address in 「rax」 or 「rbx」 ◦ 「mov」 with 「rip」 and 「rbp」 + disp32 • prefixes other than the REX range Assignment 1: Hints • most 64b instructions need a REX prefix (「0x40-0x4F」) • exceptions: 「call」, 「ret」, 「jmp」, branching ◦ some of the 「push」/「pop」 (those of ‘named’ GPRs) • look for complete decoded examples in 「objdump」 # Static Control Flow Control Flow • answers the question ‘what to do next?’ • normally, instructions run in a sequence • just like statements in C • how about conditionals and loops? Structured Control Flow • used in high-level languages • 「if」 statements or expressions • structured loops: 「while」, 「for」 • not available in machine code Goto • also known as unstructured control flow • 「goto」 jumps from one place to another ◦ the destination is called a ‹label› ◦ the jump is unconditional (always taken) • 「if」 + 「goto」 → any loop Goto: Example int f( int x ) /* C++ */ { int i = x; loop: x = x * --i; if ( i > 1 ) goto loop; return ~x; } Machine Code • 「goto」 is basically a ‹jump› instruction • there are no labels in machine code • assembler computes label offsets • there is also a ‹conditional jump› instruction ◦ perform the 「goto」 only if a condition holds Simplified 「if」 • in C, 「if」 can guard arbitrary statements • what if it could only guard exactly 1 「goto」? ◦ and there is no 「else」 either • we can still do everything if ( x ) { foo(); bar(); } /* C++ */ else baz(); Reinventing 「if」 if_begin: /* C++ */ if ( !x ) goto if_false; foo(); bar(); goto if_end; if_false: baz(); if_end: Conditional Jump • recall 「if ( x > 0 ) goto loop」 • this is basically 2 instructions • first is 「cmp」, the second is 「jg」 • conditional 「goto」 is conditional jump • used to encode all control flow in machine code Basic Blocks • abstraction used by compilers • starts with a label • followed by a sequence of non-jump instructions ◦ no labels or jumps in the sequence • with a single jump/branch at the end Control Flow Graph • take instructions as nodes • control flow as edges • extremely useful for code analysis • using basic blocks makes the graph much smaller Exercise 3.1 • rewrite this program with ‹conditional gotos› while ( x < 1000 ) /* C++ */ { x *= 5; if ( x % 7 == 0 ) break; x --; } int fib( int n ) /* exercise 3.2 */ /* C++ */ { if ( n <= 2 ) return 1; else { int a = fib( n - 1 ); int b = fib( n - 2 ); return a + b; } } Exercise 3.3 • take the ‹goto› version of program from 3.2 • change it to only have one 「return」 statement • draw the control flow graph of both versions Exercise 3.4 • write an iterative version of 「fib」 • you can use the argument + 3 variables • change it into ‹goto› form • draw the control flow graph Exercise 3.5 • compile all above programs into object files • disassemble them using 「objdump」 • recover control flow from the assembly ◦ only add labels that are required ◦ identify basic blocks Exercise 3.6 • rewrite program from 3.4 into assembly by hand • only use registers for computation • start from an empty 「int fib( int )」 skeleton • check that the program does the right thing # Dynamic Control Flow Last Time • direct conditional + unconditional jumps • basic blocks, control flow graph Today • direct & indirect function calls, returns • indirect jumps and jump tables Function Calls • 「call」 is usually static (fixed address) • but 「ret」 jumps to a dynamic address ◦ also known as ‹return address› • arguments are passed in registers or on stack • local variables are stored on the stack Call Frame • each function uses up a section of the ‹stack› ◦ known as a ‹frame›, holds automatic ‹local variables› ◦ though some of those might only live in registers • there's also stuff in-between frames ◦ arguments, register spills, return address Indirect Jump • jump to a dynamic address (i.e. not constant) • often arises from 「switch」 statements (in C) • either computed or via a jump table • looks like 「jmp *%rax」 (if the address is in 「rax」) Ex 4.1 • write a simple C function with a 「switch」 • use consecutive integer cases (i.e. 1, 2, 3, ...) • put different code in each branch (e.g. 「return N」) • compile with 「gcc」 and 「clang」 with different 「-O」 ◦ compare the assembly output Detour: Graphviz • a simple but powerful tool to draw graphs • accepts plain-text input that looks like this digraph G { 1 [ shape=rectangle label="box" ] 2 [ shape=rectangle label="another\lbox\l" ] 1 -> 2 [ label="arrow" ] } Ex 4.2 • draw the CFG from 3.3 or 3.4 using 「dot」 • see 「https://graphviz.org」 for docs • to look at the result, use 「dot -Tx11 < cfg.dot」 ◦ 「dot -Tpdf > cfg.pdf」 also works • put instructions into the boxes Assignment 2 • extend your decoder to allow multiple instructions • print each instruction on a separate line • assume the code starts at address 0 • decompose the code into basic blocks • use graphviz 「dot」 to generate a CFG Assignment 2: Input • continue to allow ascii/hex bytes in 「argv[]」 • if no args given, read a ‹raw binary› from 「stdin」 • you can assume there are only ‹known instructions› • and the input will be at most 2KB (for now) Assignment 2: Output • generate ‘maximal’ basic blocks ◦ print 「#