Binary Exploitation 2a Return Oriented Programming & ASLR Milan Patnaik Indian Institute of Technology Madras Where Are We ? 2 Where Are We ? • Executable, No Canaries and No ASLR. • Overwrite return address. • Shellcode in stack. 3 Where Are We ? • Executable, No Canaries and No ASLR. • Overwrite return address. • Shellcode in stack. • Non Executable, No Canaries and No ASLR. • Overwrite the return address. • Return to libc restricted by system(). 4 Where Are We ? • Executable, No Canaries and No ASLR. • Overwrite return address. • Shellcode in stack. • Non Executable, No Canaries and No ASLR. • Overwrite the return address. • Return to libc restricted by system(). • Non Executable, No Canaries and No ASLR. • Overwrite return address. • Return Oriented Programming. • Execute arbitrary code. 5 Return Oriented Programming (ROP) 6 Return Oriented Programming 7 Return Oriented Programming Attacks • Discovered by Hovav Shacham of Stanford University • Subverts execution. – As with the regular ret-2-libc, can be used with non executable stacks since the instructions can be legally executed. – Unlike ret-2-libc does not require to execute functions in libc (can execute any arbitrary code). 8 The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls on the x86 Stack : Function Call EBP Parameters for function return Address Locals of function prev frame pointer push $3 push $2 push $1 Stack push %ebp movl %esp, %ebp sub $20, %esp %ebp : Frame Pointer In main In function ESP ESP ESP ESP ESP ESP %esp : Stack Pointer 9 Call instruction has 2 steps:  Push the contents pointed to by EIP.  Decrease ESP by 4 (32bit machine) Stack : Function Return EBP Parameters for function return Address Locals of function prev frame pointer Stack %ebp : Frame Pointer ESP ESP ESP ESP ESP ESP %esp : Stack Pointer 10 Ret instruction has 2 steps:  Pops the contents pointed to by ESP into EIP  Increment ESP by 4 (32bit machine) movl %ebp, %esp In function push $3 push $2 push $1 In main Action by leave instruction Target Payload Lets say this is the payload needed to be executed by an attacker. Suppose there is a function in libc, which has exactly this sequence of instructions … then we are done.. we just need to subvert execution to the function 11 Target Payload Lets say this is the payload needed to be executed by an attacker. Suppose there is a function in libc, which has exactly this sequence of instructions … then we are done.. we just need to subvert execution to the function What if such a function does not exist? If you can’t find it then build it 12 Step 1: Find Gadgets • Find gadgets. • A gadget is a short sequence of instructions followed by a return. • Useful instructions : should not transfer control outside the gadget. • This is a pre-processing step by statically analysing the libc library. useful instruction(s) ret 13 Step 2: Stitching • Stitch gadgets so that the payload is built Program Binary movl %esi, 0x8(%esi) ret G1 movb $0x0, 0x7(%esi) ret G2 movb $0x0, 0xc(%esi) ret G3 movl $0xb, %eax ret G4 14 Step 3: Construct the Stack 15 xxx xxx xxx AG1 AG2 AG3 AG4 xxx buffer xxx Return Address Program Binary movl %esi, 0x8(%esi) ret G1 movb $0x0, 0x7(%esi) ret G2 movb $0x0, 0xc(%esi) ret G3 movl $0xb, %eax ret G4 Program Stack AGi: Address of Finding Gadgets • Static analysis of libc • To find 1. A set of instructions that end in a ret (0xc3). The instructions can be intended (put in by the compiler) or unintended. 2. Besides ret, none of the instructions transfer control out of the gadget. 16 Intended vs Unintended Instructions • Intended : machine code intentionally put in by the compiler • Unintended : interpret machine code differently in order to build new instructions 17 F7 C7 07 00 00 00 0F 95 45 C3Machine Code : What the compiler intended.. What was not intended Highly likely to find many diverse instructions of this form in x86. Not so likely to have such diverse instructions in RISC processors. Geometry • Given an arbitrary string of machine code, what is the probability that the code can be interpreted as useful instructions. – x86 code is highly dense. – RISC processors like (SPARC, ARM, etc.) have low geometry. • Thus finding gadgets in x86 code is considerably more easier than that of ARM or SPARC. • Fixed length instruction set reduces geometry. 18 Finding Gadgets • Static analysis of libc. • Find any memory location with 0xc3 (RET instruction). • Build a trie data structure with 0xc3 as a root. • Every path from leaf to the root is a possible gadget. 19 C 3 C 3 0 0 0 0 2 4 2 4 3 7 3 7 2 4 2 4 4 6 4 6 4 3 4 3 1 6 1 6 8 9 8 9 9 4 9 4 child of Finding Gadgets • Scan libc from the beginning toward the end • If 0xc3 is found – Start scanning backward – With each byte, ask the question if the subsequence forms a valid instruction – If yes, add as child – If no, go backwards until we reach the maximum instruction length (20 bytes) – Repeat this till (a predefined) length W, which is the max instructions in the gadget 20 33 b2 23 12 a0 31 a5 67 22 ab ba 4a 3c c3 ff ee ab 31 11 09 Gadgets : Constant into Register Loading a constant into a register (edx = deadbeef) 21 deadbeefdeadbeef GadgetAddGadgetAdd stack pop %edx ret pop %edx ret esp • A previous return will pop the gadget address into %eip • %esp will also be incremented to point to deadbeef (4 bytes on 32 bit platform) • The pop %edx will pop deadbeef from the stack and increment %esp to point to the next 4 bytes on the stack Gadgets : Arbitrary Data into eax 22 pop %edx ret pop %edx ret G1 mov 64(%edx), %eax ret mov 64(%edx), %eax ret G2 G2G2 addraddr G1G1 stack esp deadbeefdeadbeef +64 Load arbitrary data into %eax register using Gadgets G1 and G2 Gadgets: Store Constants • Store the contents of a register to a memory location in the stack 23 GadgetAddr 2GadgetAddr 2 00 GadgetAddr 1GadgetAddr 1 stack pop %edx ret pop %edx ret esp mov %eax, 24(%edx) ret mov %eax, 24(%edx) ret 24 Gadget: Addition 24 addl (%edx), %eax push %edi ret addl (%edx), %eax push %edi ret Add the memory pointed to by %edx to %eax. The result is stored in %eax pushes %edi.. onto the stack why is this present? …. This is unnecessary, but this is best gadget that we can find for addition But can create problems!! We need work arounds! GadgetAddr2GadgetAddr2 GadgetAddrGadgetAddr stack esp Some gadgetSome gadget Gadget: Addition 25 addl (%edx), %eax push %edi ret addl (%edx), %eax push %edi ret Add the memory pointed to by %edx to %eax. The result is stored in %eax pushes %edi.. onto the stack why is this present? …. This is unnecessary, but this is best gadget that we can find for addition But can create problems!! We need work arounds! GadgetAddrGadgetAddr stack esp ModifiedModified Some gadgetSome gadget Gadgets: Addition with NOP 26 addl (%edx), %eax push %edi ret addl (%edx), %eax push %edi ret 1. First put gadget ptr for 0xC3 into %edi 2. 0xC3 corresponds to NOP in ROP 3. push %edi in gadget 2 just pushes 0xc3 back into the stack Therefore not disturbing the stack contents 4. Gadget 3 executes as planned GadgetAddr3GadgetAddr3 Gadget_RETGadget_RET GadgetAddr2GadgetAddr2 Gadget_RETGadget_RET GadgetAddr1GadgetAddr1 stack esp 0xc30xc3 0xc3 is ret in ROP and ret is equivalent to NOP instruction pop %edi ret pop %edi ret Unconditional Branches • Changing the %esp 27 GAGA stack esp pop %esp ret Conditional Branches 28 In x86 instructions conditional branches have 2 parts. 1. An instruction which modifies a condition flag (eg CF, OF, ZF). eg. CMP %eax, %ebx (will set ZF if %eax = %ebx) 2. A branch instruction (eg. JZ, JCC, JNZ, etc). which internally checks the conditional flag and changes the EIP accordingly. In ROP conditional branches have 3 parts. 1. An ROP which modifies a condition flag (eg CF, OF, ZF). eg. CMP %eax, %ebx (will set ZF if %eax = %ebx) 2. Transfer flags to a register or memory. 3. Perturb %esp based on flags stored in memory. In ROP, we need flags to modify %esp register instead of EIP Needs to be explicitly handled Step 1 : Set the flags Find suitable ROPs that set appropriate flags 29 CMP %eax, %ebx RET subtraction Affects flags CF, OF, SF, ZF, AF, PF NEG %eax RET 2s complement negation Affects flags CF Step 2: Transfer flags to memory or register • Using lahf instruction stores 5 flags (ZF, SF, AF, PF, CF) in the %ah register • Using pushf instruction pushes the eflags into the stack ROPs for these two not easily found. A third way – perform an operation whose result depends on the flag contents. 30 where would one use this instruction? Step 2: Indirect way to transfer flags to memory Several instructions operate using the contents of the flags 31 ADC %eax, %ebx : add with carry that performs eax <- eax + ebx + CF. (if eax and ebx are 0 initially, then the result will be either 1 or 0 depending on the CF) RCL : rotate left with carry. RCL %eax, 1 (if eax = 0. then the result is either 0 or 1 depending on CF) Gadgets: Transfer Flags to Memory 32 %edx will have value A %ecx will contain 0x0 A Step 3: Perturb %esp depending on flag 33 If (CF is set){ perturb %esp }else{ leave %esp as it is } What we hope to achieve * CF stored in a memory location (say X). * Current %esp. * Delta, how much to perturb %esp. What we have negate X offset = Delta & X %esp = %esp+offset One way of achieving … 1. Negate X (eg. Using instruction negl) finds the 2’s complement of X if (X = 1) 2’s complement is 111111111… if (X = 0) 2’s complement is 000000000... 2. offset = Delta if X = 1 offset = 0 if X = 0 3. %esp = %esp + offset if X = 1 %esp = %esp if X = 0 Turing Complete • Gadgets can do much more… invoke libc functions, invoke system calls, ... • For x86, gadgets are said to be turning complete. – Can program just about anything with gadgets. • For RISC processors, more difficult to find gadgets. – Instructions are fixed width. – Therefore can’t find unintentional instructions. • Tools available to find gadgets automatically. Eg. ROPGadget (https ://github.com/JonathanSalwan/ROPgadget) Ropper (https://github.com/sashs/Ropper) 34 Address Space Layout Randomization (ASLR) 35 Address Space Randomization • Address space layout randomization (ASLR) randomizes the address space layout of the process. • Each execution would have a different memory map, thus making it difficult for the attacker to run exploits. • Initiated by Linux PaX project in 2001. • Now a default in many operating systems. 36 Memory layout across boots for a Windows box Non Executable Stack Attack Prevention ASLR in the Linux Kernel • Locations of the base, libraries, heap, and stack can be randomized in a process’ address space. • Built into the Linux kernel and controlled by /proc/sys/kernel/randomize_va_space • randomize_va_space can take 3 values: 0 : disable ASLR. 1 : positions of stack, VDSO, shared memory regions are randomized the data segment is immediately after the executable code. 2 : (default setting) setting 1 as well as the data segment location is randomized. 37 Non Executable Stack Attack Prevention ASLR in Action 38 First Run Another Run Non Executable Stack Attack prevention ASLR in the Linux Kernel • Permanent changes can be made by editing the /etc/sysctl.conf file. Two requirements:- Make the code relocatable. - Generate random address. 39 /etc/sysctl.conf, for example: kernel.randomize_va_space = value sysctl -p Non Executable Stack Attack Prevention Internals : Making code relocatable • Load time relocatable. – where the loader modifies a program executable so that all addresses are adjusted properly. – Relocatable code. • Slow load time since executable code needs to be modified. • Requires a writeable code segment, which could pose problems. • PIE : position independent executable. – a.k.a PIC (position independent code). – code that executes properly irrespective of its absolute address. – Used extensively in shared libraries. • Easy to find a location where to load them without overlapping with other modules. 40 Non Executable Stack Attack Prevention Load Time Relocatable 41 11 Non Executable Stack Attack Prevention Load Time Relocatable 42 note the 0x0 here… the actual address of mylib_int is not filled in 22 Non Executable Stack Attack Prevention Load Time Relocatable 43 Relocatable table present in the executable that contains all references of mylib_int33 Offset in memory where the fix needs to be made Store binary value in the symbol memory location Non Executable Stack Attack Prevention Load Time Relocatable 44 The loader fills in the actual address of mylib_in at run time. 44 Non Executable Stack Attack Prevention Load Time Relocatable 45 Limitations  Slow load time since executable code needs to be modified.  Requires a writeable code segment, which could pose problems.  Since executable code of each program needs to be customized, it would prevent sharing of code sections. Non Executable Stack Attack Prevention Position Independent Executable • An additional level of indirection for all global data and function references. • Uses a lot of relative addressing schemes and a global offset table (GOT). • For relative addressing, – data loads and stores should not be at absolute addresses but must be relative. 46 http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in- shared-libraries/ Non Executable Stack Attack Prevention Global Offset Table (GOT) • Table at a fixed (known) location in memory space and known to the linker. • Has the location of the absolute address of variables and functions. 47 Without GOT With GOT Non Executable Stack Attack Prevention Enforcing Relative Addressing (example) 48 With load time relocatable With PIC Non Executable Stack Attack Prevention Enforcing Relative Addressing (example) 49 With load time relocatable With PIC Get address of next instruction to achieve relativeness Index into GOT and get the actual address of mylib_int into eax Now work with the actual address. Non Executable Stack Attack Prevention Advantage of the GOT • With load time relocatable code, every variable reference would need to be changed. – Requires writeable code segments. – Huge overheads during load time. – Code pages cannot be shared. • With GOT, the GOT table needs to be constructed just once during the execution. – GOT is in the data segment, which is writeable. – Data pages are not shared anyway. – Drawback : runtime overheads due to multiple loads. 50 Non Executable Stack Attack Prevention An Example of working with GOT 51 $gcc –m32 –shared –fpic –S got.c Besides a.out, this compilation also generates got.s The assembly code for the program. 52 Data section Text section Fills %ecx with the eip of the next instruction. Why do we need this indirect way of doing this? In this case what will %ecx contain? The macro for the GOT is known by the linker. %ecx will now contain the offset to GOT Load the absolute address of myglob from the GOT into %eax Non Executable Stack Attack Prevention More 53 offset of myglob in GOT GOT it! Non Executable Stack Attack Prevention Internals: Randomizing the data section 54 loading the executable Check if randomize_va_space is > 1 (it can be 1 or 2) Compute the end of the data segment (m->brk + 0x20) Finally Randomize Non Executable Stack Attack Prevention Deep Within the Kernel (randomizing the data section) 55 Non Executable Stack Attack Prevention Deep Within the Kernel (randomizing the data section) 56 • The address of the first element of the ‘hash[0]’ array. • The currently executing process ID for the processor that handles this. • The system’s jiffies value. • CPU cycles number. Non Executable Stack Attack Prevention Function Calls in PIC • Theoretically could be done similar with the data. – call instruction gets location from GOT entry that is filled in during load time (this process is called binding). – In practice, this is time consuming. Much more functions than global variables. Most functions in libraries are unused. • Lazy binding scheme. – Delay binding till invocation of the function. – Uses a double indirection – PLT – procedure linkage table in addition to GOT. 57 Non Executable Stack Attack Prevention The PLT 58 11 • Instead of directly calling func, invoke an offset in the PLT instead. • PLT is part of the executable text section, and consists of one entry for each external function the shared library calls. • Each PLT entry has  A jump location to a specific GOT entry • Preparation of arguments for a ‘resolver’ • Call to resolver function Non Executable Stack Attack Prevention First Invocation of Func First Invocation of func 59 11 22 (steps 2 and 3) On first invocation of func, PLT[n] jumps to GOT[n], which simply jumps back to PLT[n]. 33 Non Executable Stack Attack Prevention First Invocation of Func 60 11 22 (step 4) Invoke resolver, which resolves the actual of func, places this actual address into GOT and then invokes func. The arguments passed to resolver, that helps to do symbol resolution. Note that the contents of GOT is now changed to point to the actual address of func. 33 44 Non Executable Stack Attack Prevention Example of PLT 61 Compiler converts the call to set_mylib_int into set_mylib_int@plt Non Executable Stack Attack Prevention Example of PLT 62 ebx points to the GOT table ebx + 0x10 is the offset corresponding to set_mylib_int Offset of set_mylib_int in the GOT (+0x10). It contains the address of the next instruction (ie. 0x3c2) Non Executable Stack Attack Prevention Example of PLT 63 Push arguments for the resolver. Jump to the first entry of the PLT Ie. PLT[0] Jump to the resolver, which resolves the actual address of set_mylib_int and fills it into the GOT Non Executable Stack Attack Prevention Subsequent invocations of Func 64 11 22 33 Non Executable Stack Attack Prevention Advantages • Functions are relocatable, therefore good for ASLR. • Functions resolved only on need, therefore saves time during the load phase. 65 Non Executable Stack Attack Prevention Bypassing ASLR • Brute force. • Return-to-PLT. • Overwriting the GOT. • Timing Attacks. 66 Bypassing ASLR • Brute force. 67 Bypassing ASLR • Brute force. 68 That’s for the classes