Micro-architectural Attacks
Milan Patnaik
Indian Institute of Technology Madras
Credits: Prof. Chester Rebeiro and my colleagues of RISE Lab, IIT Madras

Agenda: Class
• Cache Timing Attacks
  – Cache Covert Channels
  – Flush+Reload Attacks
• Cache Collision Attacks
  – Prime+Probe Attacks
  – Time Driven Attacks
• Case Studies
  – Meltdown
  – Spectre
  – Rowhammer

Agenda: Labs
• Lab 1: Cache Covert Channel
• Lab 2: Cache Timing Attack

Things we thought gave us security!
• Cryptography
• Passwords
• Information Flow Policies
• Privileged Rings
• ASLR
• Virtual Machines and confinement
• JavaScript and HTML5 (due to restricted access to system resources)
• Enclaves (SGX and TrustZone)

Micro-Architectural Attacks (can break all of the above)
• Cache timing attacks
• Branch prediction attacks
• Speculation attacks
• Rowhammer
• Fault injection attacks
• Cold boot attacks
• DRAM row buffer attacks (DRAMA)
• ... and many more

Causes (performance vs. security)
• Most micro-architectural attacks are caused by performance optimizations.
• Others are due to inherent device properties.
• A third class is due to stronger attackers.

Cache Timing Attacks: Cache Covert Channels

Cache Timing Attacks: Flush + Reload Attack

Copy on Write
    pid_t pid = fork();
    if (pid > 0) {
        // in parent process (pid is the child's process ID)
    } else if (pid == 0) {
        // in child process
    } else {
        // fork failed: no child was created
    }
• Making a copy of a process is called forking.
  – Parent: the original process
  – Child: the new process
• When fork is invoked, the child is an exact copy of the parent.
• When fork is called, all pages are shared between parent and child.
• This is easily done by copying the parent's page tables.
[Figure: parent and child page tables both mapping to the same physical memory]

Virtual Addressing Advantage (easy to make copies of a process)
• The child created is an exact replica of the parent process.
  – The page tables of the parent are duplicated in the child.
  – New pages are created only when the parent (or child) modifies data.
  – Copying of pages is postponed as long as possible, which optimizes performance.
  – Thus, common code sections (like libraries) are shared across processes.
[Figure: process tree rooted at init; SSLEncryption() in the virtual memory of process 1 and of process 2 maps to the same physical memory]

Interaction with the LLC
[Figure: processes on Core 1 and Core 2 sharing the LLC]
• When one process executes SSLEncryption() and its code is not yet in the LLC: cache misses, slow.
• When another process then executes SSLEncryption() from the same shared physical pages: cache hits, fast.
• One process can affect the execution time of another process.

Flush + Reload Attack on LLC
• Part of an encryption algorithm is executed only when the key bit e_i = 1; the attacker targets line 8 of that code.
clflush instruction
• Takes an address as input.
• Flushes that address from all caches.
• Example: clflush (line 8)
Flush+Reload Attack, Yuval Yarom and Katrina Falkner (https://eprint.iacr.org/2013/448.pdf)

Flush + Reload Attack
[Figure: timeline of the victim (running SSLEncryption()) and the attacker (running clflush(line 8)) on different cores sharing the LLC]
• Flush: the attacker flushes line 8 with clflush.
• Access: the victim runs, and touches line 8 only if e_i = 1.
• Reload: the attacker reloads line 8 and times it; a fast reload (LLC hit) means the victim accessed it, so e_i = 1.

Flush+Reload Attack Countermeasures
• Do not use copy-on-write
  – Implemented by cloud providers
• Permission checks for clflush
  – Do we need clflush?
• Non-inclusive cache memories
  – AMD
  – Intel i9 versions
• Fuzzing clocks
• Software diversification
  – Permute the location of objects in memory (statically and dynamically)

Cache Collision Attacks: Prime + Probe Attack
[Figure: victim and spy either on Core 1 and Core 2 sharing the Last Level Cache, or on the same SMT core sharing the L1 cache; the cache is 4-way set associative (way 0 to way 3, Set 0 to Set N-1)]

Prime Phase
    while (1) {
        for (each cache set) {
            start = time();
            access all cache ways;     // fills the set with spy data
            end = time();
            access_time = end - start;
        }
        wait for some time;            // the victim runs in the meantime
    }

Victim Execution
• The victim's execution evicts some of the spy's data.

Probe Phase
• The spy times the same loop again (which also re-primes the cache for the next iteration).
• Sets that now hold victim data take longer to access, because the spy's accesses to them miss in the cache.
[Figure: probe time plot over cache sets 0 to 63; each row is one iteration of the while loop; darker shades imply higher memory access time]

Prime + Probe in Cryptography
    char Lookup[] = {x, x, x, . . . x};
    unsigned char RecvDecrypt(int socket) {
        unsigned char key = 0x12;     // secret key byte
        unsigned char pt, ct;         // unsigned, so key ^ ct is never a negative index
        read(socket, &ct, 1);         // ciphertext byte, visible to the attacker
        pt = Lookup[key ^ ct];        // key-dependent memory access
        return pt;
    }
• The attacker knows the address of Lookup and the ciphertext (ct).
• The memory location accessed in Lookup depends on the value of key: key-dependent memory accesses.
• Given the cache set number, one can identify bits of key ^ ct, and since ct is known, bits of key.

Keystroke Sniffing
• Keystroke → interrupt → kernel mode switch → ISR execution → add to keyboard buffer → … → return from interrupt
• A regular disturbance is seen in the probe time plot.
• The period between disturbances is used to predict passwords.
Svetlana Pinet, Johannes C. Ziegler, and F.-Xavier Alario. 2016. Typing Is Writing: Linguistic Properties Modulate Typing Execution. Psychon Bull Rev 23, 6.

Web Browser Attacks
• Prime+Probe in
  – JavaScript
  – PNaCl
  – WebAssembly
• Extract Gmail secret key: https://www.cs.tau.ac.il/~tromer/drivebycache/drivebycache.pdf

Website Fingerprinting
• Privacy: find out which websites are being browsed.

Cross VM Attacks (Cache)

Cross VM Attacks (DRAM)

Cache Collision Attacks: Time Driven Attacks

Internal Collision Attacks
[Figure: the adversary feeds plaintext bytes P0 and P4 to the victim's block cipher (all other plaintext bytes random); the cipher looks up T[P0 ⊕ K0] and T[P4 ⊕ K4] in the same table]
• If cache hit (less time): $P_0 \oplus K_0 = P_4 \oplus K_4$, i.e. $P_0 \oplus P_4 = K_0 \oplus K_4$
• If cache miss (more time): $P_0 \oplus K_0 \neq P_4 \oplus K_4$, i.e. $P_0 \oplus P_4 \neq K_0 \oplus K_4$

Suppose $K_0$ = 00 and $K_4$ = 50 (hex):
• Set P0 = 0; all other inputs are random.
• Make N time measurements.
• Segregate them into Y buckets based on the value of P4.
• Find the average time of each bucket.
• Find the deviation of each average from the overall average (DOM).

    P4    Average Time    DOM
    00    2945.3           1.8
    10    2944.4           0.9
    20    2943.7           0.2
    30    2943.7           0.2
    40    2944.8           1.3
    50    2937.4          -6.3
    60    2943.3          -0.2
    70    2945.8           2.3
    ...   ...              ...
    F0    2941.8          -1.7

    Average: 2943.57
    Maximum deviation: -6.3 at P4 = 50, so $K_0 \oplus K_4 = P_0 \oplus P_4$ = 50

That's all for the day!
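The Copy on Write slides say that fork() shares all pages and copies one only when parent or child writes to it. A program cannot see physical pages directly, so this minimal POSIX sketch only demonstrates the observable semantics: the child overwrites a global, gets its own private copy of that page, and the parent's value stays unchanged. The function name cow_parent_view is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in a data page that parent and child share (copy-on-write)
   immediately after fork(). */
static int shared_value = 42;

/* Hypothetical helper: returns the value the parent sees after the child
   has written to shared_value, or -1 on any error. */
int cow_parent_view(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: this write makes the kernel give the child a private copy. */
        shared_value = 7;
        _exit(shared_value == 7 ? 0 : 1);   /* _exit: do not flush stdio twice */
    }
    /* Parent: wait for the child, then read its own (untouched) copy. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;                          /* child did not see its own write */
    return shared_value;                    /* still 42 in the parent */
}
```

Calling cow_parent_view() in the parent yields 42: the child's write never reaches the parent's page, while pages nobody writes to (such as shared library code like SSLEncryption()) stay shared.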
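The Flush + Reload steps (flush line 8, let the victim run, reload line 8 and time it) can be illustrated with a toy simulation. A real attack executes clflush and reads a cycle counter on memory genuinely shared with the victim; this sketch instead models the shared LLC as an array of presence flags with assumed hit and miss latencies. All names (access_line, flush_line, victim_step, flush_reload_recover) and the cycle numbers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the shared LLC: one "present" flag per line of shared code.
   Latencies and threshold are made-up numbers, not measurements. */
#define NUM_LINES   16
#define HIT_CYCLES  40    /* assumed reload latency on an LLC hit */
#define MISS_CYCLES 300   /* assumed reload latency from DRAM */
#define THRESHOLD   120   /* attacker's hit/miss decision threshold */

static bool llc_present[NUM_LINES];

/* Simulated memory access: returns the latency and caches the line. */
static unsigned access_line(size_t line)
{
    unsigned t = llc_present[line] ? HIT_CYCLES : MISS_CYCLES;
    llc_present[line] = true;
    return t;
}

/* Stand-in for clflush: evicts the line from the shared cache. */
static void flush_line(size_t line)
{
    llc_present[line] = false;
}

/* Victim: runs common code for every key bit, and line 8 only when e_i = 1. */
static void victim_step(int e_i)
{
    access_line(0);
    if (e_i)
        access_line(8);
}

/* Attacker: for each key bit, flush line 8, let the victim run one step,
   reload line 8 and classify the reload time. */
void flush_reload_recover(const int *key_bits, size_t n, int *recovered)
{
    for (size_t i = 0; i < n; i++) {
        flush_line(8);                          /* flush */
        victim_step(key_bits[i]);               /* victim access (or not) */
        unsigned t = access_line(8);            /* reload and time */
        recovered[i] = (t < THRESHOLD) ? 1 : 0; /* fast => victim used line 8 */
    }
}
```

Because the attacker flushes before every victim step, each reload reflects only that step's behaviour; on real hardware the threshold has to be calibrated and noise forces repeated measurements.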
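For the RecvDecrypt example, a toy model shows how the set number leaks bits of key ^ ct. It assumes (hypothetically) a 64-set, 4-way cache with 64-byte lines and LRU replacement, made-up latencies, a spy buffer with exactly one line per (set, way), a Lookup table starting in set 10, and no other memory traffic. With 64-byte lines, the set touched by Lookup[key ^ ct] reveals (key ^ ct) >> 6, the top two bits; XOR with the known ct then gives the top two bits of key.

```c
#include <stdint.h>
#include <string.h>

/* Toy cache parameters (assumptions, not a real CPU). */
#define NSETS 64
#define NWAYS 4
#define LINE_SIZE 64
#define HIT_CYCLES 4
#define MISS_CYCLES 100
#define SPY_BASE 0x100000u                        /* hypothetical spy buffer */
#define LOOKUP_BASE (0x20000u + 10u * LINE_SIZE)  /* Lookup[] starts in set 10 */

typedef struct {
    uint64_t tag[NSETS][NWAYS];   /* line number held by each way */
    int valid[NSETS][NWAYS];
    unsigned age[NSETS][NWAYS];   /* larger value = more recently used */
    unsigned clock;
} Cache;

/* Simulated load: returns its latency and updates the cache with LRU. */
static unsigned cache_access(Cache *c, uint64_t addr)
{
    uint64_t line = addr / LINE_SIZE;
    unsigned set = (unsigned)(line % NSETS);
    for (int w = 0; w < NWAYS; w++)
        if (c->valid[set][w] && c->tag[set][w] == line) {
            c->age[set][w] = ++c->clock;
            return HIT_CYCLES;
        }
    int r = 0;                    /* pick an empty way, else the LRU way */
    for (int w = 0; w < NWAYS; w++) {
        if (!c->valid[set][w]) { r = w; break; }
        if (c->age[set][w] < c->age[set][r]) r = w;
    }
    c->valid[set][r] = 1;
    c->tag[set][r] = line;
    c->age[set][r] = ++c->clock;
    return MISS_CYCLES;
}

static uint64_t spy_addr(unsigned set, unsigned way)
{
    return SPY_BASE + ((uint64_t)way * NSETS + set) * LINE_SIZE;
}

/* Prime or probe one set: touch every spy line in it, return total time. */
static unsigned touch_set(Cache *c, unsigned set)
{
    unsigned t = 0;
    for (unsigned w = 0; w < NWAYS; w++)
        t += cache_access(c, spy_addr(set, w));
    return t;
}

/* Victim model: only the access pt = Lookup[key ^ ct] matters here. */
static void recv_decrypt(Cache *c, uint8_t key, uint8_t ct)
{
    cache_access(c, LOOKUP_BASE + (uint8_t)(key ^ ct));
}

/* Prime, let the victim decrypt one byte, probe; returns key >> 6. */
unsigned prime_probe_key_top_bits(uint8_t key, uint8_t ct)
{
    Cache c;
    memset(&c, 0, sizeof c);
    for (unsigned s = 0; s < NSETS; s++)       /* prime phase */
        touch_set(&c, s);
    recv_decrypt(&c, key, ct);                 /* victim execution */
    unsigned hot = 0, slowest = 0;
    for (unsigned s = 0; s < NSETS; s++) {     /* probe phase */
        unsigned t = touch_set(&c, s);
        if (t > slowest) { slowest = t; hot = s; }
    }
    unsigned first_set = (unsigned)((LOOKUP_BASE / LINE_SIZE) % NSETS);
    unsigned line_idx = (hot + NSETS - first_set) % NSETS;  /* (key ^ ct) >> 6 */
    return line_idx ^ (unsigned)(ct >> 6);     /* ct is known to the attacker */
}
```

Probing the ways in the same order as priming means that, under LRU, the one set the victim disturbed misses on all four ways, which makes it stand out clearly; real replacement policies and background memory traffic are less predictable, so practical attacks repeat the measurement many times.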
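The time-driven bucketing procedure (fix P0 = 0, bucket the measurements by P4, compute each bucket's deviation of mean) can be written as a small function. It assumes, as the 00, 10, ..., F0 buckets in the table suggest, that only the upper nibble of P4 matters (16 table entries per cache line), and that the caller has already collected samples with P0 = 0; the Sample struct and dom_attack name are hypothetical.

```c
#include <stddef.h>

/* One measurement: the chosen P4 (P0 fixed to 0) and the encryption time. */
typedef struct {
    unsigned char p4;
    double time;
} Sample;

#define NBUCKETS 16   /* buckets 00, 10, ..., F0: upper nibble of P4 */

/* Fills dom[] with each bucket's deviation of mean (DOM) from the overall
   mean and returns the bucket value (00..F0) with the most negative DOM,
   i.e. the fastest bucket, which indicates a cache hit (collision).
   With P0 = 0 this estimates the upper nibble of K0 ^ K4. Requires n > 0. */
unsigned char dom_attack(const Sample *s, size_t n, double dom[NBUCKETS])
{
    double sum[NBUCKETS] = {0};
    size_t cnt[NBUCKETS] = {0};
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        unsigned b = s[i].p4 >> 4;      /* bucket by upper nibble of P4 */
        sum[b] += s[i].time;
        cnt[b]++;
        total += s[i].time;
    }
    double overall = total / (double)n;
    unsigned best = 0;
    for (unsigned b = 0; b < NBUCKETS; b++) {
        /* empty buckets get DOM 0 so they never look like a collision */
        dom[b] = cnt[b] ? sum[b] / (double)cnt[b] - overall : 0.0;
        if (dom[b] < dom[best])
            best = b;
    }
    return (unsigned char)(best << 4);
}
```

Fed with measurements like those in the table, the most negative DOM lands on bucket 50, and since P0 = 0 the returned value is the estimate of $K_0 \oplus K_4$.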