# Microarchitectural Attack

#### Cache Based Attacks

Dr Milan Patnaik, Indian Institute of Technology Madras, India, and Rashtriya Raksha University, India



### <u>Outline</u>

- Cache Timing Attacks.
  - Cache Covert Channel.
  - Flush + Reload Attack
- Cache Collision Attacks.
  - Prime + Probe Attack
  - Time Driven Attacks
- Transient Micro-architectural Attacks.
  - Meltdown
  - Spectre

### <u>Security</u>

- Cryptography
- Passwords
- Information Flow Policies
- Privileged Rings
- ASLR
- Virtual Machines and confinement
- JavaScript and HTML5 (due to restricted access to system resources)
- Enclaves (SGX and TrustZone)

Attacks that can bypass these mechanisms:

- Cache timing attacks
- Branch prediction attacks
- **Speculation attacks**
- Rowhammer
- Fault injection attacks
- Cold boot attacks
- DRAM row buffer attacks (DRAMA)

### Micro-architectural Attacks

- Micro-architectural attacks are enabled by:
  - Performance optimizations
  - Inherent device properties
  - Stronger attackers



### Cache Timing Attacks









#### **Cache Organisation**

`maverick@maverick-workforce:~$ lscpu`

| Field                | Value                                    |
|----------------------|------------------------------------------|
| Architecture:        | x86\_64                                  |
| CPU op-mode(s):      | 32-bit, 64-bit                           |
| Byte Order:          | Little Endian                            |
| CPU(s):              | 8                                        |
| On-line CPU(s) list: | 0-7                                      |
| Thread(s) per core:  | 2                                        |
| Core(s) per socket:  | 4                                        |
| Socket(s):           | 1                                        |
| NUMA node(s):        | 1                                        |
| Vendor ID:           | GenuineIntel                             |
| CPU family:          | 6                                        |
| Model:               | 142                                      |
| Model name:          | Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz |
| Stepping:            | 11                                       |
| CPU MHz:             | 705.790                                  |
| CPU max MHz:         | 3900.0000                                |
| CPU min MHz:         | 400.0000                                 |
| BogoMIPS:            | 3600.00                                  |
| Virtualization:      | VT-x                                     |
| L1d cache:           | 32K                                      |
| L1i cache:           | 32K                                      |
| L2 cache:            | 256K                                     |
| L3 cache:            | 6144K                                    |
| NUMA node0 CPU(s):   |                                          |

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant\_tsc art arch\_perfmon pebs bts rep\_good nopl xtopology nonstop\_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds\_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4\_1 sse4\_2 x2apic movbe popcnt tsc\_deadline\_timer aes xsave avx f16c rdrand lahf\_lm abm 3dnowprefetch cpuid\_fault epb invpcid\_single ssbd ibrs ibpb stibp tpr\_shadow vnmi flexpriority ept vpid ept\_ad fsgsbase tsc\_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel\_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp\_notify hwp\_act\_window hwp\_epp md\_clear flush\_l1d arch\_capabilities

### **Cache Organisation**

`maverick@maverick-workforce:/sys/devices/system/cpu/cpu0/cache/index0$ cat /proc/cpuinfo`

| Field            | Value                                    |
|------------------|------------------------------------------|
| processor        | 0                                        |
| vendor\_id       | GenuineIntel                             |
| cpu family       | 6                                        |
| model            | 142                                      |
| model name       | Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz |
| stepping         | 11                                       |
| microcode        | 0xf0                                     |
| cpu MHz          | 748.275                                  |
| cache size       | 6144 KB                                  |
| physical id      | 0                                        |
| siblings         | 8                                        |
| core id          | 0                                        |
| cpu cores        | 4                                        |
| apicid           | 0                                        |
| initial apicid   | 0                                        |
| fpu              | yes                                      |
| fpu\_exception   | yes                                      |
| cpuid level      | 22                                       |
| wp               | yes                                      |
| bogomips         | 3600.00                                  |
| clflush size     | 64                                       |
| cache\_alignment | 64                                       |
| address sizes    | 39 bits physical, 48 bits virtual        |
| power management |                                          |

flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant\_tsc art arch\_perfmon pebs bts rep\_good nopl xtopology nonstop\_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds\_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4\_1 sse4\_2 x2apic movbe popcnt tsc\_deadline\_timer aes xsave avx f16c rdrand lahf\_lm abm 3dnowprefetch cpuid\_fault epb invpcid\_single ssbd ibrs ibpb stibp tpr\_shadow vnmi flexpriority ept vpid ept\_ad fsgsbase tsc\_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel\_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp\_notify hwp\_act\_window hwp\_epp md\_clear flush\_l1d arch\_capabilities

bugs: spectre\_v1 spectre\_v2 spec\_store\_bypass mds swapgs itlb\_multihit srbds mmio\_stale\_data retbleed

`maverick@maverick-workforce:/sys/devices/system/cpu/cpu0/cache/index0$ ls`

coherency\_line\_size, id, level, number\_of\_sets, physical\_line\_partition, shared\_cpu\_list, shared\_cpu\_map, size, type, uevent, ways\_of\_associativity

| Command (run in `index0`)   | Output |
|-----------------------------|--------|
| `cat number_of_sets`        | 64     |
| `cat ways_of_associativity` | 8      |
| `cat coherency_line_size`   | 64     |
| `cat shared_cpu_list`       | 0,4    |

32K L1d cache: 64 sets × 8 ways × 64-byte lines.
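The sysfs values above (64 sets, 8 ways, 64-byte lines) fix how an address maps into the L1d cache. The following is a minimal sketch of that mapping, assuming the usual offset / set index / tag split; the function names are illustrative and not taken from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* L1d geometry as reported by sysfs in the slides (cpu0, index0). */
enum { LINE_SIZE = 64, NUM_SETS = 64, NUM_WAYS = 8 };

/* Total capacity in bytes: sets x ways x line size. */
size_t cache_size_bytes(void) {
    return (size_t)NUM_SETS * NUM_WAYS * LINE_SIZE;
}

/* Byte offset inside the cache line: address bits 0..5. */
unsigned line_offset(uintptr_t addr) {
    return (unsigned)(addr % LINE_SIZE);
}

/* Set index: address bits 6..11. */
unsigned set_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Tag: the remaining upper address bits. */
uintptr_t tag_bits(uintptr_t addr) {
    return addr / ((uintptr_t)LINE_SIZE * NUM_SETS);
}
```

With this geometry, addresses that are 4096 bytes apart (64 sets × 64 bytes) fall into the same set, and 64 × 8 × 64 bytes gives the 32K reported by `lscpu`.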



























### Cache Covert Channels: Send Even 1



- Identifying
  - Cache covert channels are difficult to identify
  - Variety of covert channels: file, time, etc.
- Quantifying
  - Bit rate of communication, in bits per second (bps)
- Eliminating
  - Careful design
  - Separation
  - Studying characteristics of operations
    - Rate of opening and closing of files
### Cache Timing Attacks

Flush + Reload Attack



#### Copy On Write

- The child created by `fork()` is an exact replica of the parent process
- The page tables of the parent are duplicated in the child
- New pages are created only when the parent (or child) modifies data
  - Copying of pages is postponed as long as possible, which optimizes performance
  - As a result, common code sections (such as libraries) are shared across processes
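Copy-on-write itself is invisible to a program, but its observable effect can be sketched: after `fork()`, a write in the child does not change what the parent sees. The example below assumes a POSIX system; the function name is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's view of `value` after the child has overwritten
 * its own copy, or -1 on error. */
int parent_value_after_child_write(void) {
    volatile int value = 1;          /* page is shared right after fork() */
    pid_t pid = fork();
    if (pid < 0)
        return -1;                   /* fork failed */
    if (pid == 0) {
        value = 2;                   /* child write: child gets a private page */
        _exit(value == 2 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return value;                    /* parent still sees 1 */
}
```

Pages that neither process writes to, such as library code, never need to be copied and remain physically shared.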



### Copy On Write



Before process P modifies Page 3

#### Process Tree

Figure: process tree rooted at `init`; `SSLEncryption()` appears in the virtual memory of process 1 and of process 2, and in physical memory.



### Interaction with LLC



# Flush + Reload Attack

### Part of an encryption algorithm



### clflush Instruction

- Takes an address as input.
- Flushes the cache line containing that address from all caches.
- Example: `clflush (line 8)`

Flush+Reload Attack, Yuval Yarom and Katrina Falkner (https://eprint.iacr.org/2013/448.pdf)
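One Flush + Reload round can be sketched as follows: time a load of the monitored address, then flush the line so the next round starts from a cold cache. This is a minimal sketch assuming an x86 CPU and GCC or Clang (`x86intrin.h`); the helper names and the threshold parameter are hypothetical, and a real threshold has to be calibrated on the target machine.

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtscp */

/* Time one load of *addr in TSC cycles, then flush its cache line. */
uint64_t reload_and_flush(const void *addr) {
    unsigned aux;
    _mm_mfence();                                   /* finish earlier memory operations */
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();                                   /* keep the load after the first timestamp */
    (void)*(volatile const unsigned char *)addr;    /* reload */
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    _mm_clflush(addr);                              /* flush for the next round */
    return end - start;
}

/* A reload faster than the (machine-specific) threshold is treated as a
 * cache hit, meaning someone touched the line since the last flush. */
int was_accessed(uint64_t cycles, uint64_t threshold) {
    return cycles < threshold;
}
```

The attacker would call `reload_and_flush` on an address inside the shared page once per time slot and record `was_accessed` for each slot.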

### Flush + Reload Attack



| Seq. | Time Slots  | Value |
|------|-------------|-------|
| 1    | 3,903-3,906 | 0     |
| 2    | 3,907-3,916 | 1     |
| 3    | 3,917-3,926 | 1     |
| 4    | 3,927-3,931 | 0     |
| 5    | 3,932-3,935 | 0     |
| 6    | 3,936-3,945 | 1     |
| 7    | 3,946-3,955 | 1     |
| 8    | 3,956-3,960 | 0     |
| 9    | 3,961-3,969 | 1     |
| 10   | 3,970-3,974 | 0     |
| 11   | 3,975-3,979 | 0     |
| 12   | 3,980-3,988 | 1     |
| 13   | 3,989-3,998 | 1     |
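In the table, every run labelled 0 spans 4 to 5 time slots and every run labelled 1 spans 9 to 10. A decoder can therefore classify each run by its length. The sketch below assumes a cut-off of 7 slots, chosen only because it separates the two groups in this table; the type and function names are illustrative.

```c
#include <stddef.h>

/* One run of consecutive time slots attributed to a single key bit. */
struct slot_run { int first; int last; };

/* Assumed cut-off: runs of 4-5 slots decode to 0, runs of 9-10 slots to 1. */
enum { LONG_RUN_MIN_SLOTS = 7 };

/* Decode each run into one bit character ('0' or '1').
 * bits_out must have room for n + 1 characters. Returns n. */
size_t decode_bits(const struct slot_run *runs, size_t n, char *bits_out) {
    for (size_t i = 0; i < n; i++) {
        int length = runs[i].last - runs[i].first + 1;
        bits_out[i] = (length >= LONG_RUN_MIN_SLOTS) ? '1' : '0';
    }
    bits_out[n] = '\0';
    return n;
}
```

Applied to the 13 rows of the table, this yields the bit string `0110011010011`.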

# Flush + Reload Attack: Countermeasures

- Do not use copy-on-write
  - Implemented by cloud providers
- Permission checks for clflush
  - Do we need clflush?
- Non-inclusive cache memories
  - AMD
  - Intel i9 versions
- Fuzzing Clocks
- Software Diversification
  - Permute location of objects in memory (statically and dynamically)

# **Cache Collision Attacks**

- External Collision Attacks
  - Prime + Probe Attack
- Internal Collision Attacks
  - Time Driven Attacks















### Prime + Probe Attack

Each row is an iteration of the while loop; darker shades imply higher memory access time

# Example Prime+Probe: Cryptography

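```
#include <unistd.h>   /* read() */

/* 256-entry lookup table; the actual contents are elided on the slide. */
unsigned char Lookup[256] = { 0 /* x, x, x, ... x */ };

unsigned char RecvDecrypt(int socket) {
    unsigned char key = 0x12;          /* secret key byte */
    unsigned char pt = 0, ct = 0;

    if (read(socket, &ct, 1) != 1)     /* receive one ciphertext byte */
        return 0;
    pt = Lookup[key ^ ct];             /* secret-dependent memory access */
    return pt;
}
```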

- The attacker knows the address of `Lookup` and the ciphertext (`ct`).
- The memory accessed in `Lookup` depends on the value of `key`.
- Given the set number, one can identify bits of `key ^ ct`.
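How many key bits leak can be made concrete. Assume `Lookup` holds 256 one-byte entries and starts on a 64-byte cache-line boundary (both are assumptions, not stated on the slide). The table then spans four cache lines, and the line touched reveals the top two bits of `key ^ ct`. Since `ct` is known, the attacker obtains the top two bits of `key`. The function names below are hypothetical.

```c
#include <stdint.h>

enum { LINE_SIZE = 64 };   /* bytes per cache line */

/* Victim side: which cache line of Lookup is touched (0..3). */
unsigned touched_line(uint8_t key, uint8_t ct) {
    return (unsigned)(key ^ ct) / LINE_SIZE;
}

/* Attacker side: from the line observed via Prime + Probe and the known
 * ciphertext byte, recover the top two bits of key (bits 7..6). */
uint8_t recover_key_high_bits(unsigned observed_line, uint8_t ct) {
    uint8_t index_high = (uint8_t)(observed_line * LINE_SIZE); /* top bits of key ^ ct */
    return (uint8_t)((index_high ^ ct) & 0xC0);
}
```

The lower six bits of `key` select a byte within the line, so this observation alone does not reveal them.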

# Example Prime+Probe: Keystroke Sniffing

Keystroke → interrupt → kernel mode switch → ISR execution → add to keyboard buffer → ... → return from interrupt



- Regular disturbances are seen in the probe-time plot
- The period between disturbances is used to predict passwords



Svetlana Pinet, Johannes C. Ziegler, and F.-Xavier Alario. 2016. Typing Is Writing: Linguistic Properties Modulate Typing Execution. Psychon Bull Rev 23, 6

# **Cache Collision Attacks**

**Time Driven Attacks** 



### **Time Driven Attacks**



# Internal Collision : Cipher

Suppose $K_0 = 00$ and $K_4 = 50$ (hexadecimal).

- $P_0 = 0$; all other inputs are random
- Make $N$ time measurements
- Segregate the measurements into buckets based on the value of $P_4$
- Find the average time of each bucket
- Find the deviation of each bucket average from the overall average (DOM)



| $P_4$ | Average Time | DOM  |
|-------|--------------|------|
| 00    | 2945.3       | 1.8  |
| 10    | 2944.4       | 0.9  |
| 20    | 2943.7       | 0.2  |
| 30    | 2943.7       | 0.2  |
| 40    | 2944.8       | 1.3  |
| 50    | 2937.4       | -6.3 |
| 60    | 2943.3       | -0.2 |
| 70    | 2945.8       | 2.3  |
| ⋮     | ⋮            | ⋮    |
| F0    | 2941.8       | -1.7 |

Average: 2943.57

Maximum deviation: -6.3 (at $P_4 = 50$)
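The bucket analysis can be sketched as a search for the bucket whose average time deviates most from the overall average. The sketch assumes that the two colliding table lookups are indexed by $P_0 \oplus K_0$ and $P_4 \oplus K_4$, so that a collision implies $K_0 \oplus K_4 = P_0 \oplus P_4$. This indexing and the function names are assumptions for illustration.

```c
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_dev(double x) { return x < 0 ? -x : x; }

/* Index of the bucket whose average time deviates most from the overall average. */
size_t max_deviation_bucket(const double *avg_time, size_t n, double overall_avg) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (abs_dev(avg_time[i] - overall_avg) > abs_dev(avg_time[best] - overall_avg))
            best = i;
    }
    return best;
}

/* Assumed collision condition P0 ^ K0 == P4 ^ K4, hence K0 ^ K4 == P0 ^ P4. */
unsigned key_xor_from_collision(unsigned p0, unsigned p4) {
    return p0 ^ p4;
}
```

For the rows shown in the table, the selected bucket is $P_4 = 50$. With $P_0 = 0$ this gives $K_0 \oplus K_4 = 50$, which matches the supposed key bytes.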

# <u>Questions</u>

**Cache Attacks** 

