https://crocs.fi.muni.cz @CRoCS_MUNI
PV286 - Secure coding principles and practices
Dynamic analysis, fuzzing, and taint analysis
Łukasz Chmielewski chmiel@fi.muni.cz (based on the lecture by P. Svenda)
(email me with your questions/feedback)
Centre for Research on Cryptography and Security, Masaryk University
This Lecture
• Today we cover dynamic analysis of source code
• First Jan Kvapil will give a presentation about the project.
– Milan Šorf, Roman Lacko, Štěpánka Trnková, Jiří Gavenda, Tomáš Jaroš, and Antonín
Dufka.
• Resources:
– I will attempt to record the lecture and if it works it should be available around
Wednesday.
– An older (but well-recorded) version of the lecture from 2022 (by P. Švenda):
• https://is.muni.cz/auth/player?lang=en;furl=%2Fel%2Ffi%2Fjaro2022%2FPA193%2Fum%2Fvideo%2FPA193_02_DynamicAnalysisFuzzing_2022.video5
– Last year (worse quality):
• https://is.muni.cz/auth/el/fi/jaro2023/PV286/um/vi/137055932/
– Materials:
• https://is.muni.cz/auth/el/fi/jaro2024/PV286/um/
DYNAMIC ANALYSIS
Static vs. dynamic analysis
• Static analysis
– examine program’s code without executing it
– can examine both source code and compiled code
• source code is easier to understand (more metadata)
– can be applied on unfinished code
– a manual code audit is a kind of static analysis
• Dynamic analysis
– code is executed (compiled or interpreted)
– input values are supplied, internal memory is examined…
What can dynamic analysis provide
• Dynamic analysis compiles and executes tested program
– real or virtualized processor
• Inputs are supplied and outputs are observed
– sufficient number of inputs needs to be supplied
– code coverage should be high
• Memory, function calls and executed operations can be monitored and evaluated
– invalid access to memory (buffer overflow)
– memory leak or double free (memory corruption)
– calls to potentially sensitive functions (violation of policy)
Techniques used by dynamic analysis
• Debugger (full control over memory read/write, even ops)
• Insert data into program input points (integration tests, fuzzing…)
– stdin, network, files…
• Insert manipulation proxy between program and library (dll stub, memory)
• Trace of program’s external behavior (linux strace)
• Change source code (instrumentation, logging…)
• Change of application binary
• Run in lightweight virtual machine (Valgrind)
• Run in full virtual machine
• Follow propagation of specified values (Taint analysis)
• Mocking (create additional input points into program)
• Restrict the program's environment (low memory, limited file descriptors, limited rights…)
• …
DEBUGGING SYMBOLS
Release vs. Debug
• Optimizations applied (compiler-specific settings)
– gcc -Ox (http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)
• -O0 no optimization (Debug)
• -O1 -g / -Og debug-friendly optimization
• -O3 heavy optimization
– msvc /Ox /Oi (http://msdn.microsoft.com/en-us/library/k1ack8f1.aspx)
• MSVS2010: Project properties→C/C++→optimizations
• Availability of debug information (symbols)
– gcc -g
• symbols inside binary
– msvc /Z7, /Zi
• symbols in detached file ($projectname.pdb)
Stripping out debug symbols
• Debug symbols are of great help for an “attacker”
– key called NSAKey in ADVAPI.dll? (Crypto 1998)
– http://www.heise.de/tp/artikel/5/5263/1.html
• Always strip out debug symbols in released binary
– MSVC: Do not provide .pdb files
– GCC: check compiler flags, use strip command
• Check for debugging symbols
– Linux: run file or objdump --syms command (stripped/not stripped)
– Windows: DependencyWalker
VALGRIND SUITE
Valgrind http://www.valgrind.org/
• Suite of multiple tools (valgrind --tool=<toolname>)
• Memcheck - memory management dynamic analysis
– most commonly used tool (memory leaks)
– replaces the standard C memory allocator with its own implementation and checks for memory leaks, corruption (additional guard blocks)...
– dangling pointers, unclosed file descriptors, uninitialized variables
– http://www.valgrind.org/docs/manual/mc-manual.html
• Massif – heap profiler
• Helgrind - detection of concurrency issues
• Callgrind – generation of call graphs
• ...
Valgrind – core options
• Compile with debug symbols
– gcc -std=c99 -Wall -g -o program program.c
– will allow for more context information in Valgrind report
• Run program with Valgrind attached
– valgrind <options> ./program
– program cmd line arguments (if any) can be passed
– valgrind -v --leak-check=full ./program arg1
• Trace also into sub-processes
– --trace-children=yes
– necessary for multi-process programs (threads are traced regardless)
• Display unclosed file descriptors
– --track-fds=yes
Memcheck – memory leaks
• Detailed report of memory leaks checks
– --leak-check=full
• Memory leaks
– Definitely lost: memory is directly lost (no pointer exists)
– Indirectly lost: only pointers inside lost memory point to it
– Possibly lost: a pointer into the block exists somewhere, but it might be just a coincidentally matching value (usually a real leak)
Memcheck – uninitialized values
• Detect usage of uninitialized variables
– --undef-value-errors=yes (default)
• Track where an uninitialized value comes from
– --track-origins=yes
– introduces high performance overhead
Memcheck – invalid reads/writes
• Writes outside allocated memory (buffer overflow)
• Only for memory located on heap!
– allocated via dynamic allocation (malloc, new)
• Will NOT detect problems on stack or static (global) variables
– https://en.wikipedia.org/wiki/Valgrind#Limitations_of_Memcheck
• Writes into already de-allocated memory
– Valgrind tries to defer reallocation of freed memory as long as possible to
detect subsequent reads/writes here
EXAMPLES OF ANALYSIS
#include <iostream>
int Static[5];
int memcheckFailDemo(int* arrayStack, unsigned int arrayStackLen,
int* arrayHeap, unsigned int arrayHeapLen) {
int Stack[5];
Static[100] = 0;
Stack[100] = 0;
for (int i = 0; i <= 5; i++) Stack [i] = 0;
int* array = new int[5];
array[100] = 0;
arrayStack[100] = 0;
arrayHeap[100] = 0;
for (unsigned int i = 0; i <= arrayStackLen; i++) {
arrayStack[i] = 0;
}
for (unsigned int i = 0; i <= arrayHeapLen; i++) {
arrayHeap[i] = 0;
}
return 0;
}
int main(void) {
int arrayStack[5];
int* arrayHeap = new int[5];
memcheckFailDemo(arrayStack, 5, arrayHeap, 5);
return 0;
}
#include <iostream>
int Static[5];
int memcheckFailDemo(int* arrayStack, unsigned int arrayStackLen,
int* arrayHeap, unsigned int arrayHeapLen) {
int Stack[5];
Static[100] = 0; /* Error - Static[100] is out of bounds */
Stack[100] = 0; /* Error - Stack[100] is out of bounds */
for (int i = 0; i <= 5; i++) Stack [i] = 0; /* Error - for Stack[5] */
int* array = new int[5];
array[100] = 0; /* Error - array[100] is out of bounds */
arrayStack[100] = 0; /* Error - arrayStack[100] is out of bounds */
arrayHeap[100] = 0; /* Error - arrayHeap[100] is out of bounds */
for (unsigned int i = 0; i <= arrayStackLen; i++) { /* Error - off by one */
arrayStack[i] = 0;
}
for (unsigned int i = 0; i <= arrayHeapLen; i++) { /* Error - off by one */
arrayHeap[i] = 0;
}
/* Problem Memory leak – array */
return 0;
}
int main(void) {
int arrayStack[5];
int* arrayHeap = new int[5];
memcheckFailDemo(arrayStack, 5, arrayHeap, 5);
return 0;
}
Problems detected – compile time
• g++ -ansi -Wall -Wextra -g -o test test.cpp
– clean compilation
• MSVC (Visual Studio 2012) /W4
– only one problem detected, Stack[100] = 0;
• MSVC (later versions) /W4
– No problem reported (detection moved into PREFast)
test.cpp(56): error C4789: buffer 'Stack' of size 20 bytes will
be overrun; 4 bytes will be written starting at offset 400
Visual Studio & PREfast & SAL
test.cpp(11): warning : C6200: Index '100' is out of valid index
range '0' to '4' for non-stack buffer 'int * Static'.
test.cpp(14): warning : C6201: Index '5' is out of valid index
range '0' to '4' for possibly stack allocated buffer 'Stack'.
test.cpp(11): warning : C6386: Buffer overrun while writing to 'Static':
the writable size is '20' bytes, but '404' bytes might be written.
test.cpp(17): warning : C6386: Buffer overrun while writing to 'array':
the writable size is '5*4' bytes, but '404' bytes might be written.
test.cpp(23): warning : C6386: Buffer overrun while writing to 'arrayStack':
the writable size is '_Old_2`arrayStackLen' bytes, but '8' bytes might be written.
test.cpp(26): warning : C6386: Buffer overrun while writing to 'arrayHeap':
the writable size is '_Old_2`arrayHeapLen' bytes, but '8' bytes might be written.
int memcheckFailDemo(
_Out_writes_bytes_all_(arrayStackLen) int* arrayStack,
unsigned int arrayStackLen,
_Out_writes_bytes_all_(arrayHeapLen) int* arrayHeap,
unsigned int arrayHeapLen);
/* Error – still off by one, but not detected by SAL */
for (unsigned int i = 0; i < arrayStackLen + 1; i++) {
arrayStack[i] = 0;
}
: valgrind --tool=memcheck ./test
==17239== Invalid write of size 4
==17239== at 0x4006AB: memcheckFailDemo(int*, unsigned int, int*, unsigned int) (test.cpp:14)
==17239== by 0x40075D: main (test.cpp:33)
==17239== Address 0x595f230 is not stack'd, malloc'd or (recently) free'd
==17239==
==17239== Invalid write of size 4
==17239== at 0x4006CB: memcheckFailDemo(int*, unsigned int, int*, unsigned int) (test.cpp:17)
==17239== by 0x40075D: main (test.cpp:33)
==17239== Address 0x595f1d0 is not stack'd, malloc'd or (recently) free'd
==17239==
==17239== Invalid write of size 4
==17239== at 0x400710: memcheckFailDemo(int*, unsigned int, int*, unsigned int) (test.cpp:23)
==17239== by 0x40075D: main (test.cpp:33)
==17239== Address 0x595f054 is 0 bytes after a block of size 20 alloc'd
==17239== at 0x4C28152: operator new[](unsigned long) (vg_replace_malloc.c:363)
==17239== by 0x40073F: main (test.cpp:32)
...
==17239== LEAK SUMMARY:
==17239== definitely lost: 40 bytes in 2 blocks
...
==17239== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 6 from 6)
Valgrind --tool=memcheck
• 1st invalid write detected: array[100] = 0;
• 2nd invalid write detected: arrayHeap[100] = 0;
• 3rd invalid write detected: arrayHeap[i] = 0; (off by one)
• Memory leaks detected: array, arrayHeap
Sgcheck removed from Valgrind
Release 3.16.0 (27 May 2020)
• https://www.valgrind.org/docs/manual/dist.news.html
• “The experimental Stack and Global Array Checking tool has been removed. It only ever worked on x86 and amd64, and even on those it had a high false positive rate and was slow.”
• Takeaway: Some methods will be too costly or with too much
overhead or with too many false positives (problem to solve is hard)
(MSVS) _CrtDumpMemoryLeaks();
Detected memory leaks!
Dumping objects ->
{155} normal block at 0x00600AD0, 20 bytes long.
Data: < > CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD
{154} normal block at 0x00600A80, 20 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Object dump complete.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/crtdumpmemoryleaks?view=msvc-170
Dr.Memory memory analysis (https://drmemory.org/)
• Can run as standalone tool or Visual Studio plugin
• Targets primarily C and C++ binaries
• Also capable of fuzzing
– Select a single function from the target binary and define the fuzzing methodology
– https://drmemory.org/page_fuzzer.html
Tools - summary
• Compilers (MSVC, GCC) will miss many problems
• Compiler flags (/RTC and /GS; -fstack-protector-all)
– detect (some) stack-based corruptions at runtime
– additional preventive flags /DYNAMICBASE (ASLR) and /NXCOMPAT (DEP)
• Valgrind memcheck
– will not find stack-based problems, only heap corruptions (dynamic allocation)
• Valgrind exp-sgcheck (removed 27.5.2020)
– will detect stack-based problem, but miss first (possibly incorrect) access
• Cppcheck
– detect multiple problems (even memory leaks), but mostly limited to single function
• PREfast will find some stack-based problems, limited to single function
• PREfast with SAL annotations will find additional stack and some heap problems, but not all
FUZZING (BLACKBOX)
What is wrong?
Tag ‘ff fe’ + length of COM section
length of comment = length - 2;
strlen("hello fuzzy world") == ?
length of COM section == 00 00
length of comment = 0 - 2;
-2 == 0xFFFFFFFE (as a 32-bit unsigned value) == ~4GB
byte* pComment = new byte[MAX_SHORT];
memcpy(pComment, buffer, length);
MS04-028: Microsoft's JPEG GDI+ vulnerability (2004)
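The length underflow can be reproduced in isolation. The following is a minimal sketch, not the GDI+ code: the helper names (`readSegmentLength`, `commentLengthUnchecked`, `copyCommentChecked`) are hypothetical, and the segment layout is assumed to be a 2-byte big-endian length followed by the comment bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reads the 2-byte big-endian length field of a COM segment.
std::uint16_t readSegmentLength(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Vulnerable pattern: the subtraction happens in unsigned arithmetic,
// so a declared length of 0 wraps around to a value close to 4 GB.
std::uint32_t commentLengthUnchecked(std::uint16_t segmentLength) {
    return static_cast<std::uint32_t>(segmentLength) - 2u;
}

// Checked variant: validates the declared length against the length field
// itself and against the bytes actually available before copying anything.
bool copyCommentChecked(const std::uint8_t* segment, std::size_t available,
                        std::vector<std::uint8_t>& comment) {
    assert(segment != nullptr);
    if (available < 2) return false;                 // no room for the length field
    const std::uint16_t segmentLength = readSegmentLength(segment);
    if (segmentLength < 2) return false;             // would underflow
    if (segmentLength > available) return false;     // would read past the input
    const std::size_t commentLength = segmentLength - 2u;
    comment.assign(commentLength, 0);
    if (commentLength > 0) {
        std::memcpy(comment.data(), segment + 2, commentLength);
    }
    return true;
}
```

With a length field of `00 00`, the unchecked variant yields 0xFFFFFFFE, while the checked variant rejects the segment before any copy takes place.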
I love GDI+ vulnerability because…
• Lack of proper input checking
• Type signed/unsigned mismatch
• Type overflow
• Buffer overflow
• Heap overflow
• Source code was not available (blackbox testing)
• Huge impact (core MS library)
• Easily exploitable
INTRO TO FUZZING
Very simple fuzzer
cat /dev/random | ./target_app
What is missing here?
What is missing?
• Where fuzzing fits in development process? (developer side, CI, SDL)
• What type of bugs fuzzing tends to find?
• What apps can be fuzzed?
• How to detect that app mishandled fuzzed input (“hit”)? (crash, signal, exception, error…)
• How to react to a detected “hit”? (save seed and crashing inputs, bucketing of inputs)
• How to create more meaningful inputs than random bytes? (valid inputs, proxy)
• How to fuzz non-binary inputs? (string patterns, regex, mouse movements…)
• How to fuzz applications without input as files? (http requests, dll injection, ZAP example)
• How to fuzz efficiently? (known problematic values (fuzz vectors))
• How to fuzz files/inputs with defined structure? (grammar, example Peach)
• How to make fuzzer protocol-aware? (Peach example)
• How to fuzz stateful protocols? (proxy-like fuzzing)
• How to analyse and react to detected hits?
• Which tools one can use?
• How to detect less visible “hits”? (side-channels)
• What else can we fuzz? (test coverage testing, DDOS resiliency, hardware inputs)
1. Investigate app in/out
2. Prepare data model (optional)
3. Validate data model
4. Generate fuzzed inputs
5. Send fuzzed input to app
6. Monitor target app
7. Analyze logs
Fuzzing: key characteristics
1. More or less random modification of inputs
2. Monitoring of target application
3. A huge number of inputs is sent to the target
4. Automated and repeatable
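The four characteristics above can be sketched in a few lines. This is an illustrative toy, not a real fuzzer: `toyTarget` is a hypothetical stand-in for the application under test (an exception plays the role of a crash), and the fixed RNG seed is what makes a run repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical target: the first byte declares the payload length.
// Throwing stands in for a crash when the declared length exceeds the data.
void toyTarget(const Bytes& input) {
    if (input.empty()) return;
    const std::size_t declared = input[0];
    if (declared > input.size() - 1) {
        throw std::runtime_error("declared length exceeds payload");
    }
}

// 1. More or less random modification: overwrite one random byte of a valid input.
Bytes mutate(const Bytes& validInput, std::mt19937& rng) {
    Bytes out = validInput;
    if (!out.empty()) {
        const std::size_t position = rng() % out.size();
        const std::uint8_t value = static_cast<std::uint8_t>(rng() & 0xFF);
        out[position] = value;
    }
    return out;
}

// 2.-4. Send many inputs, monitor the target, and keep every input that
// caused a hit so that it can be replayed later.
std::vector<Bytes> fuzz(const Bytes& validInput, unsigned rngSeed, int iterations) {
    assert(iterations >= 0);
    std::mt19937 rng(rngSeed);  // same seed, same sequence of test cases
    std::vector<Bytes> hits;
    for (int i = 0; i < iterations; ++i) {
        const Bytes candidate = mutate(validInput, rng);
        try {
            toyTarget(candidate);
        } catch (const std::exception&) {
            hits.push_back(candidate);
        }
    }
    return hits;
}
```

Calling `fuzz({3, 'a', 'b', 'c'}, 42, 1000)` twice returns the same list of hits, which is the repeatability property; a real setup would monitor a separate process for crashes and signals instead of catching exceptions.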
Fuzzing - advantages/disadvantages
• Fuzzing advantages
– Very simple design
– Finds bugs missed by the human eye
– Sometimes the only way to test (closed system)
– Repeatable (crash inputs stored)
• Fuzzing disadvantages
– Usually simpler bugs found (low hanging fruit)
– Harder to evaluate the impact or severity of a finding
– Closed system is often evaluated, black box testing
What kind of bugs are usually found?
• Memory corruption bugs (buffer overflows...)
• Parser bugs (crash of parser on malformed input)
• Invalid error handling (other than the expected error)
• Threading errors (requires sufficient setup)
• Correctness bugs (reference vs. new implementation)
Google’s OSS-Fuzz
https://www.usenix.org/sites/default/files/conference/protected-files/usenixsecurity17_slides_serebryany.pdf
Microsoft VulnScan
• “Over a 10-month period where VulnScan was used to triage all memory
corruption issues for Microsoft Edge, Microsoft Internet Explorer and Microsoft
Office products. It had a success rate around 85%, saving an estimated 500
hours of engineering time for MSRC engineers.”
• https://msrc.microsoft.com/blog/2017/10/vulnscan-automated-triage-and-root-cause-analysis-of-memory-corruption-issues/
What kind of bugs are usually missed?
• Bugs after input validation (if not modeled properly)
• High-level / architecture bugs (e.g. weak crypto)
• Usability bugs
• …
What kind of applications can be fuzzed?
• Any application/module with an input
– (sometimes even without inputs, e.g., fault induction)
• Custom (“DIY”) fuzzer
– Usually, full knowledge about target app
– Kind of randomized integration test (but still repeatable!)
• File fuzzer – input via files
• Network fuzzer – input received via network
• General fuzzing framework
– Pre-prepared tools and functions for common tasks (file, packet…)
– Custom plugins, pre-prepared and custom data models
Microsoft’s SDL MiniFuzz File Fuzzer
<?xml version="1.0"?>
<failures>
<failure type="Exception Event:Tid=8504, 0x80000003, unhandled, address=0x7740e34d" datetime="11:21:12 12. 2. 2015">
<registers RAX="00000000" RBX="00000000" RCX="7FFF5FC5180A" RDX="00000000" RSI="00000000" RDI="00000000" RBP="00000000" RSP="00
<process name="C:\Program Files (x86)\IrfanView\i_view32.exe" />
<file name="-std=c99 -Wall C:\minifuzz\temp\beer-0rsw9!h2jf.jpg" />
</failure>
</failures>
MiniFuzz: gcc fuzzing
#include<stdio.h>
int main() {
printf("Hello Fuzzy World");
return 0;
}
Binary fuzzing of source code???
How to improve test coverage?
What if the file is not a command-line parameter?
INVESTIGATE APPLICATION
What kind of inputs and strategy?
• Type of inputs?
– File, network packets, structure, data model, state(-less)
• What environment setup is necessary?
– Fuzzing on live system?
– Multiple entities inside VMs? Networking?
• Isolated vs. cooperating components?
– We don’t like to mock everything
• What tools are readily available?
MODELLING
Input preparation
• Time intensive part of fuzzing (if model does not exist yet)
1. Fully random data
2. Random modification of valid input
3. Modification of valid input with fuzz vectors
4. Modification of valid input with mutator
5. Fuzzing via intermediate proxy
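Option 3 (modification of a valid input with fuzz vectors) could look roughly like the sketch below. The list of vectors is a small illustrative selection chosen for this example, not a list taken from any particular tool, and `substituteField` is a hypothetical helper that assumes the position of the field inside the valid input is already known.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative fuzz vectors: values that are known to cause trouble in
// parsers (empty value, boundary numbers, format string, long string, quote).
const std::vector<std::string>& fuzzVectors() {
    static const std::vector<std::string> vectors = {
        "",
        "0",
        "-1",
        "4294967296",
        "%s%s%s%n",
        std::string(1024, 'A'),
        "' OR '1'='1",
    };
    return vectors;
}

// Produces one test case per fuzz vector by replacing the field at
// [pos, pos + len) of an otherwise valid input.
std::vector<std::string> substituteField(const std::string& validInput,
                                         std::size_t pos, std::size_t len) {
    std::vector<std::string> cases;
    if (pos > validInput.size()) return cases;  // field lies outside the input
    for (const std::string& vector : fuzzVectors()) {
        std::string mutated = validInput;
        mutated.replace(pos, len, vector);
        cases.push_back(mutated);
    }
    return cases;
}
```

For example, `substituteField("GET /item?id=17 HTTP/1.1", 13, 2)` keeps the request line valid apart from the `id` value, so the test cases are more likely to pass the initial parsing than fully random data.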
Radamsa fuzzer
• “…easy-to-set-up general purpose shotgun test to expose the easiest
cracks…”
– https://gitlab.com/akihe/radamsa
• Just provide input files, all other settings automatic
– cat file | radamsa > file.fuzzed
>echo "1 + (2 + (3 + 4))" | radamsa --seed 12 -n 4
1 + (2 + (2 + (3 + 4?)
1 + (2 + (3 +?4))
18446744073709551615 + 4)))
1 + (2 + (3 + 170141183460469231731687303715884105727))
Fuzzing via intermediate proxy
• Fuzzer modifies valid flow according to data model
• Usually used for fuzzing of stateful protocols
– Modelling states and interactions would be difficult
– Target application(s) takes care of states and valid input
OWASP’s ZAP – fuzz strategy settings
Differential fuzzing
• Basic idea
– Compare results obtained from two (or more) implementations for the same inputs
• Usage scenarios
– Legacy and refactored implementation (additional check atop of unit tests)
– Conformance of independent implementation to the reference one
– Comparison of expected outputs from group of programs
• Solves the issue of missing expected outputs (insufficient test vectors)
– Expected behavior is taken from the other program (reference, majority)
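The basic idea can be sketched with two implementations of the same function. All names below are hypothetical examples: `popcountReference` plays the legacy implementation, `popcountRefactored` the new one, and `popcountBuggy` is a deliberately broken variant included only to show what a reported mismatch looks like.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Reference implementation: counts set bits one position at a time.
int popcountReference(std::uint32_t x) {
    int n = 0;
    for (int i = 0; i < 32; ++i) n += static_cast<int>((x >> i) & 1u);
    return n;
}

// Refactored implementation: clears the lowest set bit in each step.
int popcountRefactored(std::uint32_t x) {
    int n = 0;
    while (x != 0) { x &= x - 1; ++n; }
    return n;
}

// Deliberately buggy variant: ignores the most significant bit.
int popcountBuggy(std::uint32_t x) {
    return popcountRefactored(x & 0x7FFFFFFFu);
}

using Implementation = std::function<int(std::uint32_t)>;

// Feeds the same random inputs to both implementations and returns the
// inputs on which their outputs differ. No expected outputs are needed.
std::vector<std::uint32_t> differentialFuzz(const Implementation& a,
                                            const Implementation& b,
                                            unsigned rngSeed, int iterations) {
    assert(iterations >= 0);
    std::mt19937 rng(rngSeed);
    std::vector<std::uint32_t> mismatches;
    for (int i = 0; i < iterations; ++i) {
        const std::uint32_t input = static_cast<std::uint32_t>(rng());
        if (a(input) != b(input)) mismatches.push_back(input);
    }
    return mismatches;
}
```

Comparing the reference with the refactored version yields no mismatches, while comparing it with the buggy variant reports only inputs that have the top bit set, which already points at the cause.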
Fuzzing in cryptographic domain
• CryptoFuzz: differential fuzzing of cryptographic libraries
– https://github.com/guidovranken/cryptofuzz
– Provides same input to multiple cryptographic libraries, compare outputs
– The “correct” result is the one by majority of libraries
• TLS fuzzer
– https://github.com/tomato42/tlsfuzzer/
– Verifies correct error handling by TLS server (expected error message)
– Incorrect error behavior can lead to decryption of data or private key extraction
(padding oracle attacks, e.g., https://robotattack.org/)
APDUPlay - Smart card fuzzing
• Host to smart card communication done via PC/SC
• Custom winscard.dll stub written
• Manipulate incoming/outgoing APDUs
– modify packet content
– replay of previous packets
– …
[RULE1]
MATCH1=in=1;t=0;cla=00;ins=a4;p1=04;
ACTION=in=0;data0=90 00;le=02;
00 a4 04 00 08 01 02 03 04 05 06 07 08
winscard.dll (stub)
90 00
http://www.fi.muni.cz/~xsvenda/apduinspect.html
VALIDATION
Validation of model
• Are fuzzed inputs according to your need?
– Smarter fuzzing understands a data format
– Wrong data format usually fails early on initial parsing
• Check between fuzzing data model and real input
– E.g., Peach Validator tool
• Are template files providing good test coverage?
– E.g., Peach minset tool
Peach Validator 3.0
Model doesn’t match valid input
American fuzzy lop
• State of the art and very powerful tool by Google
• High speed fuzzer http://lcamtuf.coredump.cx/afl/
• Sophisticated generation of test cases (coverage)
• Automatic generation of input templates
– E.g., valid JPEG image from “hello” string after few days
– http://lcamtuf.blogspot.cz/2014/11/pulling-jpegs-out-of-thin-air.html
• Lots of real bugs found
American Fuzzy Lop plus plus
• Relative inactivity of Google's upstream AFL development since 2017.
• Result: AFL++, a fork of Google's AFL aiming at:
– more speed,
– more and better mutations,
– more and better instrumentation,
– custom module support, etc.
– AFL & AFL++: https://en.wikipedia.org/wiki/American_fuzzy_lop_(fuzzer)
• Links:
– https://aflplus.plus/
– https://github.com/AFLplusplus/AFLplusplus
• Google's OSS-Fuzz initiative, which provides free fuzzing services to open-source software, replaced AFL with AFL++ in 2021.
Test coverage
• Random inputs have low coverage (usually)
– Number of blocks visited in target binary
• Smart fuzzing tries to improve coverage
– A way to generate new inputs from existing ones
• E.g., Peach’s minset tool
– Gather a lot of inputs (files)
– Run minset tool, traces with coverage stats are collected
– Minimal set of files to achieve coverage is computed
– Selected files are used as templates for fuzzing
• E.g., the AFL & AFL++ fuzzers use compile-time instrumentation + genetic algorithms to create test cases
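The minset step (selecting a small set of template files that preserves coverage) can be approximated with a greedy set-cover heuristic. This is a sketch under assumptions: it is not Peach's actual algorithm, the coverage traces are assumed to be already collected as sets of visited block IDs, and a greedy selection gives a small set, not necessarily the smallest possible one.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Coverage trace per input file: the IDs of the blocks it visits in the target.
using Coverage = std::map<std::string, std::set<int>>;

// Greedy selection: repeatedly pick the file that adds the most blocks not
// covered yet, until no file adds anything new.
std::vector<std::string> greedyMinset(const Coverage& traces) {
    std::set<int> covered;
    std::vector<std::string> selected;
    while (true) {
        std::string best;
        std::size_t bestGain = 0;
        for (const auto& entry : traces) {
            std::size_t gain = 0;
            for (int block : entry.second) {
                if (covered.count(block) == 0) ++gain;
            }
            if (gain > bestGain) {
                bestGain = gain;
                best = entry.first;
            }
        }
        if (bestGain == 0) break;  // remaining files add no new coverage
        const std::set<int>& blocks = traces.at(best);
        covered.insert(blocks.begin(), blocks.end());
        selected.push_back(best);
    }
    return selected;
}
```

Given four files where two of them together already visit every block reached by the whole corpus, only those two are selected as fuzzing templates; the others would cost fuzzing time without reaching new code.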
START, GENERATE, MONITOR
How to detect “hit”?
• Application crash, uncaught exception…
– Clear faults, easy to detect
• Error returned
– Some errors are valid response
– Some errors are valid response only in selected states
• Input accepted even when it shouldn't be
– E.g., packet with incorrect checksum or modified field
• Some operation performed in incorrect state
– E.g., door open without proper authentication
• Application behavior is impaired
– E.g., response time significantly increases
• …
Peach monitors
More at: https://peachtech.gitlab.io/peach-fuzzer-community/
ANALYZE
What to do with hit results?
• Time intensive part of fuzzing
• Not all hits are relevant (at least at the beginning)
– Crashes by values not controllable by an attacker are less relevant
– Crash analyzer:
https://learn.microsoft.com/en-us/microsoft-desktop-optimization-pack/dart-v7/diagnosing-system-failures-with-crash-analyzer--dart-7
– !exploitable https://msecdbg.codeplex.com/ (not available anymore)
• Hits reproduction
– Hit can be the result of cumulative series of operations
• Many hits are duplicates
– Inputs are different but hit caused in the same part of the code
• (Automatic) Bucketing of hits
– E.g., Peach performs bucketing based on the signature of the call stack
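Bucketing by call-stack signature can be sketched as follows. This is an illustration of the general idea, not Peach's implementation: the `Hit` structure, the `|` separator, and the choice to compare only the innermost `depth` frames are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One detected hit: the input that caused it and the call stack at the
// point of failure, innermost frame first.
struct Hit {
    std::string inputFile;
    std::vector<std::string> callStack;
};

// Signature: the innermost `depth` frames joined together. Hits that fail
// at the same place share a signature even if their inputs differ.
std::string stackSignature(const Hit& hit, std::size_t depth) {
    assert(depth > 0);
    std::string signature;
    for (std::size_t i = 0; i < hit.callStack.size() && i < depth; ++i) {
        if (i > 0) signature += '|';
        signature += hit.callStack[i];
    }
    return signature;
}

// Groups hits into buckets keyed by signature; each bucket lists the inputs
// that are likely duplicates of the same underlying bug.
std::map<std::string, std::vector<std::string>> bucketHits(
        const std::vector<Hit>& hits, std::size_t depth) {
    std::map<std::string, std::vector<std::string>> buckets;
    for (const Hit& hit : hits) {
        buckets[stackSignature(hit, depth)].push_back(hit.inputFile);
    }
    return buckets;
}
```

The depth is a trade-off: a small depth merges distinct bugs that fail in the same library function, while a large depth splits one bug into several buckets when it is reachable through different callers.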
Summary for fuzzing
• Fuzzers are a cheap way to detect simpler bugs
– If you don’t use it, others will
• Try to find tool that fits your particular scenario
– Check activity of development, support
• Fuzzing frameworks can ease variety of setups
– But with a somewhat steeper learning curve
• If fuzzing does not find any bugs, check your model
• Try it!
• The Top Fuzzing Open-Source Projects (23 in 2024)
– https://awesomeopensource.com/projects/fuzzing
Fuzzing driven development (FDD)
• Test-driven development (TDD)
– Write tests first, only later implement functionality
– Will result in testable code (smaller functions, well defined)
• Fuzzing driven development (FDD)
– Continuous fuzzing of an application
– Structure application to enable and support fuzzing
– Will result in “fuzzable” code (deep penetration into app)
• Google OSS-Fuzz
– Large-scale continuous fuzzing of important OSS projects on Google’s servers
– Can be replicated in your Continuous Integration server
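Structuring code to be “fuzzable” usually means exposing a function that takes raw bytes and needs no files, sockets, or global state. The sketch below shows a fuzz target in the style used by libFuzzer (the `LLVMFuzzerTestOneInput` entry point); `parseLengthPrefixed` is a hypothetical function under test invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical function under test: the first byte declares the payload
// length, the payload follows. Returns false on malformed input.
bool parseLengthPrefixed(const std::uint8_t* data, std::size_t size,
                         std::string& payload) {
    if (size < 1) return false;
    const std::size_t declared = data[0];
    if (declared > size - 1) return false;  // declared length exceeds the data
    payload.assign(reinterpret_cast<const char*>(data + 1), declared);
    return true;
}

// libFuzzer-style entry point: the fuzzing engine calls this repeatedly with
// generated inputs. Rejecting malformed input is fine; crashing is a hit.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    std::string payload;
    parseLengthPrefixed(data, size, payload);
    return 0;
}
```

With Clang, such a target is typically built with `-fsanitize=fuzzer,address`, which supplies the `main` function and the coverage instrumentation; the same entry point can then be run continuously in a CI job.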
Google OSS-Fuzz:
Continuous Fuzzing for Open Source Software
TAINT ANALYSIS
Taint analysis
• Form of flow analysis
• Follow propagation of sensitive values inside program
– e.g., user input that can be manipulated by an attacker
– find all parts of program where value can “reach”
• “Information flows from object x to object y, denoted x→y, whenever information stored in x is transferred to, or used to derive information transferred to, object y.” D. Denning
• Sinks – attacked final functionality, e.g. system calls
• Native support in some languages (Ruby, Perl)
– But not C++/Java; FindSecurityBugs adds taint analysis for Java
Taint sources
• Files (*.pdf, *.doc, *.js, *.mp3...)
• User input (keyboard, mouse, touchscreen)
• Network traffic
• USB devices
• ...
• Every time there is information flow from value from
untrusted source to other object X, object X is tainted
– labeled as “tainted”
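Taint propagation from a source to a sink can be illustrated with a value that carries a label. This is a minimal sketch of the concept, not how any real taint-analysis tool is implemented: the `Labeled` type and all function names are hypothetical, and the sanitizer is deliberately simplistic.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// A value together with its taint label.
struct Labeled {
    std::string value;
    bool tainted;
};

// Taint source: anything read from a file, the network, or user input.
Labeled fromUntrustedSource(const std::string& raw) { return {raw, true}; }

// Values written by the programmer are trusted.
Labeled trustedConstant(const std::string& text) { return {text, false}; }

// Propagation: information flows from both operands into the result,
// so the result is tainted if either operand is.
Labeled concat(const Labeled& a, const Labeled& b) {
    return {a.value + b.value, a.tainted || b.tainted};
}

// Hypothetical sanitizer: keeps only alphanumeric characters and clears the label.
Labeled sanitizeAlnum(const Labeled& input) {
    std::string clean;
    for (char c : input.value) {
        if (std::isalnum(static_cast<unsigned char>(c))) clean += c;
    }
    return {clean, false};
}

// Sink: a sensitive operation that must never receive tainted data.
std::string executeAtSink(const Labeled& command) {
    if (command.tainted) throw std::runtime_error("tainted value reached sink");
    return command.value;
}
```

Concatenating a trusted prefix with untrusted input yields a tainted command that the sink rejects; only after the input passes through the sanitizer does the result reach the sink.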
Conclusions
• Dynamic analyzers can profile application
– and find bugs not found by static analysis
• Fuzzing is a “cheap” blackbox approach via malformed inputs
• Mandatory reading/watching
– Kostya Serebryany, OSS-Fuzz: Google's continuous fuzzing service for open-source software
– https://www.usenix.org/sites/default/files/conference/protected-files/usenixsecurity17_slides_serebryany.pdf
– https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/serebryany
Questions