There is no such thing as bug-free software. Only insufficiently tested software.
Let's begin with a piece of code that was successfully reviewed by the internal quality assurance team and even passed an external code audit, although the reviewers suggested one change.
int32_t marker_data = 0;
char marker[17] = "MagickMarkerBytes";
int marker_version = 1;
The auditing company had a note on this line: they would prefer to reserve space for the terminating '\0' byte of the string literal, although C99 permits this construction and considers it valid (the compiler simply does not store the '\0' in the marker array).
However, as this was an embedded design, the development team considered the variant that consumes less memory the better one.
Plus, the in-house static analyzer was silent about the line.
(Note: For those who doubt, this is really valid C code)
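If the team still wanted the terminator without hard-coding a size, one common alternative (a sketch, not the audited code) is to let the compiler size the array from the literal:

/* Sized by the compiler from the literal: 17 characters plus the '\0', 18 bytes. */
static const char marker[] = "MagickMarkerBytes";

This costs one extra byte, but sizeof(marker) is now 18 and strcmp() stops at the terminator instead of running into whatever happens to be stored next.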
The auditing company also had one more note, turned into a patch:
diff -rupN a/payload_handler.c b/payload_handler.c
--- a/payload_handler.c
+++ b/payload_handler.c
@@ -38,7 +38,7 @@
marker_data = *(int*)(payload + 196);
}
- if (strcmp(marker, payload + 200) == 0) {
+ if (strncmp(marker, payload + 200, 18) == 0) {
ready = true;
} else {
// not a marker tag - already authenticated command is
Now, the nasty thing is that far too many things about these lines went undiscovered. Can you spot them all?
For one, the marker array is not '\0'-terminated: it is followed in memory by other data (marker_data in the listing) which might not contain a zero byte either, thus effectively letting strcmp travel far beyond the array - and, on the other side of the comparison, far into the payload.

Safety and security are nearly orthogonal values, and it is desirable to develop them both at the same time. The behavior during failure is critical, as it is the point where your system is most vulnerable and most dangerous to the rest of the world. It is thus desirable to have the system under control even during a failure. Compare the two access checks below:
/* Variant 1: test for one specific failure code. */
if (check_access() == EDENIED) {
    die("access denied");
}

/* Variant 2: treat anything other than success as failure. */
if (check_access() != OK) {
    die("access denied");
}

/* Consider ENOMEM: if check_access() fails with an unexpected error,
 * variant 1 happily continues as if access had been granted,
 * while variant 2 fails closed. */
The error handling paths are among the most exposed targets in a program. This has two main reasons:
What side channels are there?
Generally, we would like the systems to be safe and secure. This means:
Still, even if we address those issues, seemingly safe code can lead to catastrophe. History simply shows that failures are inevitable; all we can do is lower the probability of their occurrence.
The layered design splits this into several ideas:
Even if it cannot make your code perfect and impenetrable, it will slow the attacker down. Early detection buys you time to respond to the attack, identify the attack vector and patch. The attacker might also pick an easier target; if they lose interest, you win as well.
The buffer overflow above can lead either to exploitation of the vulnerability or - if the code is built with a stack protector or sanitizer - to a crash. A crash is something that can be logged: the process core dumped, the event log from the last few minutes saved, the crash reconstructed, the vulnerability found, the attacker neutralized. All that thanks to good design and - especially - logging. It does not help you much if the logger is the first thing you lose.
As a result, you might want to create a backup logger that is independent of the main one. Once the backup logger is online, you have successfully removed one single point of failure. The fewer of them there are, the better for the stability of your system.
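As a minimal sketch of that idea (send_to_main_logger() and the file path are hypothetical placeholders), the backup path should share as little as possible with the main logger:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical primary sink; returns 0 on success. */
int send_to_main_logger(const char *msg);

/* Best-effort logging that survives the loss of the main logger. */
void log_event(const char *msg)
{
    if (send_to_main_logger(msg) == 0)
        return;                      /* the main logger took it */

    /* Main logger is down: append to a local, independent file. */
    int fd = open("/var/log/backup.log", O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd >= 0) {
        write(fd, msg, strlen(msg));
        write(fd, "\n", 1);
        close(fd);
    }
}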
Some designs are full of single points of failure, and it might be impossible to remove them all. For example, if a user decides to put a sticker with their password on their monitor, you can't really expect them to remove it. All you can do about it is to reduce the reason why they needed to write the password on the sticker in the first place.
Reviews are the basic tool for maintaining code culture in a team. To be effective, though, they require the reader to be at least at the same experience level as the writer. The main reason is to identify unreadable code and minimize it. That does not mean a less experienced reader couldn't have their bright moments; after all, if the code is incomprehensible, the less experienced reviewer might simply point it out, while an experienced one could subconsciously compensate for it.
This idea can be scaled up: a closed-source project opened so that it can be observed by many people at the same time. It is a big step, though; many companies consider it a high risk, aware of insufficient internal processes and afraid of losing their know-how. And even after publishing the code, an error might still stay present for many years (Shellshock, various SSL-related issues like Heartbleed).
But closing the source and obscuring it is not the solution either; it just complicates security audits and internal reviews. Quite the contrary: while it complicates the defense, the attacker merely has to perform reverse engineering, core inspection and disassembly of otherwise known code (which they probably would do anyway).
You might (and will) find arguments about tamper-proof devices with secured code and inputs, yet beware - no device is tamper-proof if the attacker has a sufficient number of pieces to experiment on. Thus, the middle ground is often used: do not open the source, open the design.
The design can undergo independent reviews, catching protocol flaws, while the cost of implementation still keeps protecting your investment. Plus, it can increase the net value of your technology, unlike the opposite approach.
There are several exemplary design failures:
You should stick to the best practices:
// Assume already validated
A personal side note: anybody claiming that something wireless is safe and secure either has a poor imagination or works as an insurance salesman.
Gameplay rules:
This rule essentially covers the idea that even if an attacker penetrates the system, the resources they had to burn (for example on brute-forcing a MAC) did not pay off. And even if the system is penetrated, the attacker just hits another wall: for example, even if they manage to find a buffer overflow, they hit a memory fence.
Each layer in the layered security model should add to the total security, not bypass the other layers. This is especially relevant for layers that add capabilities. Consider a router with a fairly working SSH that, however, uses old encryption. Even if you get in, you shouldn't be able to find tools lying around that bypass the SSH (for example a telnet server). If you can, there obviously was no layer jailing the faulty SSH away from the rest of the system.
Stick to the principle of least privilege: if a network service does not need to fork new processes, it shouldn't even have the capability to do so (seccomp()).
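The text only names seccomp(); a minimal sketch using the libseccomp helper library (an assumption about the exact API to use) could keep the default behaviour and simply make process creation fail:

#include <errno.h>
#include <seccomp.h>   /* libseccomp; link with -lseccomp */

/* Deny process creation for the rest of this process's lifetime. */
static int deny_process_creation(void)
{
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);   /* allow everything else */
    if (ctx == NULL)
        return -1;

    /* Make the process-creation syscalls fail with EPERM.
     * Note: blocking clone also blocks thread creation. */
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fork), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(vfork), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(clone), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(execve), 0);

    int rc = seccomp_load(ctx);    /* install the filter */
    seccomp_release(ctx);
    return rc;
}

A stricter service would use a kill-by-default action and whitelist only the syscalls it actually needs.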
The principle of least privilege provides you with one more guard rule: write permission should be mutually exclusive with execute permission (W^X).
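A minimal sketch of what that looks like in practice, for a buffer that will hold generated code (the function and its use are illustrative, not from the text): the memory is writable while being filled and only then flipped to executable, never both at once.

#include <stddef.h>
#include <sys/mman.h>

/* Map a buffer for generated code: writable while it is being filled,
 * executable (and no longer writable) afterwards. */
void *map_code_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* ... emit or copy the code into buf here ... */

    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}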
This applies to the principle of minimal knowledge as well. Consider a web application that allows you to send free SMS as well as make payments. If it checks the account balance before completing authentication, you can easily devise an attack with an expensive SMS: receiving HTTP status code 402 instead of 403 first effectively turns the site into a username oraculum.
(An oraculum is a thought device that allows one to gain previously unknown information through a side channel.)
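A minimal sketch of the safe ordering, with hypothetical request/response types and helpers (none of these names come from the text): authentication is decided before anything account-specific is even looked at.

/* Hypothetical types and helpers - illustrative only. */
struct request;
struct response;
int  authenticate(const struct request *req);
long account_balance(const struct request *req);
long sms_price(const struct request *req);
int  send_sms(const struct request *req);
int  respond(struct response *resp, int http_status, const char *body);

int handle_send_sms(const struct request *req, struct response *resp)
{
    /* 1. Authenticate first: an unauthenticated caller learns nothing else. */
    if (!authenticate(req))
        return respond(resp, 403, "forbidden");

    /* 2. Only now touch account-specific state such as the balance. */
    if (account_balance(req) < sms_price(req))
        return respond(resp, 402, "payment required");

    /* 3. Authenticated and funded: perform the action. */
    send_sms(req);
    return respond(resp, 200, "ok");
}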
Use what you already have:
The trusted computing base (TCB) should be only a small amount of code requiring elevated privileges; it is capable of overriding the security subsystems. If an application is sufficiently sandboxed, then it needs no elevated privilege and thus does not belong to the TCB. Unfortunately, the TCB tends to be very large in practice. For comparison, study the differences between the usual Linux security model and Android.
The goal of privilege separation is to minimize the TCB by constantly re-evaluating the need for elevated privileges, and dropping to an ordinary account as soon as they are no longer needed. This is achieved by splitting the process into multiple processes, where the auxiliary ones are no longer privileged. Thus, privilege separation is a multi-process, IPC-based protocol.
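A minimal sketch of the dropping part (the UID/GID values are placeholders): acquire the privileged resources first, then permanently give up root before touching any untrusted input.

#include <stdlib.h>
#include <unistd.h>

#define UNPRIV_UID 1000   /* placeholder: the service's unprivileged account */
#define UNPRIV_GID 1000

/* Assumes the process started as root and has already acquired
 * whatever privileged resources (ports, files) it needs. */
static void drop_privileges(void)
{
    /* Order matters: drop the group first, the user last.
     * (A real service would also clear supplementary groups.) */
    if (setgid(UNPRIV_GID) != 0 || setuid(UNPRIV_UID) != 0)
        exit(EXIT_FAILURE);   /* refuse to run half-privileged */

    /* Verify the drop is permanent: regaining root must fail. */
    if (setuid(0) != -1)
        exit(EXIT_FAILURE);
}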
Honeypotting is a technique widely used in network security; it essentially helps you observe, and steal, the attacker's tools. Has their previous attack failed? Prepare (firewall) rules and put them into an isolated environment where you log everything.
Although isolation does not solve everything, it definitely helps you stop the propagation of a fault and contain it in an isolated environment (one that you can destroy and restart). This does not apply to honeypots only; you can isolate individual subsystems in a non-honeypot setup as well. And although there are attacks that effectively allow breaking out of the isolation, it is still not easy. (Recall Rowhammer and Spectre?)
There are several approaches:
Yet sometimes you need to isolate subsystems that have to communicate with each other - for example, the WordPress CMS and its database. Under normal circumstances this would not be possible, as WordPress needs write access to the database. Yet you can apply a mitigation: for example, have two instances of WordPress using the same database, one with write access hidden behind an authenticated service and one for common access. Plus, with the recent spread of overlay filesystems, you can even go as far as having a separate filesystem for each Apache instance. The only question is when and where you hit the law of diminishing returns.
KISS is a philosophy that can be applied as a mitigation as well. Keeping your design simple and avoiding overthought clever hacks allows for a more thorough code review and audit, and makes it possible to reason about the impact of side effects on security. This applies to both the code and the overall design.
Plus, simpler code indirectly leads to more readable code and thus fewer bugs (especially thanks to code review). As a result, this also means that you need to restrain yourself from adding unnecessary features, as they provide more space for the attacker.
Never trust your inputs, as they originate outside your system. This means that you should validate and sanitize the inputs upon every entry to your application. Yet this does not mean that your internal inputs are safe or have already been validated. Assume that the attacker has access to your device and that they might have used a bug in the input channel (recall the original example?). Far too many faults propagate along trusted channels.
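Applied back to the original example, a minimal sketch of a defensive check (the offset and marker come from the snippet above; the function name and the explicit length parameter are assumptions) validates the payload length before touching a fixed offset and compares with an explicit length:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MARKER      "MagickMarkerBytes"
#define MARKER_LEN  (sizeof(MARKER) - 1)   /* 17 bytes, terminator not counted */
#define MARKER_OFF  200                    /* offset used in the patch above */

static bool payload_has_marker(const unsigned char *payload, size_t payload_len)
{
    /* Reject anything too short to contain the marker at its offset. */
    if (payload == NULL || payload_len < MARKER_OFF + MARKER_LEN)
        return false;

    /* Compare exactly MARKER_LEN bytes; no reliance on '\0' terminators. */
    return memcmp(payload + MARKER_OFF, MARKER, MARKER_LEN) == 0;
}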
Distinct languages have distinct memory policies, which has an impact on how safe and how error-prone a language is. C has the nice property of pointer arithmetic, which is also its caveat. Given that parallelism is external to the language, it is very simple to create unsafe - or even dangerous - code.
In that respect, one might order the languages as follows: C, C++ (correctly used), Java (VM + managed memory). And then there are Ada and Rust.
Keep yourself educated on both up-to-date exploits and mitigations. Study the codebase you use, study the security record of the components you use - even the hardware (AMD vs. Intel vs. Spectre). Learn from your own mistakes, learn from other people's mistakes. Try hacking into a device from time to time, including your own. Use elevator rides to read security news. Follow well-known security patterns (among other things).
There is no such thing as bug-free software.
Only insufficiently tested software.
Until there is a new,
insufficiently tested release.