Memory safety in Rust - part 1
Did you know that around 70% of the serious security bugs in Chrome and in Microsoft's products are memory safety issues? iOS, macOS and Android have similar numbers. Most of these vulnerabilities occur because the software is written in memory-unsafe languages like C and C++.
In this first article of a two-part series, I will discuss the concept of memory safety and explain how it is linked to memory management. In the next part, I will talk about memory safety in Rust.
While defining memory safety precisely can be surprisingly subtle, an informal understanding will suffice for this article. In short, a program is memory safe if it never accesses invalid memory. For example, in the following C program, writing to buffer[8] is a buffer overflow bug because it writes past the last byte owned by the buffer (valid indices are 0 through 7; the size 8 is only illustrative):
char buffer[8]; buffer[8] = 'x';
A buffer overread bug occurs when a program reads past the end of the allocated memory. Heartbleed was a buffer overread bug. In principle, buffer overflow and overread bugs are easy to prevent: the compiler just has to emit code that checks bounds on every array access. There is a small performance penalty, but it is worth the cost.
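To make the idea of a bounds check concrete, here is a minimal C++ sketch of roughly what such a check does. The helper names `checked_write` and `write_is_rejected` are hypothetical, the 8-byte buffer size is arbitrary, and a real compiler would emit the check inline rather than call a function.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: writes `value` at `index` only if the index is in
// range; otherwise it throws instead of touching memory outside the buffer.
void checked_write(char* buffer, std::size_t size, std::size_t index, char value) {
    assert(buffer != nullptr);
    if (index >= size) {
        throw std::out_of_range("index past the end of the buffer");
    }
    buffer[index] = value;
}

// Returns true if a write at `index` into an 8-byte buffer is rejected.
bool write_is_rejected(std::size_t index) {
    char buffer[8] = {};
    try {
        checked_write(buffer, sizeof buffer, index, 'x');
        return false;  // in bounds: the write happened
    } catch (const std::out_of_range&) {
        return true;   // out of bounds: the write was refused
    }
}
```

With this sketch, `write_is_rejected(7)` returns false because index 7 is the last valid byte, while `write_is_rejected(8)` returns true because index 8 is one past the end.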
Another class of memory safety bugs is closely related to how memory is allocated and deallocated. Use-after-free and double-free are two examples. Unlike out-of-bounds access, this class of bugs has no simple remedy. To see why, we need to understand manual memory management.
Manual memory management
In C, the programmer is responsible for managing memory. If the programmer has allocated a chunk of memory using malloc, it is their responsibility to release that chunk by calling free exactly once. While this is a very simple rule, following it in any non-trivial program is very hard. Even the Linux kernel developers, who are among the world's top C programmers, occasionally make these mistakes.
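The "exactly once" rule can be illustrated with a small sketch. It is written in C++ but uses only the C allocation functions, `std::malloc` and `std::free`; the function names `duplicate` and `copy_length` are hypothetical. The commented-out lines mark where the classic mistakes would occur; they are left disabled because executing them is undefined behavior.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Allocates a copy of `text` with malloc. The caller owns the result and
// must pass it to free exactly once.
char* duplicate(const char* text) {
    assert(text != nullptr);
    std::size_t size = std::strlen(text) + 1;  // +1 for the terminating '\0'
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy != nullptr) {
        std::memcpy(copy, text, size);
    }
    return copy;
}

// Returns the length of the copy, or -1 if the allocation failed.
int copy_length(const char* text) {
    char* copy = duplicate(text);
    if (copy == nullptr) {
        return -1;
    }
    int length = static_cast<int>(std::strlen(copy));
    std::free(copy);      // the one required call to free
    // copy[0] = 'x';     // would be a use after free
    // std::free(copy);   // would be a double free
    // Omitting the free above would instead leak the allocation.
    return length;
}
```

Even in this tiny function the ownership rule lives only in a comment. Nothing in the language stops a caller of `duplicate` from forgetting the `free` or from calling it twice.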
Since manual memory management is so tricky, can something be done about it?
Garbage collection
Can memory management be automated so that the burden is taken off the programmer? Garbage collection is one attempt at that. In garbage-collected languages like Java and C#, the programmer is free to allocate memory but has no obligation to free it. The garbage collector keeps track of allocated memory that is no longer reachable and periodically reclaims it. While this does provide memory safety, it comes at a performance cost because the garbage collector has to run alongside the program.
Moreover, the programmer has very limited control over when the garbage collector runs, which makes it difficult for latency-sensitive programs, such as automated trading systems, to keep latency under a threshold. This lack of control extends to other areas of the language. For example, programmers also give up most of the control over whether a value is allocated on the stack or the heap. Fortunately, there is an alternative to garbage collection.
RAII
RAII stands for Resource Acquisition Is Initialization. I find that the name does a poor job of explaining the idea, which is much simpler: when a variable in a program is no longer used, free the memory owned by that variable. If the compiler can do this analysis, it can insert the appropriate code to free the memory. The idiom originated in C++, where its implementation piggybacks on the destructor calls that the compiler generates when a variable's lifetime ends.
This approach has a couple of benefits. First, memory is freed much more deterministically: right after the variable holding that memory goes out of scope. Second, the technique is not limited to memory. Other resources, such as file handles and database connections, can be cleaned up using exactly the same approach. This is in contrast to garbage collection, which is responsible only for memory. Other resources in garbage-collected languages still need to be cleaned up manually.
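As a rough illustration of both points, the following C++ sketch applies RAII to a resource that is not memory. The `Connection` class is a hypothetical stand-in for something like a database connection; it does nothing except count how many instances are currently open, so the moment of cleanup can be observed.

```cpp
#include <cassert>

// Hypothetical stand-in for a non-memory resource such as a database
// connection: it only tracks how many connections are currently open.
class Connection {
public:
    Connection() { ++open_count; }    // acquire the resource
    ~Connection() { --open_count; }   // release it when the lifetime ends

    // Copying would make two objects release the same resource.
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

    static int open_count;
};

int Connection::open_count = 0;

// Returns the number of connections still open after the inner scope ends.
int open_after_scope() {
    {
        Connection first;
        Connection second;
        assert(Connection::open_count == 2);
    }   // both destructors run here, in reverse order of construction
    return Connection::open_count;
}
```

Calling `open_after_scope()` returns 0: the cleanup happens at the closing brace of the inner scope, without any explicit close call in the function body.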
It might look like RAII solves all the memory safety problems related to memory management, and indeed modern C++ smart pointers, which are based on RAII, almost work. But C++ still falls short. In the next part, I will discuss how Rust does even better than C++ in achieving memory safety.
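For readers who have not used smart pointers, here is a minimal sketch of the case where they do work, using `std::unique_ptr` from the C++ standard library. The `Tracked` type is hypothetical and exists only to count live objects so that the free becomes observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type that counts its live instances so frees are observable.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};

int Tracked::live = 0;

// Returns how many Tracked objects are alive after the owners go out of scope.
int live_after_owner_scope() {
    {
        std::unique_ptr<Tracked> owner = std::make_unique<Tracked>();
        assert(Tracked::live == 1);

        // Ownership moves to new_owner; `owner` becomes null, so the object
        // is still freed exactly once.
        std::unique_ptr<Tracked> new_owner = std::move(owner);
        assert(owner == nullptr);
        assert(Tracked::live == 1);
    }   // new_owner's destructor deletes the object here
    return Tracked::live;
}
```

There is no explicit `delete` anywhere, and `live_after_owner_scope()` returns 0 because the single owner releases the object when it goes out of scope.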
In this article, I introduced the concept of memory safety. I discussed how some memory safety bugs are related to memory management, and I described some techniques that avoid manual memory management. In the next part, I will focus on Rust's approach to memory safety.