Does This Leak?
The exit states of a function shouldn’t be mysterious
Here’s a question: does this C++ code leak memory?
char* get_name(); // Implemented elsewhere
// ...
std::cout << get_name();
As a refresher, a memory leak occurs when memory is allocated but isn’t freed after it’s no longer needed. In the computer, the memory is already physically there, so when we say “allocate” we really mean “update the index of in-use memory”, and when we “free” memory we really mean “remove that entry from the index”, which marks the space as available for future allocation.
In C++, every new should have a corresponding delete. Notice we say corresponding, not 1:1. Multiple new expressions may be covered by a single delete statement; it all depends on the code in question.
You can have leaks other ways, such as by filling a cache with old session entries and never removing them when the sessions end. In practice though, memory leaks are almost always a raw pointer issue.
The answer is it’s impossible to tell. All you know is you’re getting some raw pointer that represents a name, but there’s no communication over who is responsible for managing that memory.
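To make that concrete, here is a minimal sketch of two implementations that would both fit the char* get_name() signature. The names get_name_static and get_name_heap, and the string "Ada", are made up for illustration; the original get_name is implemented elsewhere and could be either, or something else entirely.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>

// Hypothetical implementation A: returns a pointer into static storage.
// The caller only borrows it and must NOT free it.
char* get_name_static() {
    static char name[] = "Ada";
    return name;
}

// Hypothetical implementation B: allocates a fresh buffer on every call.
// The caller owns it and must release it with delete[].
char* get_name_heap() {
    const char source[] = "Ada";
    char* copy = new char[sizeof(source)];
    std::memcpy(copy, source, sizeof(source));
    return copy;
}

// Identical-looking call sites, opposite cleanup obligations.
void print_both() {
    char* borrowed = get_name_static();
    std::cout << borrowed << '\n';  // nothing to free

    char* owned = get_name_heap();
    std::cout << owned << '\n';
    assert(std::strcmp(borrowed, owned) == 0);  // same text, different ownership
    delete[] owned;                 // omitting this line leaks
}
```

With implementation A, the one-liner from the top of the post is fine. With implementation B, it leaks on every call. Nothing in the type tells you which one you have.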
Pointers are unclear
In general, a pointer can be:
- Owned
- Jointly owned
- Borrowed
- Null “owned”
- Null “jointly owned”
- Null borrowed
Depending on the context, a pointer can implicitly shift through these meanings as it is passed around. Consider this example:
#include <cstring>
#include <iostream>
#include <string>

struct Foo {
    std::string bar;
    const char* raw_str;  // c_str() returns const char*
};

// Returns a heap-allocated copy of input, sized to fit
char* clone(const char* input) {
    char* copy = new char[std::strlen(input) + 1];
    std::strcpy(copy, input);
    return copy;
}

void print(const char* text) {
    std::cout << text;
}

std::string test = "test";
Foo foo = { test, test.c_str() };
// test owns char* #1 -> "test"
// foo.bar owns clone of test, which owns char* #2 -> "test"
// foo.raw_str either now jointly owns or borrows char* #1
// From context, we would conclude borrow. But notice how there is
// no canonical answer, it's purely an inference
If the program ends here, it’s as if a borrow occurred for foo.raw_str. Each std::string will free its owned pointer and we will have neither leaks nor double frees.
But what if next we had:
test.clear();
Now there’s a problem. foo has a “dangling pointer”, meaning it points to data (the internal state of test) that no longer exists: clear() invalidates any pointer previously returned by c_str(). This is why we infer the intention of the author was to borrow, not jointly own, the data. We know that the std::string expects to own the data and will free it when the object is destructed.
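For contrast, the owning member does not have this problem. The sketch below uses a cut-down Foo with only the std::string member, plus a helper name (copy_survives_clear) invented for this illustration, to show that a copy owned by foo.bar is unaffected when the source string is cleared.

```cpp
#include <cassert>
#include <string>

struct Foo {
    std::string bar;  // owns its own copy of the characters
};

// Returns true if foo.bar still holds "test" after the source is cleared.
bool copy_survives_clear() {
    std::string test = "test";
    Foo foo = { test };  // copies the characters into foo.bar
    test.clear();        // only affects test's own buffer
    assert(test.empty());
    return foo.bar == "test";
}
```

The difference is entirely in the type: std::string says “I own this”, while the raw pointer says nothing.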
But what if I drop the object, and make everything a raw pointer?
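#include <cstring>
#include <iostream>

struct Foo {
    char* bar;
    char* raw_str;
};

// Takes const char* so that a string literal can be passed in
char* clone(const char* input) {
    char* copy = new char[std::strlen(input) + 1];
    std::strcpy(copy, input);
    return copy;
}

void print(const char* text) {
    std::cout << text;
}

char* test = clone("test");
Foo foo = { test, test };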
Now things are really unclear. Is this jointly owned, or is test the only owner and foo has two borrows? The fragility of the original inference is laid bare. Reasoning that something is a borrow from context is circular. Either of these next two snippets could be valid:
// Borrowed
char* test = clone("test");
Foo foo = { test, test };
print(foo.bar);
delete[] test;  // array form, to match the new char[] inside clone
// Jointly owned
char* test = clone("test");
Foo foo = { test, test };
test = nullptr;
print(foo.bar);
// Assume eventually foo and test are dropped, and the last dropped calls delete[]
Both rely on a presumption of the author’s intent. Given the first two lines alone, I can’t possibly answer. Given the other lines, I can only answer by assuming it’s written correctly if it seems like it runs fine. A “valid borrow” scenario with a bug will look like jointly owned and vice versa.
This is why you will frequently see this in C++ code bases:
if (ptr != nullptr) {
    delete ptr;
}
This is an abomination. Any time you see this code, the author has loudly declared “I don’t know what this pointer is.” The check isn’t even needed for safety, since delete on a null pointer is a no-op. It’s hard to blame them though. You really have to squint to follow the memory ownership here even in these trivial examples. Across thousands of lines of code, across shared libraries, written by dozens of authors? Good luck.
Calling a null check an abomination might seem unfair or extreme. What if I’m writing a vector? My buffer might be null because I don’t need to create a buffer until something is actually inserted. That’s true, but your vector must also be tracking its buffer size. Unless you’ve written a bug, you always know if the pointer is null or not because when capacity is 0 you necessarily have a nullptr. You can argue that writing the if statement against capacity instead of the buffer pointer is semantics or meaningless, but I contend it’s more clear. More importantly, it’s what enables me to take this strong stance on the general case.
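Here is a rough sketch of what I mean. IntBuffer is a made-up, deliberately tiny stand-in for a vector (ints only, no copying, an arbitrary starting capacity of 4), not a recommendation to roll your own. The point is only that every branch is written against capacity_, the state the class already tracks, rather than against the pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal growable int buffer; not a std::vector replacement.
class IntBuffer {
public:
    IntBuffer() : data_(nullptr), size_(0), capacity_(0) {}
    ~IntBuffer() {
        // Branch on capacity, the state we track, not on the pointer.
        if (capacity_ != 0) {
            delete[] data_;
        }
    }
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    void push_back(int value) {
        if (size_ == capacity_) {
            grow();
        }
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    int operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    // Doubles the capacity (or allocates 4 slots the first time).
    void grow() {
        const std::size_t new_capacity = capacity_ == 0 ? 4 : capacity_ * 2;
        int* new_data = new int[new_capacity];
        for (std::size_t i = 0; i < size_; ++i) {
            new_data[i] = data_[i];
        }
        if (capacity_ != 0) {
            delete[] data_;
        }
        data_ = new_data;
        capacity_ = new_capacity;
    }

    int* data_;             // owned; null exactly when capacity_ == 0
    std::size_t size_;
    std::size_t capacity_;
};
```

The invariant lives in one comment next to data_, and the code never has to ask the pointer what it is.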
The solution? Proper types
The issue here is that we have different scenarios all represented by the same type. What we need is to make as many types as we can, then rely on the compiler (or static analysis tools) to enforce that the semantics conferred by our chosen type are respected. This is accomplished to different degrees of success in different contexts:
Scenario | C++11 | Rust |
---|---|---|
Owned | std::unique_ptr | Box |
Jointly Owned | std::shared_ptr | Arc |
Borrow | std::unique_ptr& OR std::shared_ptr& | &Box OR &Arc |
Nullable Owned | std::unique_ptr | Option<Box> |
Nullable Jointly Owned | std::shared_ptr | Option<Arc> |
Nullable Borrow | std::unique_ptr& OR std::shared_ptr& | &Option<Box> OR &Option<Arc> |
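As a sketch of the C++11 column, here is how the owned, borrowed, and jointly owned rows might look in code. Session and the function names (make_session, name_length, share, ownership_demo) are invented for this example; only the smart pointer types come from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

struct Session {
    std::string user;
};

// Owned: the return type says the caller now owns the Session.
std::unique_ptr<Session> make_session(const std::string& user) {
    return std::unique_ptr<Session>(new Session{user});  // C++11 has no make_unique
}

// Borrow: a reference to the smart pointer, as in the table's Borrow row.
// The callee can read the Session but takes no ownership.
std::size_t name_length(const std::unique_ptr<Session>& session) {
    return session ? session->user.size() : 0;
}

// Jointly owned: the Session is freed when the last shared_ptr goes away.
std::shared_ptr<Session> share(std::unique_ptr<Session> session) {
    return std::shared_ptr<Session>(std::move(session));
}

void ownership_demo() {
    std::unique_ptr<Session> owned = make_session("ada");
    assert(name_length(owned) == 3);          // borrow; owned is untouched

    std::shared_ptr<Session> first = share(std::move(owned));
    assert(owned == nullptr);                 // ownership has moved out
    std::shared_ptr<Session> second = first;  // a second joint owner
    assert(first.use_count() == 2);
}
```

Notice that name_length still has to handle a null argument, because the borrowed std::unique_ptr is allowed to be empty.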
It is highly regrettable that the C++ standard library only offers smart pointers that are allowed to be null. Offering nullable smart pointers makes sense because one of the goals of the smart pointers is the ability to get access to the underlying raw pointer to perform borrows or clones to remain compatible with legacy code. Most modern C++ isn’t actually interacting with legacy code though. To the extent it is, it should form a clear and concise barrier in the same way we create RAII wrapper objects for raw pointers / handles from C APIs. The standard library should have offered nullable and non-null variants of each smart pointer.
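A minimal sketch of such a barrier, under some loud assumptions: legacy_get_name is a stand-in I wrote for a C-style API whose contract is “returns a malloc’d string, caller must free() it”, and FreeDeleter, CString, get_name_owned, and owned_name_length are names invented here. A real API’s contract has to come from its documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Stand-in for a legacy C function. Assumed contract: returns a
// malloc'd, NUL-terminated string that the caller must free().
char* legacy_get_name() {
    const char source[] = "Ada";
    char* copy = static_cast<char*>(std::malloc(sizeof(source)));
    if (copy != nullptr) {
        std::memcpy(copy, source, sizeof(source));
    }
    return copy;
}

// Deleter that releases memory with free(), matching the malloc above.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

using CString = std::unique_ptr<char, FreeDeleter>;

// The barrier: the only place that touches the raw pointer.
CString get_name_owned() {
    return CString(legacy_get_name());
}

// Usage: no manual free() anywhere; the wrapper releases the buffer
// when name goes out of scope.
std::size_t owned_name_length() {
    CString name = get_name_owned();
    assert(name != nullptr);  // this stand-in returns null only if malloc fails
    return std::strlen(name.get());
}
```

Everything past get_name_owned sees a type that answers the question from the top of this post: the caller owns the string, and it will be freed.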
In C, you don’t have RAII types so you don’t get this concept. You can use macros to at least try to name things right, but if you want enforcement you need something like SAL Annotations. In SAL, you can sort of define ownership through annotations.
In closing, don’t use raw pointers. They’re bad for readability, bad for reliability, and bad for security.