Avoid Mutability
Mutability is never desired, only required
There has been a steady trend in programming languages towards increasing immutability. This is because immutability is a powerful tool for simplifying your code and making the system easier to reason about. It should be the default approach that you take.
Unfortunately, most of us are first introduced to objects through the getter / setter pattern. I believe that presenting it as a foundational topic during such a formative period instills an exaggerated sense of the approach’s value.
This is similar to my critique of inheritance, which I believe shouldn’t be taught at all in an introductory CS class, or at a minimum should be taught after interfaces and with less time allotted. Because inheritance is more complicated, it ends up getting far more focus. This creates a false impression of its utility and leaves the student poorly equipped to leverage interfaces. Once interfaces are explained, an amount of time equal to what’s spent learning the rules of inheritance should instead be spent applying interfaces: using them to write SOLID code and introducing unit testing as a concept. I believe this after reflecting on my own experience learning to code (in high school I viewed interfaces as a crappier form of inheritance, necessary only because Java didn’t support multiple inheritance) as well as from through lines I observed while teaching Data Structures to hundreds of students who had just finished their introductory courses.
Why is mutability discouraged though? Let’s consider how an HTTP request may be represented in a deserialized form:
```csharp
// Don't actually parse this way, simplified for example purposes
class HttpRequest
{
    public string Method { get; set; }
    public string RequestUri { get; set; }
    public Dictionary<string, string> Headers { get; set; }
    public string Body { get; set; }
    public string Path { get; set; }
    public Dictionary<string, string> QueryParameters { get; set; }

    public HttpRequest(string method, string requestUri, Dictionary<string, string> headers, string body)
    {
        Method = method;
        RequestUri = requestUri;
        Headers = headers;
        Body = body;

        // Assumes a query string is present; again, simplified.
        var splitUrl = RequestUri.Split('?');
        Path = splitUrl[0];
        var parsedQuery = System.Web.HttpUtility.ParseQueryString(splitUrl[1]);
        QueryParameters = parsedQuery.AllKeys.ToDictionary(key => key, key => parsedQuery[key]);
    }
}
```
As written, this code has mutability bugs. Notice that `QueryParameters` and `Path` are extracted from `RequestUri`, which means all three fields are intrinsically linked. If any one of them is mutated, at least one other needs a corresponding change. If we expose setters this way, each setter needs an implementation that propagates the effects to the other fields.
Ok sure, that’s bugged. But if we fix that, why is it still bad?
- Without mutability, the bug isn’t possible. Having happened not to write the bug isn’t a justification for allowing the risk to exist! That’s like arguing you shouldn’t have used a seat belt because you made it home without crashing. This is why we have managed memory and smart pointers. (See the sketch after this list.)
- Our functions have side effects, and side effects are lies. If you already know how HTTP works it’s easy to let this slide; you may expect the side effects. But as a general rule, you don’t. And even when you expect them, they’re implicit: you have to check the docs and/or the implementation to confirm what actually happens.
- Operations that we may expect to be cheap are now expensive.
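To make the first point concrete, here is a rough sketch of the same request type made immutable, written in Rust to match the later examples. The derived fields are computed exactly once at construction, so there is no setter left to desynchronize them:

```rust
use std::collections::HashMap;

// Immutable counterpart to the C# example above: no setters, and the
// derived fields are computed once, so they can never drift out of sync.
struct HttpRequest {
    method: String,
    request_uri: String,
    headers: HashMap<String, String>,
    body: String,
    path: String,
    query_parameters: HashMap<String, String>,
}

impl HttpRequest {
    // Same simplified parsing as before; don't actually parse this way.
    fn new(
        method: String,
        request_uri: String,
        headers: HashMap<String, String>,
        body: String,
    ) -> Self {
        let (path, query) = request_uri
            .split_once('?')
            .unwrap_or((request_uri.as_str(), ""));
        let path = path.to_string();
        let query_parameters = query
            .split('&')
            .filter_map(|pair| pair.split_once('='))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        Self { method, request_uri, headers, body, path, query_parameters }
    }
}
```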
The last point is the one I refer to most often. Take another example, an `HttpClient` of some kind. When you create the client, it accepts connection info and uses that to make requests, but also to keep a connection alive between requests. The connection is retained because standing up a new one is often more expensive than an individual request.
Let’s say we have a long-running connection to some other service, and we get a config update changing the IP address to use. We could either:
- Make the client mutable and modify the server field.
- Discard this client and create a new one with the updated IP address supplied at instantiation.
All of the above reasons apply here in favor of the second approach, but the last reason is the clearest. An HTTP client is just a wrapper around a connection that issues requests. By changing the connection, we’re changing everything about what the client is. Mutating internal fields accomplishes nothing except obscuring how large a change this is, both semantically and computationally. We’re not appreciably saving on cost by keeping the object around.
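As a sketch, with a hypothetical `HttpClient` type since the real one depends on your stack, the second approach looks like this:

```rust
use std::net::SocketAddr;

// Hypothetical client type for illustration.
struct HttpClient {
    server_addr: SocketAddr,
    // connection handle, TLS state, etc. would live here too
}

impl HttpClient {
    fn connect(server_addr: SocketAddr) -> Self {
        // Stand up the (expensive) connection once, at construction.
        Self { server_addr }
    }
}

// On a config update, consume the old client and build a fresh one.
// Dropping the old client tears down its connection, and the new client
// goes through the exact same construction path as any other.
fn apply_config_update(old: HttpClient, new_addr: SocketAddr) -> HttpClient {
    drop(old);
    HttpClient::connect(new_addr)
}
```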
When is mutability needed?
That raises the question: what about when modifying the underlying object is far, far cheaper than creating a new one? An example is a `vector` / `ArrayList`. These are contiguous containers, meaning there is some underlying buffer allocated. If the buffer isn’t full, we can insert a single record at the end. If we were to make this immutable, we’d have to allocate new memory and copy over all existing values! Same for an update, which is an in-place single change if we’re allowing for mutability.
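A quick sketch of that asymmetry (the `push_immutable` helper is hypothetical, for illustration):

```rust
// Without mutability, every "append" pays for a full copy first: O(n).
fn push_immutable(values: &[i32], new_value: i32) -> Vec<i32> {
    let mut copy = Vec::with_capacity(values.len() + 1);
    copy.extend_from_slice(values); // copy everything we already had
    copy.push(new_value);
    copy
}

fn main() {
    // With mutability, pushing into spare capacity is a single write.
    let mut values = Vec::with_capacity(1024);
    values.push(42); // O(1): write one element, bump the length

    let values = push_immutable(&values, 43);
    assert_eq!(values, vec![42, 43]);
}
```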
In this case, we should allow mutability. Not because it’s inherently better; it still suffers from the same issues, like the caller not knowing whether the cheap operation or the expensive full-copy operation will occur. We use it because these downsides are worth the massive speed increase in the typical case.
As an aside, while copies / memory allocations are very expensive relative to basic operations like addition, a single function call, or a single pointer read, they’re also highly optimized. Reading and writing tends to happen close together, so the CPU is designed with many tiers of caches. Additionally, reading or writing individual values is an illusion. The CPU will read or write fixed sized chunks of data up and down the memory hierarchy.
This means that “Big O” doesn’t tell the full story. For instance, you’ll learn that inserting into the middle of a `LinkedList` (O(n/2)) is “faster” than into an array-backed data store like `vector` or `ArrayList` (O(n)). In the real world though, even with large container sizes, the impact of cache misses means a `LinkedList` is usually slower.
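To make the comparison concrete, here is roughly what the two middle inserts look like with the standard library containers:

```rust
use std::collections::LinkedList;

// O(n): shift the back half over by one, but through contiguous,
// cache-friendly memory.
fn insert_middle_vec(values: &mut Vec<i32>, value: i32) {
    let mid = values.len() / 2;
    values.insert(mid, value);
}

// O(n/2) pointer chases just to reach the midpoint, each one a likely
// cache miss — which is why this usually loses in practice.
fn insert_middle_list(values: &mut LinkedList<i32>, value: i32) {
    let mut back = values.split_off(values.len() / 2);
    back.push_front(value);
    values.append(&mut back);
}
```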
Middle ground approach
This still leaves two objections to address:
- It’s not as big a deal as with a `vector`, but it’s still way slower to recreate everything every time.
- Making edits is common, and needing to make a new object every time is too cumbersome, because you must copy or move irrelevant data and now the object isn’t abstracting this operation away for me.
For objection 1, this just doesn’t matter in the real world. See my other posts on that topic. Even if you’re still unsatisfied, it’s moot, because fixing objection 2 also addresses objection 1.
For objection 2, that’s quite valid. Let’s consider a situation I found myself in with some Rust code where I needed to modify an incoming HTTP request:
```rust
let path: &str = request.uri().path();
// Legacy v1 endpoints violate RFC 3986 by treating URI paths as case insensitive.
// This is wrong, but must be maintained for backwards compatibility with existing clients.
if is_v1_request(path) {
    let new_path = path.to_lowercase();
    update_path(&mut request, &new_path);
}

fn update_path(request: &mut Request, new_path: &str) {
    let mut parts = request.uri().clone().into_parts();
    // SAFETY: We built the string and it's put back together from valid inputs.
    parts.path_and_query = Some(new_path.parse().unwrap());
    // SAFETY: We built the string and it's put back together from valid inputs.
    *request.uri_mut() = Uri::from_parts(parts).unwrap();
}
```
This is a mess, and the object isn’t even fully immutable! However, the `Uri` object doesn’t allow mutations (presumably because of the propagation issue). There’s more than one way to do this, but in order to keep things simpler overall we’re OK with copies being made in some places where they could possibly be avoided. If the underlying type didn’t support the `into_parts` pattern that’s common in Rust, it would be even more annoying, because we’d have to move or copy over all the fields.
The `into_parts` pattern is one example of making this simpler: allow an object to be decomposed into pieces, edit the pieces you care about, and coalesce them into a new object. That operation can apply what would otherwise be side effects as part of the normal instantiation (in this case, extracting more fields).
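In sketch form, with a made-up `Config` type standing in for a real immutable object, the pattern looks like this:

```rust
// Hypothetical type demonstrating the into_parts pattern.
struct Config {
    host: String,
    port: u16,
}

struct ConfigParts {
    host: String,
    port: u16,
}

impl Config {
    // Decompose into freely editable pieces.
    fn into_parts(self) -> ConfigParts {
        ConfigParts { host: self.host, port: self.port }
    }

    // Reassemble; validation and derived-field computation happen here,
    // once, instead of inside a setter on every field.
    fn from_parts(parts: ConfigParts) -> Self {
        Self { host: parts.host, port: parts.port }
    }
}

// Edit only the piece you care about, then coalesce into a new object.
fn with_port(config: Config, port: u16) -> Config {
    let mut parts = config.into_parts();
    parts.port = port;
    Config::from_parts(parts)
}
```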
Another common approach is to have a builder-pattern-like mutation. A mutation operation exists, but doesn’t require a mutable object. Instead it takes and consumes the object, returning a “new one”. Under the hood, it can optimize as much as desired. This gets us performance, encapsulation, and better ergonomics.
```rust
// Hypothetical builder-like mutation that propagates side effects
impl HttpRequest {
    pub fn with_path(mut self, new_path: String) -> Self {
        let path_length = new_path.len();
        let mut new_path_and_query = new_path;
        new_path_and_query.push('?');
        new_path_and_query.push_str(&self.query);
        self.path_and_query = new_path_and_query;
        self.path = self.path_and_query[..path_length].to_string();
        self
    }
}
```
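Usage then reads like any other call. The old request is consumed, so there’s no stale handle left around to mutate by accident (hypothetical usage):

```rust
let request = request.with_path("/v2/items".to_string());
```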