Deeply Unserious Solutions

“Wishing upon a star that people are better than they are is a terrible solution every time.” - CGP Grey

When someone proposes a solution to a human-based problem, I like to ask myself: “will this work at scale?” Meaning, it should continue to address the issue in stride as time progresses, the problem size increases, or the number of people increases. Beyond cost, it’s also a question of efficacy at scale.

Let’s imagine a system that is successful 99% of the time. Sounds pretty good, right? Usually it is, but it becomes an issue if the system is critical or used often enough. Your car is designed to have a greater than 99% chance of starting, and a car not starting can be a matter of life and death. Cars are also started frequently: a car that only starts 99% of the time is a car that will break down monthly with typical usage.

Similarly, consider software. You go to save a profile edit, and 99% of the time it works. If a few hundred people a day are performing this task, only a handful will experience an error. They can just retry and it’s whatever. The cost of figuring out why it fails 1% of the time is lower than the cost of just eating the failure. But what if this operation occurs billions of times a day? Now that failure rate is totally unacceptable, as you’ll be seeing tens of millions of failures a day.
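To make the arithmetic concrete, here’s a quick sketch (the volumes are the hypothetical ones from above):

// Expected failures per day = operations per day x failure rate.
fn expected_failures(ops_per_day: f64, failure_rate: f64) -> f64 {
    ops_per_day * failure_rate
}

fn main() {
    // A few hundred saves a day: a handful of failures, easy to eat.
    println!("{}", expected_failures(300.0, 0.01)); // 3
    // Billions of saves a day: tens of millions of failures.
    println!("{}", expected_failures(2_000_000_000.0, 0.01)); // 20000000
}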

No Matter What They Tell You, It’s a People Problem

Things aren’t always as obvious when it comes to people. When we deal with people, we deal with emotions, ego, and cognitive bias. Individuals are seen as possessing unique qualities and having intrinsic value. When we want to solve problems, there’s a strong bias to deploy solutions in this framing. We are inclined to avoid quashing individuality.

I see this as a failure of compartmentalization. There are areas in life where these concepts should be prized and protected. There are others where they clearly shouldn’t be. Let’s consider an assembly line that inserts a light bulb into a flashlight. It’s pretty uncontroversial to say this process should be systematized. There’s no room for, or value to, individuality here. It’s a liability. There’s no craftsmanship or artistry here; there’s a single correct way to perform this step.

As a result, we long ago realized that assembly lines should be highly systematized, be it by process/checklist or literal automation. This has increased output, reliability, scale, and output complexity. A serious person wouldn’t argue the opposite here, but otherwise serious people can unwittingly make this same mistake in areas that feel more creative.

The Dichotomy of Procedure

Flying a plane is an intractable problem. We simply can’t create a centralized regulatory group that can enumerate precise procedures to account for all possible scenarios a pilot could encounter in the air. We can’t “automate” pilots; they will still need to make decisions. In the American military, it’s understood that the generals can’t possibly preselect the perfect plan. As a result, an immense amount of responsibility is delegated to so-called NCOs (non-commissioned officers) on the ground, who are empowered and encouraged to seize the initiative.

Both examples rely heavily on checklists and standard operating procedures (SoPs). In simpler, more specific situations there is a right thing to do. These procedures may even be nested to form more complex schemes. As an example, in aviation the use of the word “immediate” is considered emergency phraseology. If an air traffic controller tells a pilot to make an immediate turn, they need to do it now (as long as it’s safe to do so). However, the plane itself has its own warning systems that can issue instructions. Some of these are lower priority than those from the controller and some are higher.

One such system is TCAS. TCAS uses transponder-based plane-to-plane communication to detect when planes get too close in the sky. If their paths are dangerous, the system will first issue the verbal warning “TRAFFIC, TRAFFIC”. At this point, that hypothetical command from the controller still applies: “immediate” was an emergency order, while this is just a warning. If the planes continue to get closer and cross the minimum separation distance, TCAS will escalate to a “Resolution Advisory”. These are emergency commands, such as “CLIMB, CLIMB”, and they outrank what the controller has ordered.
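As a loose sketch of that priority ladder (the enum, names, and ordering here are my own illustration, not real avionics logic):

// Illustrative only: rank the instruction sources described above so
// that a higher-priority command wins over a lower-priority one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Instruction {
    TcasTrafficAdvisory,    // "TRAFFIC, TRAFFIC": a warning, ATC orders still apply
    AtcImmediate,           // controller's emergency phraseology: comply now if safe
    TcasResolutionAdvisory, // "CLIMB, CLIMB": outranks the controller
}

fn follow(a: Instruction, b: Instruction) -> Instruction {
    // Derived Ord ranks later variants above earlier ones.
    std::cmp::max(a, b)
}

fn main() {
    let winner = follow(Instruction::AtcImmediate, Instruction::TcasResolutionAdvisory);
    assert_eq!(winner, Instruction::TcasResolutionAdvisory);
}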

A controller is human and performing a far more complex task. TCAS is about as systematized as it gets. It’s a fully automated system based on measurements, and it almost never fails. In a quick search, the only accident I could find was a near collision in which the system behaved erratically. Meanwhile, every mid-air collision listed involving TCAS-equipped planes either had the system disabled or the pilots disobeyed its instructions (sometimes because of a conflicting instruction from the controller).

There’s a clear balance to be struck here. We want to systematize where it’s possible while giving the human in the loop the ability to choose between processes and apply other actions as needed.

Jidoka

The Toyota Production System (TPS) principle of Jidoka can be translated as “automation with a human touch.” That’s precisely what we do with pilots. Their SoPs are massive, thorough, and strict. There’s an incredible amount of information addressing the overwhelming majority of scenarios a pilot will ever encounter.

However, it’s still the pilot’s role to:

  1. Determine which procedure to apply based on the spirit of the SoP
  2. Address conflicts
  3. Deviate from the SoP in the face of a novel scenario

Even Deviation is Systematized

Pilots aren’t given carte blanche though. The SoP isn’t a set of suggestions; it’s the best approximation of a complete system we’ve managed to muster thus far. Deviations are to account for unforeseen scenarios, not for mere disagreements or personal preference. When pilots do deviate, an investigation occurs. The investigation considers the scenario and whether the SoP failed to account for it. The investigators consider, from the pilot’s perspective at the time, whether the behavior was in line with what a reasonable person would have done.

When it’s determined that the behavior wasn’t warranted, a punishment may occur. Interestingly though, it will only happen in cases of bad faith or gross negligence. If the behavior was reasonable, the pilot is cleared. Either way, lessons are drawn and distributed for other pilots to learn from. The review will also include recommended changes to further enhance the SoP.

This is sometimes called Just Culture and it’s extremely powerful. When people are punished for reporting errors they learn not to report them, and that leaves us all worse off.

Serious Series: Serious Solutions 🥊

The best solutions are fully automated ones. This means the desired result happens every time and at effectively no cost. Code linters and formatters are excellent examples of this. You can go very deep with this too. In Rust there are common naming conventions for when a function should be prefixed with into_ or as_. The linter has heuristics that catch most of these semantic violations.
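For example, a minimal sketch of the convention (the type and method names here are my own illustration): as_ methods hand out a cheap borrow, while into_ methods consume the value.

struct Wrapper(Vec<u8>);

impl Wrapper {
    // `as_` prefix: a cheap view that borrows self.
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    // `into_` prefix: consumes self, transferring ownership out.
    fn into_bytes(self) -> Vec<u8> {
        self.0
    }
}

Clippy’s wrong_self_convention lint flags mismatches, such as an into_ method that takes &self instead of self.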

Other concepts have canonical traits used to represent a platonic ideal in a standardized way. Instead of a “copy constructor”, there is the Clone trait (i.e. interface):

pub trait Clone: Sized {
    // Required method
    fn clone(&self) -> Self;

    // Provided method
    fn clone_from(&mut self, source: &Self) { ... }
}

The correct way to express this concept in Rust is via this trait, not with a different function name or the same name on the struct directly. Even something more complicated like this can be enforced with automation if your linter is good enough: Clippy offers the non_canonical_clone_impl lint for this purpose.
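For instance, a hypothetical impl like this should trip the lint, because a Copy type’s clone is expected to be nothing more than *self:

#[derive(Copy)]
struct Point {
    x: i32,
    y: i32,
}

// Clippy's non_canonical_clone_impl fires here: for a `Copy` type,
// `clone` should simply return `*self`.
impl Clone for Point {
    fn clone(&self) -> Self {
        Point { x: self.x, y: self.y }
    }
}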

Next best are those that deploy Jidoka. Some level of automation is employed, with a human touch where needed.

Really, even the linter is an example of this, because you can suppress lints that are triggered inappropriately. No system is truly 100% automated, but in practice it’s easier if we distinguish between things that are “automated”, with a human operator doing maintenance or investigating faults, and a “semi-automated” process that acts as a tool for the person to use.
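Continuing the hypothetical Point example above, the suppression is itself explicit and reviewable rather than a silent disagreement with the tool:

// A targeted override: the human wins, but leaves a visible record.
#[allow(clippy::non_canonical_clone_impl)]
impl Clone for Point {
    fn clone(&self) -> Self {
        Point { x: self.x, y: self.y }
    }
}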

In software these are typically guided forms or checklists. AI code review is a newer example. Your PR is scanned and the AI attempts to find issues. These are not like linter rules. The types of inference the AI is doing are more complicated and more subjective. It’s going to get things wrong far more often. This is an iterative tool the author can respond to or dismiss.

Solutions That Don’t Scale Are Deeply Unserious

This is why I describe any proposal that relies on people to just “do the right thing” or “not mess up” as deeply unserious. This is admittedly loaded language. Strong language is warranted here because it can reach two groups of people: those who aren’t engaging with the issue as deeply as they could be, and those who aren’t engaging with it at all.

Activating Deeper Thought

When it comes to code style, formatting, or conventions, I’ll often reply to feedback by saying “If it’s not in the linter, it’s not a real rule”. I used to say something similar: if it’s not documented, it’s not a real rule. But I’ve since learned even that’s not enough. Formal policy documentation does prevent tribal myths from forming, but it doesn’t manifest the policy in reality.

In Rust, these are equivalent imports:

// Sloppy
use foo::bar;
use foo::{
    cow
};

// Consistent option 1
use foo::bar;
use foo::cow;

// Consistent option 2
use foo::{bar, cow};

Choosing between options 1 and 2 is a subjective matter, but both are superior to the randomness of the sloppy version. Sloppy imports like these creep into a code base when different dev tools for auto-importing are used. Since Rust’s formatter doesn’t support this option in the official release yet, there’s no way to pick and enforce a style. When people ask me to fix this, my answer is no. It’s a waste of time. It will never be consistent in the code base.

There are thousands of imports from dozens of people. Each person has a slightly different idea of what the nesting rules should even be. We’re manually making and enforcing these changes. So even if we all manage to agree on the right way to do it, we’re still lying to ourselves. It’s always going to be an aspirational target, never a reality.

Instead of wasting time on this, we should just accept it. We should be honest that there is no standard. Being honest in this way reduces toil, but it also inspires people to find a better solution. A junior developer on my team wasn’t happy with this state of affairs and went looking for one. They learned the Rust nightly channel does support this option in the formatter, and they’re looking into switching our format CI step to it.
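If that pans out, enforcement becomes a one-line config via rustfmt’s imports_granularity option (unstable and nightly-only at the time of writing), something like:

# rustfmt.toml (run with `cargo +nightly fmt` until the option stabilizes)
imports_granularity = "Item"    # one `use` per item, i.e. consistent option 1
# imports_granularity = "Crate" # merge into nested braces, closer to option 2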

Dealing with Bad Thinking and Bad Actors

Systematized solutions are often unpopular. Most of the time people are just making a mistake or having a failure of perspective, but you’ll also encounter spurious arguments raised in resistance for other reasons.

Before continuing, obviously no proposal is beyond reproach. Genuine criticism or opposition can be had on the grounds of:

  1. Cost relative to benefit
  2. Prioritization
  3. Flaws in the solution that undermine the purported benefits
  4. Specification is intractable

What isn’t valid, however, is to implicitly suggest that systematization itself is to be avoided. Outside of artistic expression, we never actually want this flexibility. We’ve just learned that some problems are so intractable that flexibility is required. This can be because it’s impossible to preordain all possible scenarios, or because the individual problem relies on some form of applied experience and human intuition (at least today).

Even artistic expression is far more systematized than we often appreciate. Why else does music theory exist? Even abstract paintings have principles: average people who aren’t trained in painting can reliably pick out fake abstract paintings from real ones.

Why might someone object?

  • Artistic Spirit: This isn’t meant to be systematized. Something is lost when we do so.
  • Ego: Offended at the suggestion they’d benefit from safety rails, or unwilling to take the “easy” path that invalidates some asymmetric skill or knowledge they possess (being the big fish in an obscure pond).
  • Hobbies: They just like doing a weird, tedious, error-prone thing by hand. This is super common among low-level tech people.
  • Dunning-Kruger Effect: The person is great and expects too much of others, or just isn’t as smart as they think they are. For what it’s worth, in my experience people who object on this topic are usually in the second group.
  • Fear: Suspicion of something new, or concern over their ability to keep up.
  • Stubbornness: You can’t always teach an old dog new tricks.

What they all have in common is a lack of workable alternatives or an inaccurate insistence that things are fine the way they are. For clarity: if there’s an explicit problem with a proposed solution, you can attack it without having an alternative of your own; it would be a fallacious appeal to ignorance to say otherwise. When I say a lack of workable alternatives, I mean that rather than attacking a position as unworkable, they express a preference for an alternative of their own that isn’t a real solution.

Real World Examples

These are all examples I’ve encountered in the wild:

  1. Automated testing can have false positives, so we should be manually stepping through code in the debugger every day instead.
  2. We shouldn’t use Rust; people should write better C/C++ code and review for memory bugs more thoroughly.
  3. We should be able to execute arbitrary commands on servers in production and trust engineers to do the right thing.
  4. Locking down testing in production for security will just harm reliability
    • This is true, but the solution is to improve the test infrastructure to get you both.

A common theme here is trust. Trusting people to have good intentions and trusting them not to make mistakes. Neither is a good idea.

How to Respond

Start by trying to suss out why they’re objecting. Is it a valid concern, but driven by misunderstanding? Then work to correct it. Is it an invalid, but still understandable, concern or point of discomfort? Work to assuage that human concern.

If it becomes clear that the person is unmovable without cause, then at a certain point you just have to disengage and focus on the others. Don’t worry about winning over every holdout. Demonstrate through reason, data, and reference to standards why the action should be taken and why this person’s objections aren’t valid.

All rights reserved by the author.