Naming Conventions and Canonicity

Standards are like pizza. Even when they’re bad, they’re still pretty good.

Standards are wonderful. They avoid ambiguity and conflict, and they simplify interactions between systems and people. But an underappreciated benefit is that you don’t have to waste time arguing about trivial, subjective nonsense.

Naming Conventions

In the age of IntelliSense and syntax highlighting, naming conventions aren’t as useful as they used to be. But they still matter. Regardless of these aids, capitalization and punctuation have a significant impact on the meaning of text. They subconsciously influence how you read.

Moreover, searching through code without syntax parsing is still very common (e.g. find and replace or grep, as opposed to IDE refactoring tools like rename). You should prefer the latter whenever available, though! It saves a ton of time and consternation.

Tokens

When I was growing up I used to wonder how people learn so many Chinese characters vs the limited set of letters in English. While it does make typing and legible handwriting harder, ultiminately I was preoccupied by a misunderstanding of how English speakers actually do read.

As you read this sentence you aren’t percieving the letters. Well, unless you’re in kindergarden learning to read. You’re reading, one word, one chunk, one token at a time. All at once subcounsiously. Which is part of why you can so easly read this paragraph despite the numerous mispelled words and omitted characters 😉.

When we think about naming conventions in code we tend to think of them as rules for multi-word phrases or compound names. If we’re really paying attention we may consider how acronyms or even proper nouns come into play. The issue isn’t that there is subjectivity or grey area in the rules themselves. There is a single correct answer to what any name should be once you’ve defined what your unit of tokenization is.

Whether or not people are aware of it, they treat acronyms and abbreviations as words, as single tokens too. They may not be consistent with these tokens, but they absolutely perceive them that way. Writing them as single tokens creates natural, readable names and plays nicely with linters too.

That’s what I’d like to convince you of today. Once you adopt this mental model you can’t unsee it. It resolves just one more little annoyance and waste of human energy.

Accepted Conventions

| Name | Suggested self-documenting name | Common Examples | Alternative Names |
| --- | --- | --- | --- |
| Camel Case | camelCase | Local variables, member variables, functions | lowerCamelCase [1] |
| Pascal Case | PascalCase | Class / type names, member variables, functions | upperCamelCase [2] |
| Snake Case | snake_case | Local variables, member variables, functions | N/A |
| Scream Case [3] / Screaming Snake Case | SCREAM_CASE | Constants | SCREAM_CASE, SCREAMING_CASE, SCREAMING_SNAKE_CASE |
| Upper Snake Case | Upper_Snake_Case | As an alternate to SCREAMING_SNAKE_CASE for constants | N/A |
| Upper Case (typically single token) | T | Type placeholder in generics | N/A |
| Lower Case (typically single token) | a | Rust lifetimes ('a) | N/A |
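
To make the table concrete, here is a minimal Python sketch (not any formatter's actual implementation) that renders one list of tokens in each multi-token convention. The function names and the sample tokens are assumptions made for the example. The point is that once the tokens are fixed, each convention has exactly one output.

```python
# Sketch: render a non-empty list of tokens in each convention.
# Every token is treated the same way, whether it is a word,
# an acronym, or a brand name.

def camel_case(tokens):
    head, *tail = [t.lower() for t in tokens]
    return head + "".join(t.capitalize() for t in tail)

def pascal_case(tokens):
    return "".join(t.capitalize() for t in tokens)

def snake_case(tokens):
    return "_".join(t.lower() for t in tokens)

def scream_case(tokens):
    return "_".join(t.upper() for t in tokens)

def upper_snake_case(tokens):
    return "_".join(t.capitalize() for t in tokens)

# Hypothetical name, chosen only for illustration.
tokens = ["grpc", "identity", "server"]

print(camel_case(tokens))        # grpcIdentityServer
print(pascal_case(tokens))       # GrpcIdentityServer
print(snake_case(tokens))        # grpc_identity_server
print(scream_case(tokens))       # GRPC_IDENTITY_SERVER
print(upper_snake_case(tokens))  # Grpc_Identity_Server
```

Note that nothing in the sketch needs to know what a token means. That is the whole appeal.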

Special Cases

Some people treat certain tokens as if they’re not tokens, and thus come up with a special rule for them. I find this entirely unproductive, a needless muddying of the waters. This kind of person-to-person ambiguity undermines the standard: despite having a ratified style guide, we can still argue about this dumb topic when it comes time to review code.

Acronyms

The biggest source of conflict is what even counts as one. Is ATM an acronym? Strictly speaking yes, but in common usage absolutely not. It’s morphed into its own pseudo-noun, which is why people tend to use the redundant phrase “ATM Machine”.

Worse yet, style guides for platforms like .NET have different rules for 2 letter vs more than 2 letter acronyms! In .NET the “correct” names for ATMs and IP addresses would be:

  • IPAddress
  • AtmMachine

Which gets doubly confusing for AI and IT. If these are words, it’s Ai and It. If they’re acronyms, then it remains AI and IT.

This is aggressively stupid and I’m willing to bet it was conjured up out of the ether as a post-hoc justification for someone’s personal proclivity for a handful of two letter examples like IP. But remember, a bad standard beats no standard. If you’re writing public APIs you should do your best to follow this dumb rule.

You do have my support for shaking your fist at the clouds each time you have to though.

My suggestion:

A token is a token is a token. If acronyms are tokens, it doesn’t matter if it’s a word or an acronym, you always get the right answer. Two letter word tokens aren’t special, so acronyms obviously shouldn’t be either.

There is never a situation where exceptions make things more readable. At best, it’s neutral and something commonly recognized like IPAddress. But remember, humans read tokens. It’s not an opinion to say that IpAddress and ip_address are more readable. For the former to be more readable, it’s because you’ve been trained into reading the unnatural tokenization by repeated exposure. You have developed Stockholm Syndrome from years of abuse.
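
Here is a rough Python sketch of the difference. The second function is my paraphrase of the two-letter rule as described above, not the actual logic of any .NET analyzer, and the `ACRONYMS` set and sample names are made up for illustration.

```python
def pascal_case(tokens):
    # A token is a token is a token: no lookups, no exceptions.
    return "".join(t.capitalize() for t in tokens)

def pascal_case_with_exception(tokens, acronyms):
    # Two-letter acronyms stay upper case; everything else is capitalized.
    parts = []
    for t in tokens:
        if len(t) == 2 and t.lower() in acronyms:
            parts.append(t.upper())
        else:
            parts.append(t.capitalize())
    return "".join(parts)

# Hypothetical list. Somebody has to own it, and argue about it.
ACRONYMS = {"ip", "ai", "it", "atm"}

print(pascal_case(["ip", "address"]))                           # IpAddress
print(pascal_case_with_exception(["ip", "address"], ACRONYMS))  # IPAddress
print(pascal_case_with_exception(["atm", "machine"], ACRONYMS)) # AtmMachine
print(pascal_case(["it", "works"]))                             # ItWorks
print(pascal_case_with_exception(["it", "works"], ACRONYMS))    # ITWorks
```

The exception version can only give an answer if it is handed a list of what counts as an acronym, and it still gets "it works" wrong because the pronoun and the acronym are spelled the same. The plain version never has to ask.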

Abbreviations

This is a rehash of acronyms. Is TV a word, or an abbreviation? Is it Tv or TV?

My suggestion:

A token is a token is a token… Exact same argument here. It’s only a problem if we needlessly introduce subcategories.

Avoidance

Acronyms and abbreviations are best avoided whenever possible. The only time to use one is when it is commonly known and deviating from it would be needlessly confusing. For example, in Rust it’s IpAddr. Using IpAddress or worse InternetProtocolAddress isn’t helping anyone. Remember, the key here is tokenization. If your audience is trained on a specific token, it’s better to just stick with it.

Try to avoid internal ones though. We have autocomplete and big screens. You don’t need shorter names.

Stylized Names

A common mistake people make is to retain stylized names for proper nouns, but this defeats the entire point of having naming conventions. Is this stylized? How? Is it consistent? Will programmers consistently know and apply it even if there is one official style?

Take gRPC. That’s the “branded” or “stylized” casing. But you’re exceedingly unlikely to see it that way in code. Let’s say we have a generated type for an identity server. Well for starters, the use of an acronym as a special token would itself make for an unreadable mess: gRPCIdentityServer, GRPCIdentityServer

Or just be arbitrary in the best case: gRPC_identity_server or GRPC_Identity_Server or GRPC_IDENTITY_SERVER or gRPC_IDENTITY_SERVER.
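
The same mess shows up for tools. Below is a deliberately naive identifier splitter in Python, written only for illustration (real linters use smarter heuristics, and `split_tokens` is a name I made up). It splits on underscores and on capital letters, which is all you need when every token follows the convention.

```python
import re

def split_tokens(name):
    """Split an identifier into lower-case tokens using only its casing."""
    tokens = []
    for piece in name.split("_"):
        if piece.isupper() or piece.islower():
            # snake_case or SCREAM_CASE piece: already a single token.
            tokens.append(piece)
        else:
            # Mixed case: a capital letter starts a new token.
            tokens.extend(re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", piece))
    return [t.lower() for t in tokens]

print(split_tokens("GrpcIdentityServer"))    # ['grpc', 'identity', 'server']
print(split_tokens("grpc_identity_server"))  # ['grpc', 'identity', 'server']
print(split_tokens("GRPC_IDENTITY_SERVER"))  # ['grpc', 'identity', 'server']

# Stylized or all-caps acronyms fall apart into single letters.
print(split_tokens("gRPCIdentityServer"))    # ['g', 'r', 'p', 'c', 'identity', 'server']
print(split_tokens("GRPCIdentityServer"))    # ['g', 'r', 'p', 'c', 'identity', 'server']
```

A smarter splitter can guess its way around the last two cases, but it is guessing. With no special cases there is nothing to guess.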

This will feel familiar because it’s just yet another application of the acronym rule. What people tend to miss though is the subtle implication: being a proper-noun/branded/stylized name doesn’t exempt it from the rules. When people think of it only in terms of acronyms, they’re promoting the rather odd position that being a proper-noun/branded/stylized name doesn’t exempt it if it’s an acronym/abbreviation but does if it’s not one.

I find that I’m almost universally able to get agreement here, right up until we get to their favorite cutesy exception. Product names come to mind. Or IP address… Moving on:

There is no prevailing common practice here except for some corners of the world that have accepted a fundamental reality: The simplest way to do things? Have no special cases.

Just like with acronyms and abbreviations, why should we complicate things? If the answer is that the stylization is irrelevant, then we can’t mess up, we can’t be confused, we can’t have stupid debates (the whole point of the style guides apart from readability in the first place!).

I don’t see how a serious person could advocate for doing it any other way.

An Example

Consider a product I’ve worked on, WireServer. For starters, it’s an internal-facing product with almost no documentation, so there really isn’t a prevailing standard. There’s no product page, and the docs and API contracts are inconsistent about whether the name is one token or two.

What is consistent is that we don’t refer to it as two words. It’s not Wire Server. Edge cases exist, but they’re so infrequent as to easily be labeled typos. So regardless of whether it’s WireServer or Wireserver for the product name, in code it’s always one token, and thus it’s:

| Convention | WireServer in Convention |
| --- | --- |
| camelCase | wireserver or Wireserver depending on position |
| PascalCase | Wireserver |
| snake_case | wireserver |
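
As a quick sanity check, here is the same decision as a small Python sketch. The helper functions and the extra tokens "client" and "get" are hypothetical, added only to show what "depending on position" means for camelCase.

```python
def camel_case(tokens):
    head, *tail = [t.lower() for t in tokens]
    return head + "".join(t.capitalize() for t in tail)

def pascal_case(tokens):
    return "".join(t.capitalize() for t in tokens)

def snake_case(tokens):
    return "_".join(t.lower() for t in tokens)

# One token, as decided above.
print(camel_case(["wireserver", "client"]))  # wireserverClient
print(camel_case(["get", "wireserver"]))     # getWireserver
print(pascal_case(["wireserver"]))           # Wireserver
print(snake_case(["wireserver"]))            # wireserver

# Had we decided it was two tokens, every answer would change.
print(pascal_case(["wire", "server"]))       # WireServer
print(snake_case(["wire", "server"]))        # wire_server
```

The only real decision is the tokenization. Everything after that is mechanical.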

Wrapping Up

In closing, this is important precisely because it isn’t. This is a stupid use of the very expensive time of software engineers. Pick a common standard, clarify these potential edge cases, and move on. Edge cases preferably follow what I advocate for here, but even if you pick a standard I consider dumb, just at least pick one.

The only truly wrong choice here is not to make one, or to have exceptions (beyond accepted abbreviations; so don’t decide your product name is special but other names like McCarthy aren’t).

  1. This should only be used when needing to match the naming convention of a particular context, such as if that’s the name your formatter uses. camelCase should be preferred if UpperCamelCase isn’t being used. Whenever camelCase is used, it should be read as “lowerCamelCase”. Even in upper vs lower contexts, since these are less common (and imo unnatural) you’ll likely find people will still call it by the less precise name (though to be clear, it’s only less precise when you’ve needlessly muddied the waters by having two different kinds of camel case!).

  2. As an aside, “upper camel case” is a weird, needlessly confusing name. I’m not sure where it comes from; I typically see it used in low level circles. My blind intuition is that it’s an older thing that has largely fallen out of fashion and/or never took over more broadly. Please just call it PascalCase.

  3. I’ve never seen Scream Case used to mean anything other than Screaming Snake Case. A true “SCREAMCASE” is just so unreadable: GRPCIDENTITYSERVER. SCREAMING_SNAKE_CASE is more clear, but honestly unless you’re writing documentation just SCREAM_CASE is fine and more natural to say in common usage imo.

All rights reserved by the author.