Naming Conventions and Canonicity

Standards are like pizza. Even when they’re bad, they’re still pretty good.

Standards are wonderful. They avoid ambiguity and conflict, and they simplify interactions between systems and people. But an underappreciated benefit is that you don’t have to waste time arguing about trivial, subjective nonsense.

Naming Conventions

In the age of IntelliSense and syntax highlighting, naming conventions aren’t as useful as they used to be. But they still matter. Regardless of these aids, capitalization and punctuation have significant impact on the meaning of text. They subconsciously influence how you read.

Moreover, searching through code without syntax parsing is still very common (e.g. find and replace or grep, as opposed to IDE refactoring tools like rename). You should prefer the latter whenever available, though! It saves a ton of time and consternation.
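As a rough illustration of why plain-text search depends on predictable spelling, here is a minimal sketch. The helper name `any_convention_pattern` and the sample source line are made up for this example. It assumes every convention spells a name as the same tokens in the same order, so one pattern can find all of them:

```python
import re

def any_convention_pattern(tokens):
    # Tokens joined by an optional underscore and matched case-insensitively
    # cover the camelCase, PascalCase, snake_case and SCREAM_CASE spellings.
    return re.compile("_?".join(re.escape(t) for t in tokens), re.IGNORECASE)

# Hypothetical line of source code containing four spellings of one name.
source = "let ipAddress = parse(IP_ADDRESS_DEFAULT); // see IpAddress, ip_address"
pattern = any_convention_pattern(["ip", "address"])
print(pattern.findall(source))
# ['ipAddress', 'IP_ADDRESS', 'IpAddress', 'ip_address']
```

This only works when every spelling keeps the same token boundaries, which is what a consistently applied convention gives you.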

Tokens

I used to wonder how Chinese people learn all those characters vs the limited set of letters in English. While it does make typing and legible handwriting harder, ultimately I was preoccupied by a misunderstanding of how English speakers actually read.

As you read this sentence, unless you’re in kindergarden learning to read you aren’t percieving the letters. You’re reading one chunk, one token, one word at a time. All at once subcounsiously. Which is part of why you can so easly read the numerous mispelled words and omitted characters in this paragraph.

When it comes to naming conventions in code, the token of choice is the word. Whether or not people are aware of it, they treat acronyms and abbreviations as words too, as single tokens. They may not be consistent with these tokens, but they absolutely perceive them that way.

Accepted Conventions

| Name | Suggested self-documenting name | Common Examples | Alternative Names |
| --- | --- | --- | --- |
| Camel Case | camelCase | Local variables, member variables, functions | lowerCamelCase¹ |
| Pascal Case | PascalCase | Class / type names, member variables, functions | upperCamelCase² |
| Snake Case | snake_case | Local variables, member variables, functions | N/A |
| Scream Case³ / Screaming Snake Case | SCREAM_CASE | Constants | SCREAM_CASE, SCREAMING_CASE, SCREAMING_SNAKE_CASE |
| Upper Snake Case | Upper_Snake_Case | As an alternate to SCREAMING_SNAKE_CASE for constants | N/A |
| Upper Case (typically single token) | T | Type placeholder in generics | N/A |
| Lower Case (typically single token) | a | Rust lifetimes ('a) | N/A |

Special Cases

Some people treat certain tokens as if they’re not tokens, and so come up with special rules for them. I find this entirely unproductive. It needlessly muddies the waters and undermines the standard: despite having a ratified style guide, we still end up arguing about this dumb topic.

Acronyms

The biggest source of conflict is what even counts as one. Is ATM an acronym? Strictly speaking yes, but in common usage absolutely not. It’s morphed into its own (stylized) noun, which is why people tend to use the redundant phrase “ATM Machine”.

Worse yet, some style guides, like the .NET one, have different rules for two-letter acronyms vs longer ones! By .NET guidelines, it’s:

  • IPAddress
  • AtmMachine

Which gets doubly confusing for AI and IT. If these are words, it’s Ai and It. If they’re acronyms, then it remains AI and IT.

In .NET, remember a bad standard beats no standard. If you’re writing public APIs you should do your best to follow this dumb rule.
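To make the ambiguity concrete, here is a minimal sketch of the two-letter rule as summarized above. It is an illustration, not a full implementation of the .NET guidelines. The function name and the `acronyms` parameter are hypothetical. The caller has to decide up front which tokens count as acronyms, and the output changes with that decision:

```python
def dotnet_style_pascal_case(tokens, acronyms):
    """PascalCase where two-letter acronyms stay fully uppercase.

    `acronyms` is an assumption of this sketch: a caller-supplied set of
    lowercase tokens that should be treated as acronyms.
    """
    parts = []
    for token in tokens:
        lowered = token.lower()
        if lowered in acronyms and len(lowered) == 2:
            parts.append(lowered.upper())       # IP, AI, IT
        else:
            parts.append(lowered.capitalize())  # Atm, Address, longer acronyms
    return "".join(parts)

print(dotnet_style_pascal_case(["ip", "address"], {"ip", "atm"}))   # IPAddress
print(dotnet_style_pascal_case(["atm", "machine"], {"ip", "atm"}))  # AtmMachine
# Same tokens, different answer depending on whether "it" is a word or an acronym:
print(dotnet_style_pascal_case(["it", "department"], {"it"}))  # ITDepartment
print(dotnet_style_pascal_case(["it", "department"], set()))   # ItDepartment
```

The extra `acronyms` argument is the cost of the exception: someone has to maintain that list and agree on it.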

My suggestion:

A token is a token is a token. If acronyms are tokens, it doesn’t matter if it’s a word or an acronym, you always get the right answer. Two letter word tokens aren’t special, so acronyms shouldn’t be either.

There is never a situation where exceptions make things more readable. At best, it’s neutral and something commonly recognized like IPAddress. But remember, humans read tokens. It’s not an opinion to say that IpAddress or ip_address is more readable. If IPAddress reads better to you, it’s because you’ve been trained into reading the unnatural tokenization by repeated exposure.
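As a rough sketch of the “a token is a token” rule: assuming a name has already been broken into a non-empty list of tokens, every convention becomes a mechanical join with no acronym lookup. The function names below are hypothetical, not from any library:

```python
def pascal_case(tokens):
    # Capitalize the first letter of every token; acronyms get no special treatment.
    return "".join(t.capitalize() for t in tokens)

def camel_case(tokens):
    # Same as PascalCase, except the leading token stays lowercase.
    head, *rest = tokens
    return head.lower() + pascal_case(rest)

def snake_case(tokens):
    return "_".join(t.lower() for t in tokens)

def scream_case(tokens):
    return "_".join(t.upper() for t in tokens)

tokens = ["ip", "address"]
print(pascal_case(tokens))  # IpAddress
print(camel_case(tokens))   # ipAddress
print(snake_case(tokens))   # ip_address
print(scream_case(tokens))  # IP_ADDRESS
```

None of these functions needs to know whether “ip” is a word, an acronym, or an abbreviation, which is the whole appeal.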

Abbreviations

This is a rehash of acronyms. Is TV a word, or an abbreviation? Is it Tv or TV?

My suggestion:

A token is a token is a token… Exact same argument here. It’s only a problem if we needlessly introduce subcategories.

Avoidance

Acronyms and abbreviations are best avoided whenever possible. The only time to use one is when it is commonly known and deviating from it would be needlessly confusing. For example, in Rust it’s IpAddr. Using IpAddress or, worse, InternetProtocolAddress isn’t helping anyone. Remember, the key here is tokenization. If your audience is trained on a specific token, it’s better to just stick with it.

Try to avoid internal ones, though. We have autocomplete and big screens; you don’t need shorter names.

Stylized Names

A common mistake people make is to retain stylized names for proper nouns, but this defeats the entire point of having naming conventions. Is this stylized? How? Is it consistent? Will programmers consistently know and apply it, even if there is one official style?

Take gRPC. That’s the “branded” or “stylized” casing. But you’re exceedingly unlikely to see it that way in code. Let’s say we have a generated type for an identity server. Well, for starters, the use of an acronym as a special token would itself make for an unreadable mess: gRPCIdentityServer, GRPCIdentityServer

Or just be arbitrary in the best case: gRPC_identity_server or GRPC_Identity_Server or GRPC_IDENTITY_SERVER or gRPC_IDENTITY_SERVER.
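One way to see the “unreadable mess” is to try recovering the tokens mechanically. The sketch below uses a deliberately naive splitter (a hypothetical helper, not a library function) that assumes each token is one uppercase letter followed by lowercase letters or digits:

```python
import re

def split_pascal_case(name):
    # Naive splitter: a token is one uppercase letter followed by
    # any run of lowercase letters or digits.
    return re.findall(r"[A-Z][a-z0-9]*", name)

print(split_pascal_case("GrpcIdentityServer"))
# ['Grpc', 'Identity', 'Server']
print(split_pascal_case("GRPCIdentityServer"))
# ['G', 'R', 'P', 'C', 'Identity', 'Server']
print(split_pascal_case("gRPCIdentityServer"))
# ['R', 'P', 'C', 'Identity', 'Server']
```

Only the spelling that treats the acronym as an ordinary token survives the round trip. The other two need extra rules, or a dictionary of known brand names, just to find the word boundaries.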

I find that I’m almost universally able to get agreement here, because it’s just another application of the acronym rule. But what people miss is the subtle implication: being a proper noun, or a branded or stylized name, doesn’t exempt it from the normal rules. When people think of it only in terms of acronyms, they’re promoting the rather odd position that being a proper noun, or a branded or stylized name, doesn’t exempt it if it’s an acronym (or abbreviation, or both) but does (or maybe does) exempt it otherwise.

To apply my own standard: some of those hypotheticals can be consistent and mostly straightforward to follow, and if they were commonly accepted it wouldn’t be worth objecting to the needless complexity of entertaining exceptions. But given there is no prevailing common practice here (beyond some style guides that explicitly take the position I’m about to advocate), I don’t see how a serious person could advocate for doing it any of these ways.

The simplest way to do things? Have no special cases. Just like with acronyms and abbreviations, why should we complicate things? If the answer is that the stylization is irrelevant, then we can’t mess up, we can’t be confused, and we can’t have stupid debates (the whole point of style guides, apart from readability, in the first place!).

An Example

Consider a product I’ve worked on, WireServer. For starters, it’s an internal-facing product with almost no documentation, so there really isn’t a prevailing standard. There’s no product page, and the docs and API contracts are inconsistent about whether it’s one token or two.

What is consistent is that we don’t refer to it as two words. It’s not Wire Server. Edge cases exist, but they’re so infrequent as to easily be labeled typos. So regardless of whether the product name is WireServer or Wireserver, in code it’s always one token, and thus it’s:

| Convention | WireServer in Convention |
| --- | --- |
| camelCase | wireserver or Wireserver depending on position |
| PascalCase | Wireserver |
| snake_case | wireserver |
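To show what “depending on position” means in practice, here is a small sketch that treats the product name as a single token. The helper functions and the neighbouring tokens “client” and “legacy” are made up for illustration:

```python
def camel_case(tokens):
    # The leading token stays lowercase; every later token is capitalized.
    head, *rest = tokens
    return head.lower() + "".join(t.capitalize() for t in rest)

def pascal_case(tokens):
    return "".join(t.capitalize() for t in tokens)

def snake_case(tokens):
    return "_".join(t.lower() for t in tokens)

# "client" and "legacy" are made-up neighbours used only to show position.
print(camel_case(["wireserver", "client"]))   # wireserverClient
print(camel_case(["legacy", "wireserver"]))   # legacyWireserver
print(pascal_case(["wireserver", "client"]))  # WireserverClient
print(snake_case(["wireserver", "client"]))   # wireserver_client
```

Because the name is one token, there is never a capital S in the middle of it, whichever convention applies.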

Wrapping Up

In closing, this is important precisely because it isn’t. This is a stupid use of the very expensive time of software engineers. Pick a common standard, clarify these potential edge cases, and move on. Edge cases preferably follow what I advocate for here, but even if you pick a standard I consider dumb, just at least pick one.

The only truly wrong choice here is not to make one, or to have exceptions (beyond accepted abbreviations; so don’t decide your product name is special but other names like McCarthy aren’t).

  1. This should only be used when needing to match the naming convention of a particular context, such as if that’s the name your formatter uses. camelCase should be preferred if UpperCamelCase isn’t being used. Whenever camelCase is used, it should be read as “lowerCamelCase”. Even in upper vs lower contexts, since these are less common (and imo unnatural), you’ll likely find people will still call it by the less precise name (though to be clear, it’s only less precise when you’ve needlessly muddied the waters by having two different kinds of camel case!).

  2. As an aside, “upper camel case” is a weird, needlessly confusing name. I’m not sure where it comes from; I typically see it used in low-level circles. My blind intuition is that it’s an older thing that has largely fallen out of fashion and/or never took over more broadly. Please just call it PascalCase.

  3. I’ve never seen Scream Case used to mean anything other than Screaming Snake Case. A true “SCREAMCASE” is just so unreadable: GRPCIDENTITYSERVER. SCREAMING_SNAKE_CASE is clearer, but honestly, unless you’re writing documentation, just SCREAM_CASE is fine and more natural to say in common usage imo.

All rights reserved by the author.