An Unambiguous ID code character set

Sam Rogers
3 min read · Nov 14, 2019

Let’s say you want to generate an identification code for something. Could be a product, a discount, or any other kind of ID. But you want it to be alpha-numeric (both letters and numbers) so you can fit a lot of variation into a small code. (For instance, I need this for course coding schemas in Learning Management Systems.)

For simplicity, let’s go with the common QWERTY English character set. Yes, there are other letters we could use, but ñø let’s not use them or any other characters requiring modifier keys. Also, things that tend to confuse or break scripting and formatting would be good to avoid, so no brackets or slashes or punctuation.

Sounds straightforward enough. Here are the numbers:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9

So far so good, right?

But then we come to letters and things get a little more confusing.

First off, there’s the whole upper-case / lower-case thing. That’s a level of complexity we just don’t need, so pick one and just do that. Doesn’t matter which, just be consistent.

But then there are those letters that are easily mistaken for numbers. Some fonts make this more difficult than others, but even so…

0 looks a lot like O
1 often looks like l (and like a capital I)
S gets confused with 5
2 can read as Z
B and 8 look too similar
U and V are an issue

So what if we get rid of all the confusion and ONLY use characters that don’t look the same at all, no matter what font you use?

While we’re at it, how about we eliminate the letters that dyslexic people so frequently dyslex?

d and b are a common mistake, don’t use both
q and p as well, for the same reason as above
E and 3 are also problematic, pick one to use

And then there are the characters that are easily confused with operators:
X is also a multiplication symbol, and a way of hiding/omitting characters
t often looks like + for addition, and † which is a footnote marker

Finally, as a further restriction on our QWERTY character set, let’s avoid letters like X and Q that don’t appear in certain alphabets, such as Turkish. Did we already subtract those? Right, we did. Okay, well then…

This would leave us with 15 characters:
a, c, d, f, g, h, j, k, m, n, p, r, u, w, y
or, if you decided to go all caps:
A, C, D, F, G, H, J, K, M, N, P, R, U, W, Y
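If you’d rather see the subtraction than take my word for it, here’s a minimal Python sketch. The names `EXCLUDED` and `safe_letters` are made up for illustration, and the grouping of reasons is my own shorthand for the rules above. Note one assumption: I is not called out explicitly in the rules, but the final list leaves it out, so the sketch drops it alongside L.

```python
import string

# Letters dropped by the rules above, grouped by reason (all caps here).
# Assumption: "I" is not named explicitly in the rules, but the final
# list omits it, presumably because it resembles 1 just as "l" does.
EXCLUDED = {
    "looks like a digit": "OILSZBE",
    "confused with another letter": "VBQ",
    "doubles as an operator or symbol": "XT",
}

def safe_letters() -> str:
    """Return the upper-case letters that survive every exclusion."""
    dropped = set("".join(EXCLUDED.values()))
    return "".join(ch for ch in string.ascii_uppercase if ch not in dropped)

print(safe_letters())       # ACDFGHJKMNPRUWY
print(len(safe_letters()))  # 15
```

Eleven letters go out, fifteen stay in.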

Either way, we have a total of 25 alpha-numeric characters to work with, and they’re far less likely to get confused for each other now. I use these for codes, and I don’t use anything else unless a client outright forces me to do so. May I humbly suggest that you try this and see what you can get away with?
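How much variation do 25 characters actually buy you? A code of length n has 25 to the power of n possible values. Here’s a quick sketch of that arithmetic in Python; `ALPHABET` and `code_space` are just illustrative names, not anything standard.

```python
# The 25-character set: ten digits plus the fifteen surviving letters.
ALPHABET = "0123456789ACDFGHJKMNPRUWY"

def code_space(length: int) -> int:
    """Number of distinct codes of the given length."""
    return len(ALPHABET) ** length

for n in (4, 6, 8):
    print(n, code_space(n))
# 4 390625
# 6 244140625
# 8 152587890625
```

For comparison, a six-digit code using numbers alone tops out at 1,000,000 values, so the trimmed-down set still goes a long way.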

Here they are again:

0 1 2 3 4 5 6 7 8 9 A C D F G H J K M N P R U W Y

There! Life just got easier. You’re welcome.
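For anyone who wants to put the set to work, here’s a rough Python sketch of generating a code and checking one that somebody typed in. Everything in it is an assumption on my part rather than part of the scheme itself: the function names `new_code` and `is_valid`, the default length of 6, the all-caps choice, and the use of the standard library’s `secrets` module so the codes are hard to guess.

```python
import secrets

# The 25-character set, all caps.
ALPHABET = "0123456789ACDFGHJKMNPRUWY"

def new_code(length: int = 6) -> str:
    """Build a random code using only characters from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def is_valid(code: str) -> bool:
    """Check a typed-in code, ignoring case and surrounding spaces."""
    normalized = code.strip().upper()
    return bool(normalized) and all(ch in ALPHABET for ch in normalized)

print(new_code())        # six random characters from ALPHABET
print(is_valid("ac3d"))  # True: lower case is normalized to upper case
print(is_valid("B0O"))   # False: B and O are not in the set
```

Normalizing to upper case before checking is the “pick one and be consistent” rule in practice: people can type either case, but only one form is ever stored.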

Now, I don’t think I’m all that smart. I am quite confident that I am not the first person in history to think of this. I’m sure it’s well-documented somewhere, and probably even standardized. Maybe there’s a version of this that includes special characters or something?

Here’s the real reason I wrote this up and shared it:
WHY DID I HAVE TO CREATE THIS?!?

Seriously, this should be known and findable by each and every one of us with internet access these days. Maybe it’s out there somewhere, but I sure couldn’t find it. And everyplace I asked around over the last year, people looked at me like I was crazy. Why isn’t this the most basic rule that everyone follows for making IDs for anything?

In other words, why would we continue to do things the stupid way when the smarter way is so gosh-darned easy?

Perhaps this is one of life’s eternal mysteries, but if you happen to have an answer, a link or the real name for what I’m certain is not my invention, or a suggestion for how to improve the system above, please use the comments as you see fit. Thanks!
