Why you should use UUIDs in your APIs

I'd like to take a moment to dwell on using UUIDs (universally unique identifiers) in your APIs: the good, the bad, and why you really should use them. Today we'll focus on using them specifically as resource identifiers where, for the purposes of this blog post, a resource is an item like a user, a group, etc. So without further ado... let's take a look at UUIDs and why a good API should take advantage of them.

What is a UUID?

For time's sake, let's just go ahead and quote the Wikipedia definition of a UUID as it will serve our purposes nicely:

A UUID is a 16-octet (128-bit) number.

In its canonical form, a UUID is represented by 32 lowercase hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12 for a total of 36 characters (32 alphanumeric characters and four hyphens). For example:

123e4567-e89b-12d3-a456-426655440000

The first 3 sequences are interpreted as complete hexadecimal numbers, while the final 2 as a plain sequence of bytes. The byte order is "most significant byte first (known as network byte order)" (RFC 4122, sec. 4.1.2) (note that GUID's byte order is different). This form is defined in the RFC (sec. 3) and simply reflects UUID's division into fields (sec. 4.1.2), which apparently originates from the structure of the initial time and MAC-based version.

The number of possible UUIDs is 16^32, which is 2^128 or about 3.4 × 10^38.
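
If you'd rather check those numbers than take the quote's word for it, here's a minimal sketch using Python's standard uuid module. The choice of Python is mine; nothing in the definition depends on it.

```python
import uuid

# The example UUID from the definition above.
example = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

canonical = str(example)
groups = canonical.split("-")

print(len(canonical))              # 36 characters in total
print([len(g) for g in groups])    # the 8-4-4-4-12 grouping
print(len(example.bytes) * 8)      # 128 bits (16 octets)
print(example.version)             # 1, the time and MAC-based version

# 32 hex digits with 16 possible values each is the same count as 128 bits.
total = 16 ** 32
print(total == 2 ** 128)           # True
print(f"{total:.1e}")              # 3.4e+38
```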

While this identifier may seem long, and a very large user base means some extra disk and memory usage, uniqueness is the key, and the randomness and difficulty of guessing make it worthwhile. (Strictly speaking, that unpredictability comes from randomly generated version 4 UUIDs; the time and MAC-based version mentioned in the quote is unique but far more predictable.) Should you need something a bit shorter, you could look at shortuuid and port it to your development language of choice in order to create a much shorter (22 characters in this case) unique ID.
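
To give a feel for how a 36-character UUID can shrink to 22 characters, here's a rough sketch. To be clear, this is not shortuuid's own encoding; it's a stand-in that uses URL-safe base64 from Python's standard library, and the helper names shorten and expand are made up for this post.

```python
import base64
import uuid


def shorten(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes of a UUID as 22 URL-safe characters."""
    # 16 bytes encode to 24 base64 characters, the last two being "==" padding.
    return base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")


def expand(short: str) -> uuid.UUID:
    """Reverse shorten(): restore the padding and rebuild the UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(short + "=="))


original = uuid.uuid4()
short = shorten(original)

print(original)                    # 36 characters
print(short)                       # 22 characters
print(expand(short) == original)   # the round trip loses nothing
```

The short form carries the same 128 bits, so it is just as unique; it only changes how the identifier is written down.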

Why should I do this?

The basic argument for this is simple: SECURITY. If you make an identifier vastly harder to guess, you've stood up a rather easy leg of your security apparatus. I'm sure we've all seen that genius method of using incrementing numbers to identify resources (e.g. 1, 2, 3, ... 1234). If you do that and have failed to implement further security measures, or assume you're safe because your API is "private", you've failed 100% and have left your API open to abuse and ridicule. Unfortunately... this is nobody's fault but your own if you go the route of simple incrementing numbers.

http://my.api/resource/12345

IS FAR EASIER TO GUESS AND LESS SECURE THAN:

http://my.api/resource/123e4567-e89b-12d3-a456-426655440000
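
To make the guessing argument concrete, here's a rough sketch that compares sequential ids with random (version 4) UUIDs. The data and the function names are hypothetical, and the "API" is just a pair of dictionaries. It only illustrates guessability; it is not a substitute for the further security measures mentioned above.

```python
import uuid

# Hypothetical in-memory "API" data: the same three users keyed two ways.
names = ["alice", "bob", "carol"]
by_sequential_id = {i: name for i, name in enumerate(names, start=1)}
by_uuid = {uuid.uuid4(): name for name in names}


def enumerate_sequential(max_id: int) -> list[str]:
    """Walk 1..max_id the way someone probing /resource/<n> would."""
    return [by_sequential_id[i] for i in range(1, max_id + 1) if i in by_sequential_id]


def guess_uuids(attempts: int) -> list[str]:
    """Try random UUIDs; each guess has roughly a 3-in-2^122 chance of a hit."""
    guesses = (uuid.uuid4() for _ in range(attempts))
    return [by_uuid[g] for g in guesses if g in by_uuid]


print(enumerate_sequential(10))   # every user turns up within ten requests
print(guess_uuids(100_000))       # practically certain to be empty
```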

I'm not saying you can't associate or correlate your UUIDs with a sequential numbering system in your database for ease of indexing... just keep those numbers internal and don't expose them to the world.
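
One way to keep the sequential numbers internal might look like the sketch below. It assumes SQLite through Python's standard sqlite3 module purely for illustration; the table layout, the column names and the create_user and get_user helpers are all invented for this post.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal only
        public_id TEXT NOT NULL UNIQUE,               -- the UUID clients see
        name      TEXT NOT NULL
    )
    """
)


def create_user(name: str) -> str:
    """Insert a user and return only the public UUID."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO users (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id


def get_user(public_id: str):
    """Look a user up by UUID; the sequential id never appears in the result."""
    row = conn.execute(
        "SELECT name FROM users WHERE public_id = ?", (public_id,)
    ).fetchone()
    return None if row is None else {"id": public_id, "name": row[0]}


alice_id = create_user("alice")
print(get_user(alice_id))   # found through the UUID
print(get_user("1"))        # None: the internal id is not a public identifier
```

The database still gets its compact, sequential primary key for joins and indexing, while everything that crosses the API boundary is keyed by the UUID.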

I'll take a slight detour here to pontificate: a good number of APIs are poorly secured simply because their designers don't understand or don't implement proper security, or because they live in a fantasy world (Snapchat API hack, anyone...?) where hackers couldn't possibly be as smart as they are. I'll talk more about best practices for securing an API in the future, but this is an area that will take several posts to cover.

Some of the Downsides

Readability. Sure, UUIDs are harder for humans to read and work with in large datasets, but unless you've got human hamsters running big wheels in the back of your datacenter... this is no excuse to not go this route.

Database speed. OK, I'll admit this one can be tough at first. Then you remember that some wonderful people out there have created caches so that you don't have to hit your database for every API call! It's faster, better, and smarter to cache results and, if you do as I suggested above and associate your UUIDs with numerically increasing (easily indexable) ids, you can cache the UUID-to-id mapping and use the increasing id for your database calls. So again, this isn't an excuse to not use a better practice!
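
Here's a minimal sketch of that caching idea. A plain dictionary stands in for whatever cache you actually run, SQLite stands in for your database, and the names (id_cache, internal_id, get_name, stats) are hypothetical. Treat it as an illustration of the shape, not a tuned implementation.

```python
import sqlite3
import uuid
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, public_id TEXT UNIQUE, name TEXT)"
)
alice = str(uuid.uuid4())
conn.execute("INSERT INTO users (public_id, name) VALUES (?, ?)", (alice, "alice"))

id_cache: dict[str, int] = {}    # stand-in for an external cache
stats = {"uuid_lookups": 0}      # counts how often the UUID column is queried


def internal_id(public_id: str) -> Optional[int]:
    """Resolve a public UUID to the internal integer id, caching hits only."""
    if public_id in id_cache:
        return id_cache[public_id]
    stats["uuid_lookups"] += 1
    row = conn.execute(
        "SELECT id FROM users WHERE public_id = ?", (public_id,)
    ).fetchone()
    if row is None:
        return None
    id_cache[public_id] = row[0]
    return row[0]


def get_name(public_id: str) -> Optional[str]:
    """Fetch by integer primary key once the UUID has been resolved."""
    user_id = internal_id(public_id)
    if user_id is None:
        return None
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


for _ in range(3):
    print(get_name(alice))
print(stats["uuid_lookups"])   # 1: only the first call queried by UUID
```

Only successful lookups are cached here, so a UUID that doesn't exist yet won't be remembered as missing after the resource is created.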

Overview

So, in short: use UUIDs in your APIs as resource identifiers! For added security and simple obfuscation of your identifiers, you get a uniquely identified resource with a nearly impossible to guess code that helps you lock down your APIs. The downsides are slim next to the security gain, and you can mitigate them by using caching and id-UUID correlations to speed up database indexing and limit calls to your database.