Being Careful With UUID Comparisons
They’re powerful and easy to use, but be careful when comparing their string representations.
Background
UUIDs are useful: the ID space is large enough that, if IDs are generated by a standard algorithm, they are effectively globally unique. In AdTech, for example, mobile device identifiers are commonly implemented as UUIDs. They can then be treated as unique across all devices, yet still be reset locally without updating a remote, central registry.
When marshaled to the canonical 8-4-4-4-12 pattern of hyphen-delimited hexadecimal characters, these 128-bit values are quite clean to read (e.g. EA7583CD-A667-48BC-B806-42ECB2B48606). In my experience, these tidy, human-readable strings are the most popular form of UUID exchange.
A Nuance
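Despite that easy legibility, there's also a simple pitfall to beware of:
Hexadecimal encodings may use either upper-case or lower-case letters for the digits A-F. RFC 4122, for instance, specifies that UUIDs are output in lower case but are case-insensitive on input, so both spellings circulate in practice.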
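To see how both casings arise from one underlying value, here is a minimal sketch that marshals the same 16 bytes (those of the example IDFA that follows) into the 8-4-4-4-12 layout using Go's `%x` and `%X` formatting verbs. `formatUUID` is a hypothetical helper written for illustration, not a library function.

```go
package main

import "fmt"

// formatUUID (hypothetical helper) renders 16 raw bytes in the canonical
// 8-4-4-4-12 layout. verb is "%x" for lower-case or "%X" for upper-case hex letters.
func formatUUID(b [16]byte, verb string) string {
	layout := verb + "-" + verb + "-" + verb + "-" + verb + "-" + verb
	return fmt.Sprintf(layout, b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	// The 16 bytes behind the example IDFA.
	b := [16]byte{
		0xea, 0x75, 0x83, 0xcd, 0xa6, 0x67, 0x48, 0xbc,
		0xb8, 0x06, 0x42, 0xec, 0xb2, 0xb4, 0x86, 0x06,
	}
	lower := formatUUID(b, "%x")
	upper := formatUUID(b, "%X")
	fmt.Println(lower)          // ea7583cd-a667-48bc-b806-42ecb2b48606
	fmt.Println(upper)          // EA7583CD-A667-48BC-B806-42ECB2B48606
	fmt.Println(lower == upper) // false: same value, different strings
}
```

Both strings are valid, canonical renderings of one 128-bit value. Only the case of the letters differs.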
Consider these mobile device identifiers for iOS (IDFA) and Android (AAID):
IDFA="EA7583CD-A667-48BC-B806-42ECB2B48606"
AAID="cdda802e-fb9c-47ad-9866-0794d394c912"
Both are valid examples of canonical UUID strings.
The Pitfall
Suppose you wrote a naive UUID comparator in Golang that simply checks whether the two strings are equal.
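The original comparator isn't reproduced here, so the following is a minimal sketch of what such a naive version might look like. `naiveEqual` and `isOptedOut` are hypothetical names, and the opt-out list is a plain slice for simplicity.

```go
package main

import "fmt"

// naiveEqual (hypothetical) compares two UUID strings byte-for-byte.
// It is case-sensitive, so two spellings of the same UUID can compare unequal.
func naiveEqual(a, b string) bool {
	return a == b
}

// isOptedOut (hypothetical) reports whether id appears in the opt-out list,
// using the naive comparator above.
func isOptedOut(id string, optOuts []string) bool {
	for _, o := range optOuts {
		if naiveEqual(id, o) {
			return true
		}
	}
	return false
}

func main() {
	optOuts := []string{"ea7583cd-a667-48bc-b806-42ecb2b48606"}
	candidate := "EA7583CD-A667-48BC-B806-42ECB2B48606"
	fmt.Println(isOptedOut(candidate, optOuts)) // false: a false negative
}
```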
If, say, you were implementing an opt-out filter to remove device IDs from your targeted advertising campaign, you might get some surprising false negatives! For example, the candidate UUID EA7583CD-A667-48BC-B806-42ECB2B48606 would not match the UUID ea7583cd-a667-48bc-b806-42ecb2b48606 in your filter blacklist, even though both strings represent the same 128-bit UUID.
Some Alternatives
To avoid the accident above, one approach you might try is to normalize the strings to a consistent casing (for example, all lower case) before comparing them.
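Here is a minimal sketch of the normalization approach, assuming lower case as the canonical casing (matching RFC 4122's output convention). `normalizeUUID` is a hypothetical helper. It only fixes casing and does not check that the input is a well-formed UUID.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeUUID (hypothetical) lower-cases a UUID string so that upper- and
// lower-case spellings of the same UUID become identical strings.
// It does NOT validate that s is actually a UUID.
func normalizeUUID(s string) string {
	return strings.ToLower(s)
}

func main() {
	// Normalize once when building the opt-out set...
	optOuts := map[string]struct{}{
		normalizeUUID("ea7583cd-a667-48bc-b806-42ecb2b48606"): {},
	}
	// ...and again for each candidate before the lookup.
	candidate := "EA7583CD-A667-48BC-B806-42ECB2B48606"
	_, found := optOuts[normalizeUUID(candidate)]
	fmt.Println(found) // true

	// For a one-off pairwise comparison, strings.EqualFold also ignores case.
	fmt.Println(strings.EqualFold(candidate, "ea7583cd-a667-48bc-b806-42ecb2b48606")) // true
}
```

A map lookup can't apply `strings.EqualFold`, so when the filter is a set, both the stored IDs and the candidates must be normalized the same way.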
Or, better yet, parse each string representation and compare the decoded 128-bit values for even better integrity: a parser also rejects malformed IDs outright instead of letting them silently fail to match. With a nice library, it's both easy and clean.
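The post doesn't say which library it used. One real option is `github.com/google/uuid`, whose `uuid.Parse` returns a `uuid.UUID` (a `[16]byte`) that can be compared with `==`. To keep this sketch dependency-free, it instead decodes the canonical form with the standard library's `encoding/hex`. `UUID`, `parseUUID`, and `sameUUID` are hypothetical names, and only the hyphenated 8-4-4-4-12 form is accepted.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// UUID (hypothetical type) holds the raw 128-bit value. Arrays are comparable
// in Go, so UUIDs can be compared with == and used as map keys.
type UUID [16]byte

// parseUUID (hypothetical) decodes the canonical 8-4-4-4-12 form.
// encoding/hex accepts both upper- and lower-case digits, so case is irrelevant.
func parseUUID(s string) (UUID, error) {
	var u UUID
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return u, errors.New("not a canonical 8-4-4-4-12 UUID string")
	}
	// Drop the four hyphens, leaving 32 hex characters (16 bytes).
	digits := s[0:8] + s[9:13] + s[14:18] + s[19:23] + s[24:36]
	if _, err := hex.Decode(u[:], []byte(digits)); err != nil {
		return u, err
	}
	return u, nil
}

// sameUUID (hypothetical) reports whether two strings encode the same UUID.
func sameUUID(a, b string) (bool, error) {
	ua, err := parseUUID(a)
	if err != nil {
		return false, err
	}
	ub, err := parseUUID(b)
	if err != nil {
		return false, err
	}
	return ua == ub, nil
}

func main() {
	same, err := sameUUID("EA7583CD-A667-48BC-B806-42ECB2B48606",
		"ea7583cd-a667-48bc-b806-42ecb2b48606")
	fmt.Println(same, err) // true <nil>
}
```

Comparing parsed values makes case irrelevant by construction. Because `UUID` is comparable, an opt-out set can also be keyed directly by it (`map[UUID]struct{}`).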
Conclusion
In practice, if the UUIDs you process come from a single origin, they'll most likely be consistent in representation (exclusively upper case or exclusively lower case). So even if you goof and introduce a potential bug through a naive UUID comparison, you would probably get away unscathed.
But if the UUIDs you process come from multiple origins, handling them more robustly just might save your butt. After all, a representation being valid and canonical doesn't guarantee that there are no practical variations to check and control for.