Validation Blind Spots Hurt Real Users
Out Of Date Warning
Languages change. Perspectives are different. Ideas move on. This article was published on April 3, 2011 which is more than two years ago. It may be out of date. You should verify that technical information in this article is still current before relying upon it for your own purposes.
A friend of mine lives on Bonieta Harrold Drive. I live on a Windsor Hill Drive. Both of us have a problem in common, which is that poorly designed software is incapable of accepting the length of our street address. For me, American Express refuses to accept more than “WINDSOR HILL D”, which still arrives at our home. I can’t imagine if my friend ever got an American Express card, since given the maximum length available for an address, he would live on “BONIETA HARROL”. If you live in a place where direction (e.g. NW, SW, SE) matter, not having enough space can be extraordinarily problematic to the proper delivery of mail and packages if there is not enough room for the whole address.
Clearly, these software systems have a design flaw. That design flaw is that the programmers responsible for programming the software assumed that 20 characters (house number and street information) was long enough for a standard address. It’s likely that in the best case, developers picked 20 characters based on some given experience (e.g. they considered all the street names in their own town in conjunction with known house number lengths, and came to an answer) or worse, simply picked a number out of thin air. Real users are worse off because of it.
This problem isn’t limited to address length; it manifests itself in any number of forms. Names are prohibited from having spaces or dashes in them, even though many people have names that DO contain spaces or hyphens. Some people have names that are not even alphabet-based; others have names that are not in Latin-based alphabets. Programmers are also famous for insisting that data be presented in certain formats – phone numbers that have dashes, dots, are formatted like “(xxx) xxx-xxxx” or dates that are “mm/dd/yyyy”, completely ignoring the reality that the rest of the world represents dates differently (usually “dd/mm/yyyy”) and that anyone outside North America probably doesn’t have a ten-digit phone number, and most certainly doesn’t format a ten-digit number like Americans do.
This isn’t to say that there shouldn’t be some limitations on the lengths we accept. But our blind spots about validation can and do harm real users if they’re poorly or incompletely thought out. What might seem like a completely rational limit to us might hurt a real user who needs to exceed that limit, through no fault of their own. How many women are “BETTYJEAN” because their first name “can’t” have a space in it?
Let’s end the blind spots. Here are five steps to take in order to accept valid data from real users.
1. Accept valid data in any form provided by the user.
It’s pretentious of programmers to push off the task of formatting data in a manner we like, just because we don’t want to write a function that formats it for us. Demanding phone numbers in “(xxx) xxx-xxxx” format requires the user to do the work that the system should do for them. Programmers are smart people; they can figure out that phone numbers contain only numbers (props to people who accept letters and convert them, but this isn’t necessary) and strip out unnecessary characters. Once you have the numbers, you can format them any way you want. But don’t push this burden off to the user.
2. Where possible, use well-developed validation libraries.
It’s pretty easy to recognize a valid email address…or so we think. And so we write regular expressions that expect email addresses to have a localpart, a domain, and a three-letter extension. The problem is the resulting regular expression might exclude the following valid addresses:
- email@example.com (numbers, if we don’t explicitly allow them)
- firstname.lastname@example.org (two-letter extension)
- email@example.com (extra prefix)
- firstname.lastname@example.org (two extensions)
The same rule applies to other standard validation we perform: URLs, phone numbers, credit card numbers, social security numbers, driver license numbers, postal codes, and other data points that must follow a format. Almost every framework on earth contains a library that is likely to implement standards we’ve never heard of; use them.
3. Do not place artificial limits on valid data.
This might seem like a repeat of the first point in this list, but it’s not. That point is about format; this point is about length. Placing artificial lengths on valid data is precisely the thing that gets billing statements sent to “BONIETA HARROL” instead of Bonieta Harrold Drive. American Express, for what it’s worth, limits passwords to eight characters. That’s an artificial limit on valid data that makes security worse.
Obviously, certain data fields require lengths to be applied in order for them to be created properly. Though the days of imposing a 255 character limit are long behind us; it is possible to offer a larger limit or, in the case of document datastores, it can be unlimited. And by no means is it a good idea to have artificially small limitations just for legacy support of older applications; if older applications are that horribly broken that they require you to impose artificial and silly limitations, fix them.
4. Do place valid limits on specific data.
Certain data will have valid, and reasonable limits. For example, in the United States, no phone number will ever be longer than sixteen characters (including a country code, parenthesis around the area code, and dashes between the groups). This of course excludes extensions, which can add an unlimited value, but illustrates a point: certain data can be limited in length and scope. Social security numbers are another perfect example of this: they’ll never be longer than 11 characters, including dashes, and what’s more, there are certain standards to what a social security number can contain. Credit card numbers are another example; accepted cards will start with particular digits that can be predictable and validated, and the account numbers will be generally a standard length.
It’s wise and practical to limit, validate and enforce restrictions on data that should be and must be specific. And while format should never be a consideration here, limiting the length, character content and acceptable values most certainly is a legitimate aim of validating specific data.
Validation is a complex art, requiring the developer to think beyond his and oftentimes, his team’s predetermined notions of what is correct and what is acceptable. While format should never be something arbitrarily enforced, correcting and assisting the user in providing data is our responsibility. Making that data entry easy, fast and convenient, while more difficult for us on the backend, makes our applications more useable, useful and desirable.