One of the most amazing things about earning my pilot certificate was how much of what I learned as a pilot overlapped with what I knew as a programmer. Evidence-based problem solving and data-driven decision making are core tenets of both aviation and software development.
A large portion of training, for both the primary rating and the instrument rating, consists of emergency preparation. Flying an airplane is easy when things go right. Flying an airplane gets much harder when there are problems, distractions or emergencies. The skill of a pilot can be tested in these situations. In a similar way, writing an application that runs well under ideal circumstances is the easy part; handling failures is the harder task, and one that programmers must excel at.
This blog post is about exception handling, seen through the eyes of my training experience.
Exceptions should be handled at the lowest layer possible.
When a pilot is flying an airplane and experiences a GPS sensor failure, that failure is not usually an emergency. It’s an exception to normal operations to be sure, but most pilots have a bunch of different methods for navigating besides GPS. The pilot will probably report the incident to his dispatcher, and possibly to air traffic control, but will continue on without notifying the rest of the crew or the passengers.
This handling of the exception occurred in the lowest level possible of operations: the pilot is the only one who needs to know. The pilot is both equipped and prepared to handle such a failure and there’s no need to raise the issue with anyone else in the system.
Applications should be capable of handling common exceptions without elevating them to the next level or to the user. Common exceptions might include a failed database connection to one of a multitude of servers, or one memcache machine failing to respond. These issues should be logged, investigated, and invisible to the user. A user should never hear about a database connection failure when there are databases available.
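To make this concrete, here is a minimal sketch in Python of handling a failed connection at the lowest layer. Everything named here is hypothetical and for illustration only: `connect_with_failover`, the `ConnectionFailed` and `NoServersAvailable` exceptions, and the `fake_connect` stand-in for a real database driver. The idea is simply that one failing server is logged and skipped, and the caller hears about a problem only when every server is down.

```python
import logging

logger = logging.getLogger("db")


class ConnectionFailed(Exception):
    """Hypothetical: raised when a single server cannot be reached."""


class NoServersAvailable(Exception):
    """Hypothetical: raised only when every server in the pool has failed."""


def connect_with_failover(servers, connect):
    """Try each server in turn and return the first working connection."""
    for server in servers:
        try:
            return connect(server)
        except ConnectionFailed as exc:
            # Handled right here, at the lowest layer: log it for later
            # investigation and quietly move on to the next server.
            logger.warning("connection to %s failed: %s", server, exc)
    # Only when nothing works does this become the caller's problem.
    raise NoServersAvailable(f"all {len(servers)} servers failed")


def fake_connect(server):
    """Stand-in for a real driver call; 'db1' is assumed to be down."""
    if server == "db1":
        raise ConnectionFailed("timed out")
    return f"connection to {server}"


connection = connect_with_failover(["db1", "db2"], fake_connect)
```

The code calling `connect_with_failover` never sees the `db1` failure, just as the passengers never hear about the GPS.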
Exceptions should be raised only as high as necessary.
From time to time, problems happen in flight that require a pilot to notify others. For example, a pilot might be notified by a crew member that a passenger has had a heart attack and needs medical attention. The pilot will dutifully notify air traffic control of the problem, and advise them of his planned actions. A pilot does this to ensure that the proper emergency services are available when the aircraft lands. But unless an emergency landing is necessary, the pilot won’t notify the passengers.
Sometimes application layers need assistance to resolve exceptions. A database connection failure might require another layer to provide credentials to the failover machines. A failed payment process might require another layer to gather more information from the user. Users still should not be told of the problem unless they are able to intervene or the exception cannot be handled.
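The payment case might look something like the following Python sketch. The names are all assumptions made up for the example (`PaymentNeedsInfo`, `charge`, `checkout`, and the `postal_code` field); no real payment API is implied. The lowest layer cannot fix missing data on its own, so it raises exactly one level up, to the layer that is able to ask the user.

```python
class PaymentNeedsInfo(Exception):
    """Hypothetical: the charge cannot proceed without one more field."""

    def __init__(self, missing_field):
        super().__init__(f"payment needs {missing_field}")
        self.missing_field = missing_field


def charge(card):
    """Lowest layer: it cannot invent missing data, so it raises."""
    if "postal_code" not in card:
        raise PaymentNeedsInfo("postal_code")
    return "charged"


def checkout(card, ask_user):
    """Next layer up: it can talk to the user, so the exception stops here."""
    try:
        return charge(card)
    except PaymentNeedsInfo as exc:
        # The user is involved only because they can actually intervene.
        value = ask_user(exc.missing_field)
        # Retry once with the extra field; a second failure propagates.
        return charge({**card, exc.missing_field: value})


# ask_user would normally render a form; a lambda stands in for it here.
result = checkout({"number": "4111"}, lambda field: "90210")
```

Nothing above `checkout` ever learns that the first attempt failed, and the user sees a request for a postal code rather than an error.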
Don’t scare the passengers.
Even the most experienced flier probably fears crashing. If you asked for a short list of the ways they think an airplane will crash, they’d probably list engine failure as a top possibility. Pilots therefore wisely know not to scare their passengers, even in a true emergency. Engine failures are not common, but they do happen; modern jets are designed to fly just as well on a single engine as two engines. An engine failure doesn’t usually even get reported to the passengers, unless the failure is catastrophic.
Even when an emergency has to be reported to passengers, the pilot’s wording and demeanor can do a lot to keep the passengers calm. “We’ve lost an engine, we’re going down” lands very differently from “we’ve experienced a mechanical problem and we’re going to return to the airport. This is a precaution only, and we’ll have staff available to rebook you.”
Sometimes an application can’t complete the task it was asked to do. This is when the error message shown to the user deserves real thought. Technical explanations and descriptions of “catastrophic failures” are not appropriate here. Neither are glib jokes (see the Google error message of “502. That’s an error”). Instead, the message should a) explain what happened, b) state the consequences (e.g. “unfortunately you’ll have to reenter your work”) and c) promise that someone will be notified.
A technical error, error code, or alarming message will scare your user, so avoid all three. Worst of all is an exception that goes completely unhandled and shows its raw message to the user. Don’t let this happen.
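One way to keep raw exception messages away from users is a single catch-all at the outermost layer. The Python sketch below is only an illustration under stated assumptions: `run_safely`, the wording of `FRIENDLY_MESSAGE`, and the short reference code are invented for this example, and writing to the log stands in for whatever actually notifies your team.

```python
import logging
import uuid

logger = logging.getLogger("app")

# Says what happened, what the consequence is, and that someone is notified.
FRIENDLY_MESSAGE = (
    "Sorry, we couldn't save your changes, so you'll need to re-enter "
    "your work. Our team has been notified (reference {ref})."
)


def run_safely(task):
    """Run task(); return (True, result) or (False, calm message)."""
    try:
        return True, task()
    except Exception:
        ref = uuid.uuid4().hex[:8]
        # The stack trace goes to the log for the developers, never to the user.
        logger.exception("unhandled error, reference %s", ref)
        return False, FRIENDLY_MESSAGE.format(ref=ref)


ok, message = run_safely(lambda: 1 / 0)
```

The reference code lets support staff match a user's report to the logged stack trace without the user ever seeing the technical details.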
Anticipate possible exceptions.
It’s impossible for pilots to anticipate every emergency they could ever face, but that doesn’t stop them from training on some common scenarios. Engine failures, fires, unusual attitudes, instrument failures and other scenarios are practiced in training as well as in recurrent training. The decision-making process is also emphasized in regular training programs.
When designing an application, it’s important to think in similar terms about possible error conditions. While it may be impossible to predict every error, it should be possible to identify and handle 90% or more of the most common error cases during design, leaving you to focus only on the 10% of cases that are unusual or unexpected.
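In code, anticipating failures usually means catching the specific exceptions you expect and letting everything else propagate. Here is a small Python sketch of that idea; the `load_settings` function, the `DEFAULTS` values, and the choice of a JSON settings file are assumptions made for the example, while `FileNotFoundError` and `json.JSONDecodeError` are standard Python exceptions.

```python
import json
import logging

logger = logging.getLogger("config")

DEFAULTS = {"timeout": 30}


def load_settings(path):
    """Load settings from a JSON file, falling back to DEFAULTS."""
    try:
        with open(path, encoding="utf-8") as handle:
            return {**DEFAULTS, **json.load(handle)}
    except FileNotFoundError:
        # Anticipated: a fresh install has no settings file yet.
        return dict(DEFAULTS)
    except json.JSONDecodeError as exc:
        # Anticipated: a hand-edited file with a typo in it.
        logger.warning("ignoring malformed settings in %s: %s", path, exc)
        return dict(DEFAULTS)
    # Anything else (a PermissionError, say) was not anticipated, so it is
    # left to propagate to a layer that can decide what to do with it.
```

The two `except` clauses are the rehearsed scenarios; the deliberate absence of a bare `except:` keeps the unrehearsed ones visible instead of silently swallowed.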
Exception handling is one of the hardest aspects of application design. The unpredictable nature of failures means they are inherently difficult to handle, but application developers are expected to handle them nevertheless.
If you liked this post on exception handling, there’s an entire chapter on the subject in my new book, Mastering Object Oriented PHP. Write better object oriented PHP today: get Mastering Object Oriented PHP!