Just as there are patterns in software design, there are patterns to problem solving and debugging systems.
Design for Manufacturability calls for weighing the cost and complexity of manufacturing a product while it is being designed. In software, we should likewise design for debuggability, a force multiplier that affects many other areas:
- Problem solving during initial development and test.
- Proactive detection of issues, maintenance of production systems, and the speed and accuracy of problem resolution.
- Future extensions of the system. Many bugs come from later changes by less familiar staff.
Keep Your OODA Loop Short
Developed by a fighter pilot who influenced the design of the F-16 and F/A-18, the Observe-Orient-Decide-Act loop describes a decision-making cycle. I will leave the details to Colonel Boyd, but suffice it to say a shorter OODA loop is a competitive advantage in many endeavors.
In software problem solving, our loop is the process of observing a bug, forming a hypothesis, considering next steps, making a change, and observing again. The longer this loop takes, the worse off you will be.
Long build times, manual deployment steps, and the other obstacles discussed in this article all lengthen your OODA loop. Protect the efficiency of your OODA loop from the start of a project.
Incidentally, methodologies such as Agile and TDD have the effect of shortening your loops.
Means of Observation
Build in a means of observing a running system as part of the design. Be sure you can run the system locally as much as possible with a debugger attached.
Logging, dashboards, performance metrics, and alerts are other common patterns. I’m an opinionated logger:
- Do not let “oh ignore that, it’s ok” exception junk accumulate. It creates noise that obscures real issues. Do some 5S.
- Logs need timestamps, stack traces for exceptions, class names, and code line numbers.
- MyCustomEnterpriseLoggerAbstraction better not show up in every stack trace. I want the true stack trace of an exception.
- Understand how to use log levels (Error/Warning/Info/Trace). If you are logging the start/end of every method at the Info level, you are logging too much (and probably need a better debugger). Maintain a good signal-to-noise ratio. Make sure you can selectively enable Trace logging feature by feature.
- Sometimes you have to add a lot of logging to diagnose an issue. After the issue is resolved, review the log statements you added and consider removing them or lowering their level to Trace.
- If you report errors somewhere beyond a log file, unique error codes can be worth the time. It is easier to build automated support systems from codes than from dynamic text.
- If you expect serious amounts of log data, find tools and log formats in advance that help you filter and search.
- Do not expose security-sensitive information in your logs, and make inspecting the logs for it a step in the test plan.
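Several of these opinions can be sketched with Python's standard `logging` module. This is a minimal sketch, not a recommended setup: the logger names (`app.billing`, `app.shipping`) and the `configure_logging` and `send_invoice` helpers are hypothetical, and since Python has no built-in Trace level, Debug stands in for it here.

```python
import logging
import sys

# Timestamp, level, logger name, and source file:line on every record.
LOG_FORMAT = "%(asctime)s %(levelname)-7s %(name)s %(filename)s:%(lineno)d %(message)s"


def configure_logging(trace_features=(), stream=sys.stderr):
    """Log at Info by default; enable Debug only for the named feature loggers."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any handlers configured earlier
    root.setLevel(logging.INFO)
    for name in trace_features:
        logging.getLogger(name).setLevel(logging.DEBUG)


# Hypothetical feature loggers, one per area of the system.
billing_log = logging.getLogger("app.billing")
shipping_log = logging.getLogger("app.shipping")


def send_invoice(customer):
    """Hypothetical operation that fails when the customer has no email."""
    billing_log.debug("building invoice for %r", customer)
    try:
        return customer["email"].lower()
    except Exception:
        # Logger.exception logs at Error level and appends the full traceback.
        billing_log.exception("could not send invoice")
        raise
```

Calling `configure_logging(trace_features=["app.billing"])` turns on the detailed billing messages while shipping stays at Info, which is one way to keep the signal-to-noise ratio under control while chasing a single feature.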
Build and Maintain Test Environments
Have a test system separate from production. If you have multiple configurations (common with embedded devices), create and maintain test system configurations that mirror the systems you have in production. When a problem report comes in, you will want to be able to reproduce it in-house on a machine right in front of you (or better yet, a VM on your own system). This protects your production system and shortens your change/observation time.
Which of these code fragments would you rather deal with when told there is a Null Reference Exception on line 1?
var result = repository.getCustomer(session.customerID).sendInvoice();
var custID = session.customerID;
var customer = repository.getCustomer(custID);
customer.sendInvoice();
Just ask Uncle Bob.
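The same contrast can be sketched in Python. Everything here is hypothetical and for illustration only: `Repository`, `Customer`, and the dictionary-based session are stand-ins, and the explicit `None` check is my own addition rather than part of the fragments above.

```python
class Customer:
    def send_invoice(self):
        return "sent"


class Repository:
    """Hypothetical in-memory repository; returns None for unknown IDs."""

    def __init__(self, customers):
        self._customers = customers

    def get_customer(self, customer_id):
        return self._customers.get(customer_id)


def send_invoice_chained(repository, session):
    # Any failure in the chain is reported against this single line.
    return repository.get_customer(session["customer_id"]).send_invoice()


def send_invoice_stepwise(repository, session):
    cust_id = session["customer_id"]  # fails here if the session has no ID
    customer = repository.get_customer(cust_id)  # inspect in a debugger here
    if customer is None:
        # Name the missing piece instead of failing later on a None value.
        raise LookupError(f"no customer with ID {cust_id!r}")
    return customer.send_invoice()
```

With an unknown ID, the chained version fails with an `AttributeError` on `None`, while the stepwise version gives each step its own line number, a place for a breakpoint, and an error that names the ID that was not found.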
Beware of Frameworks
Beware of frameworks that help you develop something initially but interfere with your ability to observe the running system. The cost/benefit is not always worth it. If you have a large number of routine SQL queries, an object-relational mapper can be your friend. If you have many complex queries and edge cases, that same framework can become your mortal enemy.
Before using a framework, determine how well you can peek under the hood when needed. Can you see the raw SQL your ORM produces? Can you override it or get around it? Does the framework interfere with your tooling (debugger)? Can you still step through your code stack without getting lost? Can you insert traces? When an event fires, can you tell why?
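As one illustration of peeking under the hood, here is a minimal sketch using Python's standard `sqlite3` module rather than any particular ORM. `Connection.set_trace_callback` reports each SQL statement the database actually runs. The `customer` table and its data are made up for the example, and whether your own framework offers a comparable hook is exactly what you need to find out.

```python
import sqlite3

executed_sql = []  # every statement SQLite runs is appended here

conn = sqlite3.connect(":memory:")
# The callback receives the text of each SQL statement as it is executed.
conn.set_trace_callback(executed_sql.append)

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES (?)", ("Ada",))
rows = conn.execute("SELECT name FROM customer WHERE id = ?", (1,)).fetchall()

# Show what actually reached the database, including any statements
# the driver issued on our behalf (such as an implicit BEGIN).
for statement in executed_sql:
    print(statement)
```

If a framework sits between your code and a driver like this one, a trace at the lowest layer is often the quickest way to confirm what the framework really sent.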
Understand the environment that motivated the creation of a framework and ask if your environment is the same. If you have a large number of routine tasks, a framework is often a net benefit. If you expect a lot of complexity and edge cases, think twice.
Featured Image Credit
Photo by Matt Hecht