1. Understand the requirements.

Make sure you understand the requirements before you begin to debug and fix anything. Is there a standards document or a specification to look at? Or other documentation? Maybe the software is not malfunctioning after all. It could be a misinterpretation instead of a bug.

2. Make it fail.

You need a test case. Make your program fail. See it with your own eyes. A test case is a must-have for three reasons:

1. How else would you know that you have eventually fixed the problem if not by seeing that it finally works?
2. You will need a test case to obey rule 13 (“Cover Your Bugfix with a Regression Test”).
3. You have to understand all factors that contribute to making your software fail.

You need to separate facts from assumptions. An environment variable may be a factor, or the operating system, or the window manager being used.

Bug reports share a similarity with eyewitness reports of a car accident or crime: more often than not, facts and interpretation are blended, and key pieces of information may be missing although the witnesses have the best intentions and are convinced that they describe the complete and unabridged truth.
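One way to keep facts apart from assumptions is to record the environment together with every failing run. The following is only a minimal sketch in Python: the function name `collect_environment` and the default list of environment variables are assumptions made for illustration, so replace them with the factors your program actually depends on.

```python
import os
import platform
import sys


def collect_environment(env_vars=("PATH", "LANG", "HOME")):
    """Return a snapshot of factors that might contribute to a failure.

    The default variable names are placeholders (an assumption); list
    the variables your own program reads.
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "python": sys.version.split()[0],
        # Record unset variables explicitly instead of dropping them.
        "env": {name: os.environ.get(name, "<unset>") for name in env_vars},
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Attach the snapshot to the test case so that two runs can later be compared factor by factor.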

3. Simplify the test cases.

The next step is to simplify the test case. You do this in order to

• rule out factors that do not play a role,
• reduce the runtime of the test case and, most importantly,
• make the test case easier to debug. Who wants to deal with data containers filled with hundreds or thousands of items?
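Shrinking a large input can often be automated. The sketch below uses a simple greedy reduction: it removes chunks of decreasing size and keeps every removal that still reproduces the failure. It is not a full delta debugging implementation, and the names `simplify` and `fails` are hypothetical; `fails` stands for whatever check tells you that the bug still shows up.

```python
def simplify(items, fails):
    """Shrink the list `items` while `fails(items)` stays true.

    `fails` is a caller-supplied predicate (an assumption of this sketch)
    that returns True when the test case still triggers the bug.
    """
    if not fails(items):
        raise ValueError("the starting test case must fail")
    chunk = max(len(items) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if fails(candidate):
                items = candidate  # the chunk played no role; retry at i
            else:
                i += chunk         # the chunk is needed; keep it
        chunk //= 2
    return items


if __name__ == "__main__":
    # Made-up failure: it needs both 7 and 42 in the container.
    needs_both = lambda xs: 7 in xs and 42 in xs
    print(simplify(list(range(100)), needs_both))  # prints [7, 42]
```

Each call to `fails` is one run of the test case, so a fast test pays off here as well.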

4. Read the right error message.

Something went wrong and you face a screen full of error messages. Which ones do you focus on? It is surprising how many people don’t give the correct answer.

      The ones that come out first!

And that is not necessarily the first one you see; scroll back if need be. Everything that happened after the first thing went wrong should be eyed with suspicion.

The first problem may have left the program in a corrupt state.

So, first things first – fix the problems in the order of appearance, or have a very good reason for breaking this rule.
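When the output is too long to scroll through, let a script find the earliest complaint. This is a minimal sketch; the keyword list in `ERROR_PATTERN` and the sample log are assumptions, so adapt the pattern to the messages your tools actually print.

```python
import re

# Keywords that mark a problem line (an assumed list; extend as needed).
ERROR_PATTERN = re.compile(r"\b(error|fatal|exception|traceback)\b",
                           re.IGNORECASE)


def first_error(lines):
    """Return (line_number, text) of the earliest matching line, or None."""
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            return number, line.rstrip("\n")
    return None


if __name__ == "__main__":
    # Made-up build log for illustration.
    log = [
        "starting build",
        "parser.c:12: error: unknown type name 'widget_t'",
        "parser.c:40: error: 'w' undeclared",
        "make: *** [parser.o] Error 1",
    ]
    print(first_error(log))
```

In the sample log, the later messages are likely consequences of the first one, which is why the function stops at the first match.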

5. Check the plug.

Next, check the seemingly obvious. Were all parts of the software up and running when the problem occurred? Permissions OK? Enough disk quota? Is there enough space on all relevant file systems (including the likes of C:\WINDOWS, /tmp, and /var)? Does the system have enough memory?

Think of ten common mistakes, and ensure nobody made them.
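Some of these checks are easy to script and run before every debugging session. The sketch below covers existence, write permission, and free disk space for a list of directories. The function name `check_plug` and the 100 MB threshold are assumptions; memory and quota checks are left out of this sketch.

```python
import os
import shutil


def check_plug(paths, min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems found; an empty list means all checks passed.

    The default threshold of 100 MB is an arbitrary assumption.
    """
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append(f"{path}: does not exist")
            continue
        if not os.access(path, os.W_OK):
            problems.append(f"{path}: not writable")
        free = shutil.disk_usage(path).free
        if free < min_free_bytes:
            problems.append(f"{path}: only {free} bytes free")
    return problems


if __name__ == "__main__":
    import tempfile

    for problem in check_plug([tempfile.gettempdir(), os.getcwd()]):
        print(problem)
```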

6. Separate facts from interpretation.

Don’t jump to conclusions. Maintain a list of things you know for a fact, and why. Ask yourself: “Can you prove it?” Is the behavior reproducible? Is what you consider a fact really a fact? “It fails when I select a blue item, but it always works for red items,” a bug report may state. So the misbehavior depends on the color? Maybe not. It could be that the user selected the blue item with a mouse click and everything else via the keyboard, by specifying its name.
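To find out which factor really matters, run every combination instead of the two the reporter happened to try. The sketch below is built on assumptions: `select_item` is a hypothetical stand-in for the program under test, rigged to fail on mouse clicks, and the helper names are invented for illustration.

```python
from itertools import product


def select_item(color, method):
    """Hypothetical stand-in for the program under test.

    Returns True on success. In this made-up example the mouse path is
    broken and the color plays no role.
    """
    return method != "mouse"


def factor_table(colors, methods, run=select_item):
    """Run every (color, method) combination and record the outcome."""
    return {(c, m): run(c, m) for c, m in product(colors, methods)}


def always_failing(table, position):
    """Values of the factor at `position` for which every run failed."""
    values = sorted({key[position] for key in table})
    return [
        value for value in values
        if not any(ok for key, ok in table.items() if key[position] == value)
    ]


if __name__ == "__main__":
    table = factor_table(["red", "blue"], ["mouse", "keyboard"])
    print("colors that always fail: ", always_failing(table, 0))
    print("methods that always fail:", always_failing(table, 1))
```

With the rigged stand-in, no color fails consistently, while the mouse does. That is a fact you can prove, unlike the impression that blue items are to blame.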

7. Zoom in and conquer.

It may still not be obvious what is causing the problem. Running a memory debugger and normal source code debugging may not solve the puzzle for you. You may have to bite the bullet and face the tedious task of side-by-side debugging: comparing data, log files, and the flow of control in a working version and a failing version, concurrently.
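Part of the tedium of comparing two runs can be handed to a script. The sketch below assumes you have one log from a run that works and one from a run that fails, and that both were written in the same format; it reports the first line where they part ways. The function name `first_divergence` and the sample logs are made up for illustration.

```python
from itertools import zip_longest


def first_divergence(good_lines, bad_lines):
    """Return (line_number, good, bad) for the first differing line.

    Returns None when both logs are identical. If one log is shorter,
    its missing line is reported as None.
    """
    pairs = zip_longest(good_lines, bad_lines)
    for number, (good, bad) in enumerate(pairs, start=1):
        if good != bad:
            return number, good, bad
    return None


if __name__ == "__main__":
    # Made-up logs for illustration.
    good = ["open config", "read 12 entries", "build index", "done"]
    bad = ["open config", "read 11 entries", "build index", "crash"]
    print(first_divergence(good, bad))
```

Timestamps and process IDs differ on every run, so strip them from the lines before comparing.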

8. Match the tool to the bug.

Leave your comfort zone. Debug where the problem is and not where you find it convenient to debug.

Some debugging tools are easier to use than others in a given situation. But not all are equally helpful. It is natural to focus on the tools and processes you feel most comfortable with. Show discipline. Focus on those aspects that are most promising, even though this may entail tedious work or a trip into uncharted territory.

For instance, it is not uncommon for software developers to try to work around the use of memory debuggers. “They produce lots of strange, cryptic output” is one of the frequently heard excuses, even in situations that clearly suggest memory problems, such as intermittent failures and inexplicable random behavior.

9. One change at a time.

Do not change more than one thing at a time if possible. Then check whether the change makes sense and, if not, revert it before trying out the next idea. It is good practice to add comments to source code changed during debugging sessions, indicating the type of and reason for the change. Mind that any code change may introduce new problems. Restrict yourself to solving one problem at a time while debugging.

10. Keep an audit trail.

Often you will have to deal with a problem involving multiple parameters. You need to try out a number of combinations. It is all too easy to lose track of your changes.

Keep an audit trail!

This is especially important in the case of spurious failures. For manual testing, write down what you did, in what order, and what happened. Instruct the program to create log files and print status messages. Once the bug hits, your notes and the logs may be the only information left to correlate the bug to the environment. Spurious failures usually do not hit randomly per se. They are triggered by well-defined but perhaps obscure events, which are not yet known to you.
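A timestamped log file makes a cheap audit trail. The sketch below uses Python's standard `logging` module; the helper names `make_audit_logger` and `record_trial`, the file name, and the parameter names in the usage example are assumptions chosen for illustration.

```python
import logging


def make_audit_logger(path, name="audit"):
    """Create a logger that appends timestamped entries to `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def record_trial(logger, params, outcome):
    """Write one line per experiment: the settings used and what happened."""
    settings = " ".join(f"{key}={value!r}"
                        for key, value in sorted(params.items()))
    logger.info("trial %s -> %s", settings, outcome)


if __name__ == "__main__":
    audit = make_audit_logger("audit.log")
    # Made-up parameter combinations for illustration.
    record_trial(audit, {"threads": 4, "cache": "off"}, "FAIL")
    record_trial(audit, {"threads": 1, "cache": "off"}, "ok")
```

Sorting the parameters keeps the lines comparable, so combinations that fail are easy to spot with a plain text search.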

11. Get a fresh view.

When you are stuck, go and find somebody to talk to. Make sure to draw a clear line between the facts – and why they are facts – and your theories. Chances are good that your theories may be less than perfect.

The process of explaining the situation to somebody else may help you to separate truth from myths. And you may get a fresh view. Needless to say, it is advisable to talk to an expert. However, non-experts can be quite helpful too, because you have to explain more.

12. If you didn’t fix it, it ain’t fixed.

Occasionally, a bug will just disappear after you have modified some statements. Unless you have a good explanation why your fix is effective, you are better off assuming that the bug still exists and will hit again in the future. Your source code change may merely change the environment and thus change the probability for the bug to re-occur.

Even if you have a good explanation, verify that the fix is effective: take your fix out again and check that the bug comes back. Building your program from scratch after putting the change back in may be a good idea too. The dependencies in your build process may not be perfect and, as a result, the object code may not entirely correspond to the sources.

13. Cover your bugfix with a regression test.

So the problem is fixed . . . today. What about tomorrow?

To make the bug fix last, you should turn your simplified test case (rule number 3) into a regression test. Think of it as a safety bolt. It prevents others with access to the source code base from accidentally breaking a feature you have put quite some work into. Your customers will like it too – few things are as annoying as bugs that keep coming back.

There is no excuse for ignoring automated testing. Granted, effort has to be spent making software testable and maintaining a regression test system. But this is an integral part of professional software development.
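A regression test can be as small as the simplified test case plus one assertion. The sketch below uses Python's standard `unittest` module. The function `average` and its bug (a crash on empty input) are invented for illustration; in practice the test would import the real code that was fixed.

```python
import unittest


def average(values):
    """Hypothetical function under test.

    The assumed bug: an empty list used to raise ZeroDivisionError.
    The fixed version returns 0.0 instead.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_does_not_crash(self):
        # The simplified test case from the (made-up) bug report.
        self.assertEqual(average([]), 0.0)

    def test_normal_input_still_works(self):
        # Guard against the fix breaking the common case.
        self.assertEqual(average([1, 2, 3]), 2.0)


if __name__ == "__main__":
    unittest.main()
```

To confirm that the test really guards the fix, take the fix out once and check that the test fails.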