The Art of Debugging: Unveiling the Problem Solving Process

Walter G.



Just as important as writing brand-new code for a feature that will revolutionize the world is figuring out why last week's revolutionary code isn't working anymore. And odds are, if you work in the corporate world, you are going to spend just as much time bug hunting as you are coding.

There's no "one trick to rule them all" when it comes to debugging code, because the number of potential causes and factors is pretty much limitless. Maybe your co-worker checked in the wrong files without realizing it, or maybe there was an update to a package that broke the entire build.

Or maybe there's a script that's supposed to run every day at 9am to update records, and it hasn't run in over a month.

Things happen in the world, and as developers, we have to figure out why they are happening and how we can patch them up for now, until we can get around to actually fixing them at some point later in time. Until we forget.

The following are my personal guidelines when it comes to debugging software, mainly gathered from working in the corporate world for over 15 years. Most of these I still follow on a daily basis.

Reproducing the Issue

Recreating crime scenes isn't just a tactic for detectives and Batman. You can't fix something if you don't know exactly where it's broken. But unfortunately, software bugs can be notoriously hard to find, especially if you are working on an older codebase that hasn't been documented since COBOL was a popular language.

And to make matters worse, most bugs are reported by someone who isn't technical and who can't quite explain just what went wrong. So developers have to learn to go exploring, and this can take a fair amount of time and energy.

Your natural inclination five minutes into trying to reproduce any bug is to give up and assume that the person who reported it did something wrong.

"Works on my machine"

And then you close the ticket, only for it to appear again in the very near future. In my experience, this typically means that there is some specific edge-case condition that doesn't apply to you, but that applies to someone else.

That could be either because they have their account set up differently, or because your developer account is so filled with test data that your machine is the edge-case anomaly. At least that's how my test accounts are.

In any case, you at least want to get as close as you can to the configuration of the person reporting the issue. If you've managed to do so, and still can't replicate the error, then you need to go to the source and have the person reporting it replicate it.

Ideally, this would be the first thing to try, as it might save you time, but oftentimes the people reporting bugs are either site visitors, who might report the issue and then leave forever, or someone internal at your company who could be busy or in a completely different location altogether.

Assuming that you eventually figure it out and are able to replicate the issue, you can finally start to figure out what exactly is happening.

And if you just can't get past the replication phase, then you won't be able to fix it.

Isolating Variables

Once you manage to reproduce the issue, you'll either immediately realize what the problem is, or you'll nod and stare at the screen for a few minutes because whatever you're seeing "is not possible".

Assuming the latter, you'll need to nail down where the issue is happening. You'll need to know which component, class, function or variable is causing the problem. And this can be time-consuming and error-prone if you are working on a large enough codebase that many different developers have worked on throughout the years.

Here are a few ways to tackle variable isolation:

Comment out everything - Quite possibly the fastest way to nail down where the issue is happening is by commenting out large chunks of code until the error goes away. And then adding back the code selectively until you break it again. This won't work well if you have a large dependency chain where every method depends on some other method to work correctly.

Use debugging tools - If you're using a compiled language, such as C#, then you shouldn't have too much trouble setting up a few breakpoints and following the data along until you encounter the issue. This can be a bit time-consuming, and again, depending on nesting and references, it might not be the most effective approach, but it's worth a shot.

If you are using something like JavaScript, you can still set up breakpoints in the developer console on pretty much every browser. However, if you are using multiple 3rd party libraries, minifiers and/or bundlers, you might not be able to make out anything useful.

Log everything - Sometimes the simple console.log is all you need to find your answer. But not without littering your codebase with commented-out lines of console statements.

But oftentimes, this is a necessary step, so log accordingly. And if you are worried about littering the codebase with extra lines of code that aren't needed, you can read about how to remove console logs from production over here.

Simplify the logic - And lastly, but probably the most difficult: sometimes the code causes too many problems simply because it isn't good code. It might be hard to understand and have little to no comments, and what you think is supposed to happen isn't what happens at all. So you might need to simplify the whole thing. You might need to double-check variables, create more helper functions and refactor the code in order to see exactly what's happening. Not the most common scenario, unless you find yourself working on a very old codebase.
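As a rough illustration of the "log everything" approach, here is a minimal sketch of a logger that can be switched off with a single flag instead of commenting lines in and out. The helper name createDebugLogger and the order-total scenario are made up for this example and don't come from any library.

```javascript
// Hypothetical helper: createDebugLogger is invented for this sketch.
// "sink" defaults to console.log, but any function can be passed in.
function createDebugLogger(label, enabled, sink = console.log) {
  return function debug(message, data) {
    if (!enabled) return false; // nothing is logged when the flag is off
    sink(`[${label}] ${message}`, data);
    return true;
  };
}

// Assumed scenario: tracing an order total that comes out wrong.
const debug = createDebugLogger("orders", true);

function calculateTotal(items) {
  debug("items received", items);
  const total = items.reduce((sum, item) => sum + item.price * item.qty, 0);
  debug("computed total", total);
  return total;
}

calculateTotal([{ price: 10, qty: 2 }, { price: 5, qty: 1 }]);
```

Passing false as the second argument silences every statement at once, which is one way to keep the log calls around for the next bug without the noise.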

And if you still can't quite find the culprit, then read on.

Compare Working and Non-Working Scenarios

One of the first things that I do whenever a bug comes through my channels is to test that feature with the regular data that I expect every user to enter, to make sure that it isn't simply broken for everyone.

Most of the time, everything works as it should. But every so often, the error pops up immediately.

Assuming that everything works as intended, I will then attempt it with the exact data that caused the issue. This might not immediately tell me what the problem is, but it does begin to give me an idea where to look.

If the culprit data works just fine, then that tells me that the issue is somewhere else.

Though oftentimes the culprit could be malformed data that isn't being validated correctly. You might be expecting a numeric value only to be presented with an "N/A" instead.
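To make the "N/A" case concrete, here is a small sketch of the kind of validation that would catch it before it reaches the rest of the code. The function name parseAmount and the shape of its return value are assumptions made for this example.

```javascript
// Hypothetical helper for illustration; not taken from any library.
function parseAmount(raw) {
  const text = String(raw).trim();
  const value = Number(text);
  // Number("") is 0 and Number("N/A") is NaN, so both cases are checked explicitly.
  if (text === "" || !Number.isFinite(value)) {
    return { ok: false, error: `Expected a number but got "${raw}"` };
  }
  return { ok: true, value };
}

console.log(parseAmount("42"));  // ok is true, value is 42
console.log(parseAmount("N/A")); // ok is false, with an error message
```

Returning a result object rather than throwing is just one design choice; the point is that the bad value gets named at the boundary instead of surfacing three functions later.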

Comparing both functional and non-functional data at least points you in the right direction.
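One way to do that comparison without eyeballing two blobs of data is a quick field-by-field diff. The sketch below assumes flat objects and only compares top-level fields; diffRecords and the sample records are invented for illustration.

```javascript
// Hypothetical helper: lists the top-level fields that differ between
// a record that works and a record that triggers the bug.
function diffRecords(working, failing) {
  const keys = new Set([...Object.keys(working), ...Object.keys(failing)]);
  const differences = [];
  for (const key of keys) {
    // Shallow comparison only; nested objects would need a deeper check.
    if (!Object.is(working[key], failing[key])) {
      differences.push({ key, working: working[key], failing: failing[key] });
    }
  }
  return differences;
}

// Made-up sample data.
const workingOrder = { quantity: 3, currency: "USD" };
const failingOrder = { quantity: "N/A", currency: "USD", coupon: "SAVE10" };

console.log(diffRecords(workingOrder, failingOrder));
```

Here the diff would flag quantity and coupon, which doesn't prove either one is the cause but does narrow down where to look first.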

Question your Assumptions

If you're not getting anywhere with your tests and are about to call it quits and just assume that the other person is the problem, then maybe it's time to question your assumptions. You might think that you know what a particular function/feature/method does, but potentially you could be missing a few pieces of information.

I've spent hours looking at broken code trying to figure out why a specific result kept appearing over and over, only to realize that the core logic was being calculated somewhere else completely and that it was essentially hidden inside a black box.

This could happen because the code was refactored at some point, or perhaps you were using a 3rd party library that has seen some changes. But don't assume that you know exactly how the code works and where it is located. Sometimes you have to take a detour in your thinking.
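One way to test an assumption rather than trust it is to write it down as a check that fails loudly. The following is only a sketch: the assume helper and the applyDiscount function are hypothetical stand-ins for whatever code is under suspicion.

```javascript
// Hypothetical helper: throws as soon as an assumption turns out to be false.
function assume(condition, message) {
  if (!condition) {
    throw new Error(`Assumption failed: ${message}`);
  }
}

// Made-up function standing in for the code being debugged.
function applyDiscount(price, percent) {
  assume(typeof price === "number" && !Number.isNaN(price), "price is a number");
  assume(percent >= 0 && percent <= 100, "percent is between 0 and 100");
  return price - (price * percent) / 100;
}

console.log(applyDiscount(200, 25)); // 150

try {
  applyDiscount("200", 25); // the price arrives as a string
} catch (err) {
  console.log(err.message); // Assumption failed: price is a number
}
```

If a check like this fires, the thing you were sure about was the thing that was wrong, which is usually faster to learn from an error than from an hour of staring.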

Rubber Duck Debugging

Rubber Duck Debugging is essentially taking an everyday object, like a rubber duck, and then breaking down the problem to it as if it were listening to you. The idea being that as you explain the issues at hand you can get a better understanding yourself of what's going on.

You don't technically have to use a rubber duck, as really anything that you're willing to talk to can make for a great stand-in. But really, the whole point is to get you to verbally walk through the process with yourself.

I don't personally use this approach and mainly list it here because it is a relatively well-known tactic. If no other developer is around, I really just tend to mutter to myself as I navigate codebases.

However, if you do have another developer nearby, there's a better option.

Pair Debugging

If you're lucky enough to have another human developer around with some free time, then toss that duck aside and do just that. Particularly if you're a younger developer and you have more senior developers within arm's reach.

When I was first starting out as a programmer, I would constantly spend way too much time trying to free-solo a bug on my machine, only to fall short and have to ask for help from a lead developer.

And on almost every occasion, their expertise and knowledge showed almost immediately as they asked the right questions and navigated the issue flawlessly within a few minutes.

But even if you don't have a veteran developer around, just discussing it with someone (a human) can yield immediate results. They might be seeing the problem from a slightly different angle than you are, or vice versa. And combined, that knowledge brings forth solutions. And it's a lot more fun than talking to a duck.

Consult Documentation and Forums

If you are using any kind of 3rd party library, odds are that you read through the docs when you first configured and set it up and that you haven't looked at it since.

Well, sometimes, these documents change. And not because they've gotten new features. Sometimes they change because as it turns out they were written wrong in the first place.

This happens all the time and often gets swept under the rug, as the potential issues that these mistakes could cause are really edge cases that most people will never notice. Until it hits your codebase, that is.

If you suspect that your issues are with a library and not your specific code, then re-read the documentation.

Take a Break

If all else fails and you've been at it for hours, then odds are you've spent whatever remaining energy you had and now you're just staring at the screen hoping that bedtime happens. We've all been there. And unless you're facing a site-crashing issue that needs to be resolved ASAP, then usually stepping away and taking the day to ponder your life choices is just fine.

That bug is going to be waiting for you right when you get back. Along with several new ones more than likely.

Walt is a computer scientist, software engineer, startup founder and previous mentor for a coding bootcamp. He has been creating software for the past 20 years.

Last updated on: January 07 2024