
Debugging With Rule Outs

One of the skills that I use, teach, and consider fundamental to being a good engineer is the idea of a “rule out.”

The idea is basically that, when debugging, you reverse the question from “what could be happening” to “how do I demonstrate that this set of things is not happening.” Not asking “what will prove this theory correct” but rather “what will rule out these other possibilities?”

The term comes from medicine, where it is a critical component of differential diagnosis.

A Definition With Wordle

By now everyone has at least a passing familiarity with Wordle. It’s a game where you try to guess a word in six tries, and after each guess it lets you know:

  • Which letters you get in the correct position
  • Which letters appear in the word but are in the incorrect position
  • Which letters do not appear in the word

In a real way, each guess you take with Wordle is a theory where you are trying to figure out what words it can’t be. At the start, you have no information with which to make a guess. With each guess you gain more information, so the best strategy is often to focus on eliminating as many words as possible.

For the first few guesses especially, you are more focused on what it can’t be rather than trying to guess what it is.

This is a rule out.
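To make the elimination concrete, here is a minimal Python sketch of a single guess acting as a rule out. The function names (`feedback`, `rule_out`) and the tiny word list are made up for illustration. The scoring follows the three kinds of feedback listed above, with `G` for correct position, `Y` for wrong position, and `-` for absent.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Return Wordle-style feedback: G = right spot, Y = wrong spot, - = absent."""
    result = ["-"] * len(guess)
    # Answer letters that were not matched exactly can still earn a "Y".
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def rule_out(candidates, guess, observed):
    """Keep only the words that would have produced the observed feedback."""
    return [word for word in candidates if feedback(guess, word) == observed]


# Hypothetical candidate list; the hidden answer here is "flute".
candidates = ["crane", "slate", "pride", "flute", "mount", "house", "globe"]
observed = feedback("crane", "flute")
remaining = rule_out(candidates, "crane", observed)
print(observed, remaining)
```

The guess is wrong, but it still removes four of the seven candidates, which is the whole point.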

An Example

We had a problem at work that had occupied three developers over two days. The compiler was hanging and they couldn’t figure out why.

It took me thirty minutes to solve it and come up with a solution.

Not because I am all that, but because I knew the system well enough to start doing rule outs immediately.

Everyone in this group was convinced up one way and down the other that it had something to do with Guice, a library we were using. They couldn’t figure out how, but they were sure it was the problem because it was the one element in all of this that they had no experience with.

I knew from the symptom (the compiler hanging) that it almost certainly couldn’t be Guice. Guice doesn’t affect the compiler and is largely downstream of it. I devised a test to double check this and hit the compiler hang without the Guice annotations.

Great.

What else could it be?

[Diagram: rule-out tree for the hanging build]

  • Build hanging: Gradle, or something it runs? It’s entering a task, so it is unlikely that Gradle proper is the cause.
  • Is it failing during execution or compilation? It seems to be during a `compileJava` step, so not a tool or a test.
  • Java or a library? A bug in javac is possible, but we aren’t doing anything too weird, so unlikely. If it is a library, we know it has to be something that changes or analyzes bytecode.
  • JaCoCo or the static analyzer? We can pull the test compile step out entirely and it still happens, so not JaCoCo. Removing the static analyzer causes it to pass, so we’ve found our problem.
  • Specific checks: now we can drill into specifics. Which one is causing it?

I know that the Java compiler has quirks, but is usually pretty solid and doesn’t randomly hang. So what is changing the compiler?

I run the compiler and I note that we aren’t getting to the test phase, so it isn’t JaCoCo, a library that is frequently responsible for this sort of thing because it munges bytecode.

We also have a static analysis tool that runs its own version of the compiler as part of the compile step.

I disable the static analysis tool. Rerun it. No hanging. Culprit found. A little more diagnosis and I find the specific rule, fire off a commit to disable it, then file a bug with the team that handles the static analysis tool. I quickly do differential diagnosis and also determine how we can circumvent the problem in the meantime (it has to do with a final check, so we can either disable the final check or just… not trip the final value check).
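The "disable it and rerun" step can be sketched in a few lines of Python. This is only an illustration and not the actual build setup: `fake_build_hangs` is a hypothetical stand-in for running the real build and watching for a hang, and the suspect names are just labels.

```python
def find_culprits(suspects, build_hangs):
    """Disable one suspect at a time; a suspect is implicated if the hang disappears without it."""
    culprits = []
    for suspect in suspects:
        enabled = [s for s in suspects if s != suspect]
        if not build_hangs(enabled):
            culprits.append(suspect)
    return culprits


# Hypothetical stand-in for "run the build with these pieces enabled and see if it hangs".
def fake_build_hangs(enabled):
    return "static-analyzer" in enabled


suspects = ["guice", "jacoco", "static-analyzer"]
print(find_culprits(suspects, fake_build_hangs))
```

Every run where the hang persists is a rule out for the piece that was disabled, so even the "failed" runs narrow the search.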

As the saying goes:

You are paying for the 20 seconds of pushing the button and the 20 years of experience to know which button to push.

The Core Loop

The process is essentially one of making rule outs:

  1. Identify the apparent error.
  2. Determine what it can’t be.
  3. Determine what it might be and start proving each one as Not the Problem, starting with the easiest ones and the ones that will clear the most possibilities.
  4. Goto 1.

In any given situation with software a given error could be an almost (countably) infinite number of things. So being able to quickly prune entire trees and know how to test to rule out significant parts of the trees becomes a critical skill.
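One way to picture the loop is as a small scheduler over checks. The sketch below is an assumption-laden illustration, not a real tool: the `Check` class, its fields, and the cost numbers are all invented. It picks whichever remaining check clears the most live hypotheses per minute of effort, then either rules those hypotheses out or narrows the search to them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    cost_minutes: float            # rough effort to run the check (made-up numbers)
    covers: frozenset              # hypotheses this check can clear
    is_clean: Callable[[], bool]   # True means "these are Not the Problem"


def run_rule_outs(hypotheses, checks):
    live = set(hypotheses)
    order = []
    pending = list(checks)
    while len(live) > 1 and pending:
        # Cheapest checks that clear the most live hypotheses go first.
        best = max(pending, key=lambda c: len(c.covers & live) / c.cost_minutes)
        if not best.covers & live:
            break  # no remaining check can tell us anything new
        pending.remove(best)
        if best.is_clean():
            live -= best.covers   # ruled out
        else:
            live &= best.covers   # the problem is somewhere in here
        order.append(best.name)
    return live, order


checks = [
    Check("skip test compile", 5, frozenset({"jacoco"}), lambda: True),
    Check("hang is inside a task", 1, frozenset({"gradle"}), lambda: True),
    Check("disable static analyzer", 5, frozenset({"static-analyzer"}), lambda: False),
]
live, order = run_rule_outs({"gradle", "javac", "jacoco", "static-analyzer"}, checks)
print(order, live)
```

Note that javac never gets tested directly here. It drops out because a different check pinned the problem elsewhere, which is how pruning whole branches of the tree pays off.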

Why This Works

We have several biases as humans that work against us for debugging:

  1. We tend to view the things we are less familiar with as more likely culprits, and we tend to overestimate the likelihood of things based on recent events. This is a form of the availability heuristic.
  2. We tend to, on finding a possibility, try to find answers that confirm our hypothesis and reinforce our beliefs, rather than answers that eliminate the hypothesis. This is a form of confirmation bias.
  3. We tend to look mostly to the tools we know in order to diagnose and fix the problem, which is a form of anchoring bias.

Performing rule outs is a way of breaking these (and other, related) biases. It forces our brain to consider alternatives and, by considering the alternatives, allows us to more quickly eliminate possibilities and narrow down on an answer.

Basically: Before we can accept something as true, we must first/also prove that other possibilities are false.

Another Illustrative Example

A while back I was working with an intern who had been banging their head on a problem for about half a day. They were getting an error when passing a file to a parser. It looked basically like this:

[Diagram components: Our Code, YAML, Library, OpenAPI Parser, JSON Parser]

They had an error where it wasn’t working. So they jumped to the piece they were least familiar with: the internal library.

They started setting breakpoints inside of the third party library. Started evaluating the source code and trying to read it. They found not just the error coming out of it, but the error inside of the system that was leading to it. They were wondering whether it mattered that it was a YAML file while the parser seemed to be geared for JSON.

All questions you need to be able to ask. Eventually.

Instead, we start with the error: An illegal argument exception that is caused by some sort of parsing error. It’s coming from a shallow place in the code too.

Then my brain immediately went to the possibilities:

  1. It could be that the parser itself has a bug, but since it’s a pretty solid and stable parser that seems unlikely. We can validate this later by directly passing in the file if all else fails.
  2. It could be a bug in the interface between their tooling and the library, e.g., calling the wrong method, needing a flag to be set, or asking it for an object type it doesn’t know how to work with.
  3. It could be a problem with what they are giving the library. Either because it is not being properly loaded or because it isn’t in a format that the library expects.

[Diagram: rule-out tree for the exception]

  • Parser Bug: known good file, so it should parse.
  • Interface Bug: we can dig into this if necessary.
  • Input Bug, File Not Being Loaded: what we are passing in is empty.
  • Input Bug, Bad File: file comes from library, known good.

Confirm: They got the file from the library. This is one of their example files. It’s a known-good file. So it isn’t the second half of (3). This also gives us some confidence that it isn’t (1) (Core Loop Step 2).

Okay, so how can I rule out that it is a problem with what’s being passed to the library? (Core Loop Step 3) Let’s load it and print it out as a first step. This will tell us if what is being passed in is what we expect.

Tada. Found the problem.

The contents of the file weren’t being loaded into the object, so an empty file was being passed along. The parser didn’t know what to make of that and couldn’t figure out how to fit it into the object type it was being asked to work with, so it was dying.
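The "print it first" check is cheap enough to keep around as a helper. Here is a hedged Python sketch of the idea. The original code was not Python, and `describe_input` is a hypothetical name. It summarizes whatever is about to be handed to a parser so that an empty payload is obvious before anyone opens the parser’s source.

```python
def describe_input(payload) -> str:
    """Summarize what is about to be handed to the parser."""
    if payload is None:
        return "input is None"
    if not payload.strip():
        return "input is empty (no non-whitespace characters)"
    first_line = payload.strip().splitlines()[0]
    return f"{len(payload)} characters, starts with: {first_line!r}"


# What we expected to be passing in...
print(describe_input("openapi: 3.0.0\ninfo:\n  title: Example\n"))
# ...and what was effectively being passed in.
print(describe_input(""))
```

One print statement rules the whole "input" branch in or out, which is why it comes before any breakpoints inside the library.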

Conclusion

This is a difficult skill that takes practice and time, but it has helped me tremendously in my career. It’s a systematic approach to problem solving that I’ve personally found useful and that people who I’ve taught it to also seem to have found useful.

Setting up a Blog

What I wanted:

  1. The ability to typeset mathematics.
  2. The ability to easily add charts and graphs.
  3. Very low headache management tools. I particularly did not want to be writing everything in HTML.
  4. The ability for it to thrive on neglect and still look and behave basically the same.
  5. HTTPS.

In addition I had a few nice-to-haves:

  1. The ability to work mostly in markdown or $\LaTeX$.
  2. The ability to tinker with it and play around.
  3. Relatively scalable so that I wouldn't have to ever go back and redo it because I "outgrew" it in terms of sophistication of the posts or something else.
  4. Very, very low price such that if I forget about it for a while I won't feel like I am wasting money.

I’ve set up quite a few WordPress blogs and a few other equivalents over the years and they are all fine for what they are, but they failed horribly at most of these criteria.

I also do have a github account, so why not try pages out?

Thus Our Journey Begins

So there are a variety of hacks—most of which are quite old—for typesetting equations into github pages.

There have been various attempts at doing this with MathJax or by directing imgs to unsupported URLs. Some strategies in that genre almost-sort-of-kind-of work, but most are not what I would describe as “straightforward.” Some would work for a page or two, but became difficult to extend past that. Virtually none were set up for modern versions of MathJax.

So my first pass was to do it in the most complicated way possible: Why not just use latex directly in a prebuilt container on gitpod?

I loaded up the guide from 2019 and the guide for getting started with github pages, built a docker container, and we’re good to go, right?

Timeouts, Compile Errors, and Typos oh my

Turns out that downloading texlive-full requires downloading about two thirds of the internet and that gitpod really doesn’t like prebuilds taking over an hour. So that’s exciting and also difficult to debug, because it doesn’t come out and tell you that like a good process, it just kind of lets you infer it. Especially if you want to download anything after that, things can get pretty exciting.

It also turns out that ruby versions matter for compiling and that there was a variety of path weirdness from me not quite understanding how to execute the ruby code within gitpod as part of a dockerfile. Suffice it to say just doing this:

RUN gem install bundler

…did not do what I wanted nor expected.

Gitpod, for its part, was mostly generating errors that looked like this:

Prebuild failed for system reasons. Please contact support. Error: headless task failed: exit status 1

With no real tangible logs to speak of to let me know what happened, and things often getting conflated with what looks like the timeout.

My first instinct, of course, is that I need to install ruby, so I give that a shot:

RUN brew install ruby

This has two main effects:

  1. It makes the build take longer, further exacerbating the timeout.
  2. It does not, in fact, solve my problem with bundler.

Eventually I give up and go “I’ll find some other way to manage it temporarily” and pull out all of the $\LaTeX$ pieces, since those seem to be obscuring anything else going on. “I’ll just get this set up with jekyll first, and then I’ll circle back to $\LaTeX$ if necessary,” I think.

Now having solved my timeout issue, it’s easier to debug my ruby problems—it can’t find gem—and I find someone else’s solution to that:

FROM gitpod/workspace-full

RUN echo "rvm_gems_path=/home/gitpod/.rvm" > ~/.rvmrc

USER gitpod

RUN /bin/bash -l -c "gem install jekyll"
RUN /bin/bash -l -c "gem install bundler"

Okay, that works. If that didn’t work I’ve also seen people put the gem commands into a gitpod init, which also seems to work okay.

Now I can go and initialize the repository.

I quickly get it set up and running with the midnight theme, which looks pretty at first glance and we’re off to the races.

Except going the wrong direction

A few things become quickly apparent:

  1. Github pages depends on an old (ancient?) version of jekyll and doesn’t look like it will… ever… get updated.
  2. The list of supported plugins is very short.
  3. Midnight is an awfully… bare bones… theme. It doesn’t really support my use case without significant customization. It only has a default template, and that’s pretty minimalistic. It doesn’t even support the initial generated file types.

Hunting around for this I find a lot of people really like jekyll-spaceship and it does pretty much everything I need… and it is not going to get allowlisted any time soon. I first find this by trying to compile:

\vdash \frac{1}{2}

Which it renders for me as $\vdash \frac{1}{2}$ on my local system but on pages just shows up as source code on an unthemed page.

But there is a solution to this: Because jekyll is just a static page generator, we can run it as part of a github action. That way we’ll generate it when pushing to main.

Perfect.

Gang aft agley

Turns out there is a conflict between the github pages plugin and the version of jekyll I want to use. No big deal, I’ll just pull the information out of the system and…

Wait, what do you mean all of my text is now completely without a template?

I try various things to circumvent this problem—including using the remote template plugin and downloading it locally—but it turns out that this is a consequence of two things:

  1. The presence of the plugins jekyll-default-layout and jekyll-optional-front-matter. These were probably added to make life easier for beginners setting up sites, but what it meant in practice was that not all of the (generated! downloaded!) files actually had any template information associated with them, and in some cases just flat out failed to do anything productive.
  2. Because the template only has default as a base, it doesn’t include… post, or home, or, well, anything.

While (1) is an easy fix (just add the information), there seems to be no way around (2) with this template short of writing everything from the ground up.

But it is enough to get the github action working. This takes a few false starts due to some confusion about branches, but it was actually remarkably straightforward to set up. Good job to the creator of the action and to the github actions team.

The hunt begins for a theme

Okay, so next stop is to find a template that will… actually… work. The theme world is a horrid mess and while pages like Jekyll Themes exist they lack really any sort of search functionality adequate to find what I am looking for.

So I find one that seems to look reasonable, eventually, plug it in and…

…what do you mean there are more files that I need?

So it turns out that while a good chunk of the theme is in the gem, there’s also a good bit in a default project that they have to jump start things.

There is no easy way that I can find, with jekyll, to bring down the files. People are recommending strategies like bringing down everything into a clean detached branch and fixing a bunch of merge conflicts.

I solve this by copying the files over more or less by hand, which isn’t too bad given the new project state.

That works reasonably well, so I do a bit of mucking around with the configuration—adding CDNs or changing which CDN was used, altering a few versions, tweaking the fonts, etc.

Things look good: I can now compile things like $\frac{x:T \in \Gamma}{\Gamma \vdash x:T}$, so I deploy and call it a day.

First Post

A place for assorted thoughts

I have been running into the challenge recently of wanting to write out various notes and experiences and finding that Twitter is the wrong format, Facebook has rapidly become more unusable for writing, Medium seemed like the wrong venue for a few different reasons, and I really wanted somewhere I could jot down things.

I’ve run more specialized blogs before that were focused on a wide variety of things over the years, some of which were moderately successful by whatever metric, but really fundamentally I grew bored or something came up.

So that’s what this blog is an answer to. I don’t know who will read this or if anyone will read this, how often I will or will not post, and maybe this is just me screaming into the void, but that’s okay.

Sometimes the void needs screaming into.