What’s wrong in my thinking about Errors?

After my previous post about why we accept human errors but are harsher on machines, two longtime readers and pillars of the resilience engineering community reached out to point out the error of my ways.

Courtney Nash of the Resilience in Software Foundation wrote a long response, pointing to errors in my thinking and framing. John Allspaw, the former CTO of Etsy who also worked at Flickr in the Yahoo years and at Friendster, made the same point in an email to me.

Your premise isn’t an accurate one and the research you’re citing to support your argument actually undermines it.

I have yet to talk with John, but Nash’s response has me revising my thinking and reading more before I revisit the conversation. In my own defense, I should have started with the caveat that I lack expertise in some of the topics I was wading into. Reading a book does not always make you an expert. Sometimes you just come to your own conclusions. I suppose that is the case here.

If I understand Nash’s response correctly, the argument is that we don’t actually forgive human errors. We use “human error” as a way to close an investigation. The “operator error” finding is usually scapegoating that obscures systemic issues. When I re-read my own piece, I saw that I was trying to make the same point, but it got lost in translation. All I was trying to say is that we assign blame differently for the two types of error.

But even that framing may not go far enough. For the resilience engineering school, “human error” as a category is itself the problem. It is a stop-sign meant to end inquiry, not a different kind of blame. I am still working through what that means for my own argument.

Either way, this is a chance to live and learn and dig deeper. As I do, my original premise still feels real and relevant. It was a pragmatic and philosophical argument, not a technical one, about two aspects of forgiveness. It reflects the duality in how we feel about proverbial men and machines, something I am grappling with as we move from analog to analog-and-digital selves now being augmented by the unreal reality of artificial intelligence. I am sure I am going to step in it, time and again.

Please read Courtney’s piece. When I do speak with Allspaw, I will add that update too.

April 30, 2026. San Francisco

2 thoughts on this post

  1. Om, it’s interesting that you frame “human error” as an end point. In my industrial work in mining and investigating incidents “human error” wasn’t an acceptable end point if the question “why did the human make the error” had an answer. E.g. “the equipment didn’t have guarding” but that goes deeper and deeper as you might expect. But for us to determine “worker injured because they put their arm in the machine” was utterly unacceptable as an investigation outcome. Even to say, “put arm in due to lack of guarding” wasn’t deep enough. Was it missing? Damaged? Process too difficult with guarding in place so it gets removed? Unidentified interaction/hazard point? Guarding policies not supported by funding to adapt equipment and process?
    I mean this is just basic root cause analysis. Why should these new situations be held to such a lower standard that suddenly “human error” is acceptable as a root cause?

  2. Hi, Courtney here! Thank you for such a thoughtful reply, I’m glad to hear you are re-thinking the concept of human error and I also appreciate hearing where you were coming from when you originally wrote that piece (much of which I can relate to). The offer is very much open to chat should you want after you’ve read more and chewed on this further. (I also fully agree with what Michael said in his comment about the mining industry. One thing Allspaw and I and many other folks in the Resilience Engineering community are actively doing is learning from other industries that have been tackling these ideas for decades now so we can most effectively bring them into the software industry.)
