Human error and the problem of causality in analysis of accidents

Here Jens Rasmussen starts with some background on how "human error" became an acceptable answer to the question "What went wrong?" and then explains why it's not a very good or useful answer.

He explains that previously, industrial systems were fairly easy to model accurately by various groups of professionals.  This meant that hazards could be proven empirically by pointing to some bit of math or engineering.

In 1990, he was pointing out that we're no longer in that realm:

The hazards involved in operation of large-scale systems lead to reliability and safety requirements that cannot be proven empirically.

He shows that even the concept of what an event is can be difficult to pin down.  If we include a lot of detail, then that "event" is less likely to occur again, as future events are unlikely to have all the exact same parts.

He goes on to say that we typically investigate accidents by working backwards through a sort of causal tree.  This thing happened because of the previous thing, and so on, continuing backward and branching as we go.

Explaining accidents through cause will actually change how you analyze the accident in question.  Similar to what we spoke of last week, Rasmussen explains that typically one will examine the series of changes and stop when something is found that matches what we expect to find.

What we use as the "stop-rule" is very rarely well defined and is usually different for each person and their purpose in looking at the accident.

He explains that how much we break down the details in an accident entirely depends on who we expect to read the analysis and what the analyst themselves expects to find.  He uses the example of the statement "The short-circuit caused the fire in the house" to show how varied the explanations and analysis of this could be.

If you know enough about electrical systems to wonder why there wasn't a breaker, then this answer isn't really useful or acceptable to you.  If you know a lot about building materials and wonder what materials were flammable, then you'd probably ask different questions and seek different information.

He also points out that analysis of accidents can be made for different audiences and for different reasons, for example to:

- Place blame/responsibility

- Explain the accident

- Improve the system

If you've ever done a postmortem because policy said you had to, or because someone demanded answers, this is probably very familiar to you.

So this can further influence what your "stop-rule" is.  If you're looking to place blame, you likely stop when you find a human doing something that you don't expect.  If you're looking to improve the system, you likely stop when you find something you know there is a "cure" for (and perhaps even call that the root cause).
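To make the effect of the stop-rule concrete, here is a minimal sketch.  Everything in it is a made-up illustration, not something from Rasmussen's paper: the Event type, the tag names, the three stop-rules, and the details added to the house fire example are all assumptions.  It walks backward through a small causal tree and, on each branch, stops at the first event that satisfies the analyst's rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Event:
    """A node in a hypothetical causal tree; `causes` are the events that led to it."""
    description: str
    tags: Set[str] = field(default_factory=set)
    causes: List["Event"] = field(default_factory=list)


StopRule = Callable[[Event], bool]


def trace_back(event: Event, stop: StopRule) -> List[str]:
    """Walk backward from an outcome and collect where the analysis ends.

    On each branch we stop at the first event the rule accepts, or at a
    leaf when there is nothing earlier left to examine.
    """
    found: List[str] = []
    for cause in event.causes:
        if stop(cause) or not cause.causes:
            found.append(cause.description)
        else:
            found.extend(trace_back(cause, stop))
    return found


# Illustrative stop-rules, one per purpose (tag names are assumptions).
def stop_to_blame(e: Event) -> bool:
    return "human" in e.tags


def stop_to_improve(e: Event) -> bool:
    return "fixable" in e.tags


def stop_to_explain(e: Event) -> bool:
    return "technical" in e.tags


# An invented elaboration of "the short-circuit caused the fire in the house".
fire = Event("fire in the house", causes=[
    Event("short-circuit", {"technical"}, causes=[
        Event("worn insulation", {"fixable"}, causes=[
            Event("inspection was skipped", {"human"}),
        ]),
    ]),
])

for name, rule in [("blame", stop_to_blame),
                   ("improve", stop_to_improve),
                   ("explain", stop_to_explain)]:
    print(name, "->", trace_back(fire, rule))
```

The tree is the same in all three runs.  Only the rule changes, and with it the "cause" that the analysis reports.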

Further, he explains, we can't really define human error very well, especially in the face of the resource limitations that exist in unexpected situations and the way humans learn and adapt.  Or what about times when we rely on the "usual cues" when working, but there is some change or fault in the system?  Now our "usual cues" don't work.

A better lens, he suggests, is to recognize that "it can be expected that no more information will be used than is necessary for discrimination among the perceived [alternatives for action in] the particular situation".  So when we're outside of those previously experienced situations and make a choice, we're testing a sort of hypothesis.  If it turns out negatively, then we typically label that "error".

the smoothness and speed characterizing high professional skill, together with a large repertoire of heuristic know-how rules, will evolve through an adaptation process in which 'errors' are unavoidable side effects of the exploration of the boundaries of the envelope of acceptable performance.

So often, we'll then hear a proposed solution that goes something like: Well, let's just make those boundaries as far away as we possibly can.  Then things will be "safe," right?

Rasmussen is already prepared for this.  He points out that: "it appears to be essential that actors maintain 'contact' with hazards in such a way that they will be familiar with the boundary to loss of control and will learn to recover".  So if we make the boundaries really far away, it'll be hard to sense where those boundaries are. And then when they are really far away, crossing them is more likely to be much more catastrophic and permanent.

If this seems strange, think about driving and how you learned to drive and improved your skill.  You likely got time to experiment, in ways and situations that were fixable. Probably a parking lot, where you learned how much you could push the gas, how hard you should push the brake, how sharply you could turn and still retain control.  You were exploring the boundaries.

Later on, you got better and were able to perceive more things and take in more inputs from the road: its condition, weather effects, other drivers.  You were able to fine-tune your behavior.  As this happened, it's likely that you had some trouble: maybe you hit a parking stop, lost traction, had a close call, or maybe something more serious.

But this helped you learn.  Rasmussen explains this too:

Some errors, therefore, have a function in maintaining a skill at its proper level, and they cannot be considered a separate category of events in a causal chain because they are integral parts of a feed-back loop.

He closes by suggesting that in light of these things, we'll likely have to reconsider the things that we attribute to operator error and think about things on another level beyond just making humans more reliable.

A lot of the software industry isn't great at this, falling back to that mode of thinking we talked about previously: "if only we could get rid of those pesky, unreliable humans."

This paper is also interesting because it ends with a discussion section featuring various members of The Royal Society, which published it.  Rasmussen engages with people from a hospital and a university on manufacturing, risk assessment, and patient care.

