RE: YOU are responsible, even when YOU are not to blame

Subject: RE: YOU are responsible, even when YOU are not to blame
From: "Miller, Alan" <Alan -dot- Miller -at- prometric -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Fri, 11 Apr 2003 10:09:08 -0400


Andrew, Andrew, Andrew.

You were doing so well. Then you stepped on your .... :-{)

Root Cause Analysis (note the capitalization) is an essential component of any failure analysis. Let's say your business uses several expensive pieces of equipment and your profits are tied to their reliable, safe, efficient, and economical operation. With me so far? Good. Now suppose one of those pieces of equipment, otherwise identical to the others, continues to experience malfunctions that result in its shutting down at inopportune times. These failures are expensive in terms of lost revenue and repair costs. Being the good engineer that you are, you initiate a failure analysis. You know what part failed because you are holding it in your hand. But why did it fail? Was it a defect in the part? Was the equipment being operated outside its safe envelope of operating conditions? (That's a polite way of saying "abused.") Has the equipment been maintained according to specifications? Did the folks who operate and maintain the equipment know what they were doing? Is it part of a larger pattern? Will the new part fail as well? If so, when? Can I/how do I keep this from happening again?

The object of a Root Cause Analysis is NOT to find blame or point fingers. It is to find the ultimate cause of a failure and correct it, thus preventing it from recurring (one hopes). Admiral Rickover ("Ricky Reactor" to those of us who knew and worked for him) felt that *all* failures could ultimately be traced back to a human cause. While that may be true philosophically, tracing every failure that far is not always productive or economical. In our little example, the failed part may have been poorly designed or manufactured. While we cannot expect the manufacturer to change the part to meet our needs, we may be able to find a supplier who provides parts from another manufacturer. If the failure were the result of a human error on our part--say, the maintenance guy didn't use the correct lubricant--we may be able to correct it, in this case by supplying the correct lubricant and training the maintenance crew in its use.

I agree that a single case does not necessarily point to the ultimate cause. But as part of the analysis, one always looks beyond the instance at hand for evidence of a pattern.

You may skip the following (somewhat lengthy) example if you wish.

<Tech writer example> I was assisting in a failure analysis for a DOE facility in a Southern State. One of their steam turbines that drives an electric generator ("dynamo" to our cousins across the pond) had failed--catastrophically. The physical cause was easily determined to be a sudden loss of lubricating oil to the turbine's bearings. No leaks or other damage were found in the oil piping system, and the forced lubrication pumps were operating perfectly up to the failure. Further investigation found that the oil supply shutoff valve from that turbine to the oil purifier had been opened, but the return valve from the purifier was closed.

I found that one of the operators had been directed to purify the oil in the sump for one of the idle turbines. He had followed his written procedure verbatim, as he was required to. Unfortunately, the procedure contained an error: it called for opening the valve that takes oil from the operating turbine instead of the idle one. This pumped all the oil out of the operating turbine. Time for the taxpayers to buy a new steam turbine.

Obvious solution: fix the procedure. But does that *really* fix the problem? What about the poor schmoe who was just doing what he was supposed to? One of the questions I asked him was, did he notice anything odd when lining up the system, like did one pipe suddenly get very much warmer than the others (lubricating oil system piping usually is not insulated, so leaks are more easily spotted during inspections)? Well, now that you mention it, yeah. Did he know why? Uh, no. Did he know it was a problem? Unh-uh.

Further analysis of other failures and accidents at the same facility revealed a pattern. The operators were very good at following the procedures, but didn't understand why the steps were there, or what to do if things went wrong. Conclusion: they needed better procedures and better training for the operators and maintenance crew.</Tech writer example>

One last thought on this: it may well turn out that it is more economical to just keep fixing the failure than to fix the root cause.

Al Miller
"Chief Documentation Curmudgeon"
Prometric, a part of The Thomson Corporation
Baltimore, MD
www.prometric.com

The early bird gets the worm, but the second mouse gets the cheese.
-- Author Unknown

-----Original Message-----
From: Andrew Plato [mailto:gilliankitty -at- yahoo -dot- com]
Sent: Fri 4/11/2003 1:36 AM
To: TECHWR-L
Cc:
Subject: Re: YOU are responsible, even when YOU are not to blame


<snip>
This is another one of those business fallacies. That if you find the "root
cause" of errors, you can prevent future errors. This kind of thinking tries to
ignore two salient laws:

1. Errors are a natural and normal part of any process
2. Humans are prone to inconsistent results

Finding root causes can be an extreme waste of time in high tech and scientific
environments, unless there is scientific and objective analysis of the larger
picture. You cannot have the people making mistakes also be responsible for
locating their cause. Naturally, they are going to see something other than
themselves as the cause. This is why, in technical documentation, a writer must
be edited by external sources.

Furthermore, finding "root causes" is dangerously close to "locating a
scapegoat." That is, there is a huge difference between analyzing a business
and how it works vs. seeking out people to blame. Analysis is fundamentally a
"pattern recognition" task. In other words, you don't look at a single error
and try to find the cause, you look at a general pattern of behavior or output
and then attempt to analyze the overall quality/capability of that. A writer
who consistently repeats the same errors is obviously not learning from those
mistakes. If the mistakes are minor, and easy to catch, then it can be seen as
a mere inconvenience. If a writer keeps missing the concepts and consistently
producing poor documentation, then clearly there is a larger issue at stake.

Conversely, taking a single error and attempting to track down its root cause
is most often a misleading endeavor. Drawing conclusions from a single instance
is faulty reasoning. If I spell a word wrong once, it doesn't mean I can't
spell. But if I spell 1000 words out of 1500 wrong, that's a pattern.

This is ultimately exactly the scenario another poster presented earlier. A
fictitious conversation was used as an example where the engineer or manager
was berating the writer for a single error, as if the writer was incompetent.
Again, faulty reasoning. A single omission does not constitute a pattern of
incompetence.

Lastly, "detailed analysis" of processes is not always an efficient use of
time, especially if the process will be chucked and rebuilt for the next
project. Many high tech environments use disposable project methods. That is to
say, they are thrown out at the end of the project and rebuilt for the next
iteration. There is a certain Zen in doing that. It results in constant renewal
within a group. People rarely like to do the same thing over and over again.
And since the success of any process is inherently tied to the people using it,
it can be beneficial to tear down a project's infrastructure at the end and
allow it to be rebuilt by the people that must carry it forward.

Andrew Plato

