
Overcoming the Hidden Assumptions of Root Cause Analysis

(Photo by Lars Ploughmann, Flickr; License)

In these strange days, when facts seem to matter less, I thought the pediment above the door of London’s Kirkaldy Testing and Experimenting Works from 1874 was rather good. Of course, with root cause analysis (RCA), we are trying to use all the facts available to get to root cause and not rely on lots of guesswork and opinion. In my last post I described a method of RCA that I called DIGR® and I explained why I think it is more effective than the oft-taught “Five Whys” method. As a reminder the steps to DIGR® are:

Define

Is – Is Not

Go Step By Step

Root Cause

When you decide you are going to carry out an RCA there are a number of hidden assumptions that you make. Being aware of these might mean you don’t fall into a trap. In the comments to my previous posts, people have mentioned some of these already and I wanted to explore five of them a little further.

1. Assuming that the effects you see are all due to the same root cause. In the example I have been using in this blog where expired vaccine was administered to several patients at two different sites, we carried out an RCA using DIGR®. In doing so, we assumed that the root cause of the different incidents was the same and the evidence we gathered in the DIGR® process seems to confirm that. But it is possible that these independent incidents have no common root cause – the issue may have occurred for different reasons at each site. As you review the evidence in the Is-Is Not and Root Cause parts of DIGR® it is worth remembering that the effects might be from different root causes. This is likely to show up when the analysis seems to be getting stuck and facts seem to be at odds with each other.

2. Assuming there is only one root cause. Often issues happen because of more than one root cause or ‘causal factor’. Sometimes there is benefit in focusing on just one of these but other times, there may be a benefit in considering more than one. In our example, we came to the conclusion that the root cause was that ‘the process of identifying expired batches and quarantining them has not been verified’. This is something we can tackle with actions and try to stop a recurrence of the issue. But we could have gone down the path of trying to understand why the checks in the process had failed on these occasions and tried to get to root cause on those. We would have started looking at Human Factors which I will cover in a subsequent post. You have to make a judgement on how many strands of the issue you want to focus your efforts on. In our example we have assumed that by focusing on the primary process, the pharmacists and nurses will not have expired vaccine and so their check (whilst still a good one) should never show up expired vaccine.

3. Assuming you have enough information to work out the cause and effect relationships. Frustrating though it is, it is not always possible to get to root cause with the facts you have available. You always want to use facts (evidence) to check whether your root cause is sound or whether you are really just guessing. If there is no further information available you might have to put additional QC checks in place until you obtain more facts. In our example, if we had carried out an RCA using DIGR® straight after the first issue occurred, we might have focused on the root cause being at that particular site on the basis it had not happened at any others (the Is-Is Not part of DIGR®). But we might simply not know enough about exactly what happened at that one site. Of course, following further cases at another site, we realised that there was a more fundamental, systemic issue.

4. Assuming all facts presented are true. I’ve mentioned Edward Hodnett’s book from 1955 “The Art of Problem Solving” previously. There is a chapter on ‘facts’ and in it he says: “Be sceptical of assertions of fact that start, ‘J. Irving Allerdyce, the tax expert, says…’ There are at least ten ways in which these facts may not be valid. (1) Allerdyce may not have made the statement at all. (2) He may have made an error. (3) He may be misquoted. (4) He may have been quoted only in part.” Hodnett goes on to list another six possible reasons the facts might not be valid. This is not to say you should disbelieve people – but rather that you should be sceptical. Asking follow-up questions such as “how do you know that?” and “do we have evidence for that?” helps avoid erroneous facts setting you off in the wrong direction on your search for root cause.

5. Assuming that because an issue appears to be the same as another issue, the root cause is the same. One of the challenges with carrying out a good RCA is the lack of time. When we are pressurized to get results now, we focus on containing the issue and getting to root cause comes lower down in the priorities. After all, if we get to root cause and put fixes in place, we will help the organization in the future but it doesn’t help us now. As RCA is often a low priority, it is also rushed. And to quote Tim Lister from Tom DeMarco’s book Slack, “people under time pressure don’t think faster.” One way of short-cutting thinking is to use a cognitive short-cut and just assume that the root cause must be the same as a similar issue you saw years ago. If you go down that route you really need to test the root cause against the available facts to make sure it stands up in this case too. Deliberate use of the DIGR® method of RCA can help combat this cognitive bias as it takes you logically through the steps of Define, Is-Is Not, Go step by step and Root Cause. People need time to think.

DIGR® can help with the focus on facts rather than opinion in RCA. It helps pull together all the available facts rather than leaving some to the side by focusing on ‘why’ too early.

In my next post I will go into some more detail on the G of DIGR®: how using process maps can really help everyone involved to Go step by step and start to see where a process might fail.

 

Text © 2017 Dorricott MPI Ltd. All rights reserved.

DIGR® is a registered trademark of Dorricott MPI Ltd.

Use DIGR to get to the Root Cause!

(Photo: Martin Pettitt, License)

I want to thank everyone who read, commented or liked my last post – “Root Cause Analysis: we have to do better than Five Whys”. Many seemed to agree that the Five Whys approach is really not up to the job. The defense of Five Whys seemed to fall into a number of buckets – “It’s just a tool”, “It’s a philosophy, not a tool”, “It needs someone who is trained to use it”, “It’s not meant to be literal: it’s not only about whys”, “It’s not meant to be literal: five isn’t a magic number”. No-one tried defending the Lincoln Memorial example which is so often used to teach Five Whys. I really do think it is a poor tool on its own – at the very least, it is misnamed. I think we do people a disservice by suggesting “just ask why five times” – we over-simplify and mislead. I think there is a better way. One that is still simple but, importantly, doesn’t miss out key information to help get to root cause and is more likely to lead to consistent results. This is why I came up with the DIGR® method. At the end of this post I explain the basis for DIGR®. There are many sophisticated RCA methods and they have their place but I do think we’d do well to replace Five Whys with DIGR®:

  • Define the problem. You need to make sure everyone is focused on the same issue for the RCA. This sounds trivial but is an important step. What is the problem you are focusing on? You would be surprised how often this simple question brings up a discussion.
  • Is – Is Not. Consider Is – Is Not from the perspective of Where, When and How Many. Where is the issue and where is it not? How many are affected and how many not? When did the problem start or has it always been there?
  • Go step-by-step. Go step-by-step through the process. What should happen – is it defined? Was the process followed? Were Quality Control (QC) steps implemented and does data from them tell you anything? If an escalation occurred earlier was the issue dealt with appropriately? This is where a process map would help.
  • Root cause. Use the information gathered to generate possible root causes. Then use why questions until you get to the right level of cause – you need to get back far enough in the cause-effect process that you can implement actions to address the cause but not to go back too far. This is where experience becomes invaluable. Narrow down to one or two root causes – ideally with evidence to back them up.

Of course, once you have your root cause you will want to develop actions to address the root cause and to monitor the situation. I will talk more about these in future posts. For now, I want to use an example with the DIGR® method of RCA.

Consider a hypothetical situation where you are the Clinical Trial Lead on a vaccine study. Information is emerging that a number of the injections of trial vaccine have actually been administered after the expiry date of the vials. This has happened at several sites. The first thing you should do is contain the problem. You do not need DIGR® for this. When you have the chance to carry out the RCA, what might the DIGR® approach look like?

Define. Let’s make sure everyone agrees on what the problem is. It’s not that a nurse didn’t notice that a vial that was about to be administered was past its expiry date. Rather it is that expired vaccine has been administered to multiple patients at multiple sites.

Is – Is Not (Where and When). Where is the issue? It has happened at two sites in two regions (North America and Western Europe). At one site, it has happened twice and this is where the problem was discovered by the CRA reviewing documentation. Is there anything different about the sites where it happened versus those where it did not? There is only one batch that has actually passed the expiry date and not all sites received that batch. So there are many sites where this problem could not have occurred (yet). In fact in reviewing the data we see that for the sites with the expired batch, there have only been 30 administrations of the vaccine since the expiry date. So there was the potential for 30 cases and we have three at two sites. 27 other administrations were of unexpired vaccine.
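The Is – Is Not comparison above is essentially a tabulation, and it can be sketched in a few lines of Python. This is only an illustration: the site names and the way the 27 unexpired administrations are split between sites are invented, and only the totals (30 administrations, three of expired vaccine at two sites, one of those sites with two cases) come from the example.

```python
from collections import Counter

# Hypothetical records: (site, vaccine_was_expired).
# Site names and the per-site split of the 27 unexpired
# administrations are assumptions made for illustration.
administrations = (
    [("Site A", True)] * 2 + [("Site B", True)]
    + [("Site A", False)] * 5 + [("Site B", False)] * 4
    + [("Site C", False)] * 10 + [("Site D", False)] * 8
)

total = len(administrations)
expired_by_site = Counter(site for site, expired in administrations if expired)
all_sites = {site for site, _ in administrations}

is_sites = sorted(expired_by_site)                        # where the issue IS
is_not_sites = sorted(all_sites - set(expired_by_site))   # where it IS NOT

print(f"{sum(expired_by_site.values())} of {total} administrations used expired vaccine")
print("Is:", is_sites)
print("Is not:", is_not_sites)
```

Laying the facts out this way makes the next question obvious: what is different about the "Is" sites compared with the "Is not" sites?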

Go step-by-step. What should actually happen? Each batch has the same expiry date. The drug management system determines which vials are sent to which site based on the recruitment rate. The system flags when there are vials that are expiring soon at particular sites and sends an email. The email explains the action needed – to quarantine expired vials by placing them away from the non-expired ones and clearly labelling them. These are then collected to be destroyed centrally. So this process must have failed somewhere. Further investigation highlights that the two sites did not receive the email. In fact, email addresses used to send the notification to the sites have minor errors in them – indeed not just the two sites where the issue occurred but in another three. At the two sites with the issue, the emails did not arrive and so they were not informed of expired vaccine and did not specifically go in to quarantine them. There are also no checks in place to make sure the process works – test emails, check for bounced emails, copy to CRA to follow up with site etc.
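To make the idea of verifying the notification process concrete, here is a minimal Python sketch of the kind of check that was missing. Everything in it is hypothetical: the record fields (`address`, `bounced`, `acknowledged`), the site numbers and the function name are assumptions for illustration, not features of any real drug management system. The address pattern is deliberately crude; a mistyped address can still look valid, which is why the sketch also looks for bounces and missing acknowledgements.

```python
import re

# Crude syntax check only (an assumption, not a full email validator).
ADDRESS_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")

def sites_needing_follow_up(notifications):
    """Return site IDs whose expiry notification cannot be assumed received."""
    flagged = []
    for n in notifications:
        malformed = not ADDRESS_PATTERN.match(n["address"])
        unconfirmed = n["bounced"] or not n["acknowledged"]
        if malformed or unconfirmed:
            flagged.append(n["site"])
    return flagged

# Hypothetical notification log for three sites.
notifications = [
    {"site": "001", "address": "pharmacy@site001.example",
     "bounced": False, "acknowledged": True},
    {"site": "002", "address": "pharmacy@site002,example",   # comma instead of dot
     "bounced": True, "acknowledged": False},
    {"site": "003", "address": "pharmcy@site003.example",    # valid-looking typo
     "bounced": False, "acknowledged": False},
]

print(sites_needing_follow_up(notifications))  # ['002', '003']
```

Site 003 is the interesting case: the address passes the syntax check, and only the missing acknowledgement reveals that the site may never have been told.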

Root cause. Based on all the information brought together in this RCA, it seems that this was an issue waiting to happen. One route of enquiry is why the two sites did not check the expiry date prior to administration. This could go down the route of blame which is unlikely to lead to root cause (as I will discuss in a future post). But a more fundamental question is how the nurses at these sites were given expired vaccine in the first place. We were lucky in 27 cases – presumably good practice at sites stopped the issue from occurring. But we don’t want to rely on luck. Why did the nurses and pharmacists have expired drug available to use? Because the process of identifying expired batches and quarantining them has not been verified. I would argue this is the root cause. You could go further to trying to understand how the erroneous email addresses were entered into the drug management system but the level we have got to means we can take action – it is within our control to stop this recurring. In other words, we are at the right level to develop countermeasures.

In my next post I will expose some of the hidden assumptions of RCA.

I hope you are intrigued by the DIGR® method of root cause analysis. Could we replace Five Whys with DIGR®? Of course, I welcome your thoughts, comments and challenges to the approach!


Some background to DIGR®

Some people seem naturally good at seeking out root cause. And when you try to formulate the method it is not easy. In DIGR® I have brought together various approaches. Define comes from the D in DMAIC as part of Six Sigma. It is also part of A3 methodology. Is – Is Not comes from the approach described by Kepner and Tregoe in “The New Rational Manager”. Go Step-by-Step comes from Lean Sigma’s process and systems approach – to quote W. Edwards Deming, “If you can’t describe what you’re doing as a process, you don’t know what you’re doing”. Root Cause is, in part, the Five Whys approach – but only used after gathering critical information from the other parts of DIGR® and without a need for five. To look at DIGR® from the approach of 5WH: D=Who and What, I=When and Where, G=How, R=Why.

 


Root Cause Analysis – We have to do better than Five Whys!

(Photo: Ad Meskens)

If you’ve ever had training on root cause analysis (RCA) you will almost certainly have learnt about Five Whys. Keep asking ‘why’ five times until you get to the root cause. The most famous example is of the Lincoln Memorial in Washington. The summary of this Five Whys example is reproduced below from an article by Joel A Gross:

Problem: The Lincoln Memorial in Washington D.C. is deteriorating.

Why #1 – Why is the monument deteriorating?  Because harsh chemicals are frequently used to clean the monument.

Why #2 – Why are harsh chemicals needed? To clean off the large number of bird droppings on the monument.

Why #3 – Why are there a large number of bird droppings on the monument? Because the large population of spiders in and around the monument are a food source to the local birds

Why #4 – Why is there a large population of spiders in and around the monument? Because vast swarms of insects, on which the spiders feed, are drawn to the monument at dusk.

Why #5 – Why are swarms of insects drawn to the monument at dusk? Because the lighting of the monument in the evening attracts the local insects.

Solution:  Change how the monument is illuminated in the evening to prevent attraction of swarming insects

This example is easy to understand and seems to demonstrate the benefit of the approach of Five Whys. Five Whys is simple but suffers from at least two significant flaws – i) it is not repeatable and ii) it does not use all available information.

Different people will answer the why questions differently and their responses will take them to a different conclusion. For example to Why #2 “Why are harsh chemicals needed?”, the response might be “Because the bird droppings are difficult to remove with just soap and water”. This leads to Why #3 of “Why are bird droppings difficult to remove with just soap and water?” and you can see that the conclusion (“root cause”) will end up being very different. The approach is very dependent on the individuals involved and is not repeatable.
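The repeatability problem can be pictured as a branching tree: each time a "why" has more than one plausible answer, the chain forks. The Python sketch below is only an illustration. The answers are paraphrased from the Lincoln Memorial example and the alternative response to Why #2 given above; the soap-and-water branch is left unexplored because the text does not follow it any further.

```python
# Each effect maps to the answers different people might give to "why?".
possible_answers = {
    "monument is deteriorating": ["harsh chemicals are used"],
    "harsh chemicals are used": [
        "large number of bird droppings",
        "droppings are hard to remove with soap and water",
    ],
    "large number of bird droppings": ["large population of spiders"],
    "large population of spiders": ["swarms of insects at dusk"],
    "swarms of insects at dusk": ["evening lighting attracts insects"],
}

def why_chains(effect):
    """Return every chain of 'why' answers reachable from an effect."""
    answers = possible_answers.get(effect, [])
    if not answers:
        return [[effect]]
    return [[effect] + chain
            for answer in answers
            for chain in why_chains(answer)]

chains = why_chains("monument is deteriorating")
endpoints = [chain[-1] for chain in chains]
for endpoint in endpoints:
    print(endpoint)
```

A single fork at Why #2 already gives two different endpoints, and a real investigation could fork at every step.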

Other questions that would be really beneficial to ask but would not be asked using a Five Why approach are:

    • When did the problem start? The answer to this might have helped link the timing of the problem with when the lighting timing was changed.
    • How many other monuments have this problem? If other monuments do not have this problem then what is different? If other monuments have this problem then what is the same? This line of questioning is, again, more likely to get to the lighting timing quickly and reliably because a monument without lighting and without the problem suggests the lighting might have something to do with the cause.

In my last post I described a hypothetical situation of a vaccine trial where subjects had received expired vaccine. If we use the Five Whys approach, it might go something like:

Why did subjects receive expired vaccine? Because an expired batch was administered at several sites; Why was an expired batch administered at several sites? Because the pharmacists didn’t check the expiry date; Why didn’t the pharmacists check the expiry date? Here we get stuck because we don’t know. So maybe we could try again.

Why did subjects receive expired vaccine? Because an expired batch was administered at several sites; Why was an expired batch administered at several sites? Because the expired batch wasn’t quarantined. Why wasn’t the expired batch quarantined? Because sites didn’t carry out their regular check for expired vaccine. Why didn’t sites carry out their regular check for expired vaccine? Because they forgot maybe? Or perhaps didn’t have a system in place? As I hope you can see, we really end up in guesswork using Five Whys because we are not using all the available information. Information such as which sites had the problem and which didn’t? When did the problems occur? What is the process that ensures expired vaccine is not administered? How did that process fail?

Five Whys can be fitted to the problem once the cause is known but it is not a reliable method on its own to get to root cause. Why is definitely an important question in RCA. But it’s not the only question. To quote the author of ‘The Art of Problem Solving’, Edward Hodnett, “If you don’t ask the right questions, you don’t get the right answers. A question asked in the right way often points to its own answer. Asking questions is the ABC of diagnosis. Only the inquiring mind solves problems.”

Here are more of my blog posts on root cause analysis where I describe a better approach than Five Whys. Got questions or comments? Interested in training options? Contact me.

Note: it is worth reading Gross’s article as it reveals the truth behind this well-known scenario of Lincoln’s Memorial.

 


Root cause analysis can help you sleep at night

Who cares about root cause analysis (RCA)? Of course, we all do now that it’s in the revised GCP, the ICH E6 (R2) Addendum*. But does it really matter? It’s easiest to think through from the perspective of an example. Consider a hypothetical situation where you are the Clinical Trial Lead on a vaccine study. Information is emerging that a number of the injections of trial vaccine have actually been administered after the expiry date of the vials. This has happened at several sites. Some example actions you might take immediately are – review medical condition of the subjects affected, review stability data to try to estimate the risk, ask CRAs to check expiry dates on all vaccine at sites on their next visit, remind all sites of the need to check the expiry date prior to administering the vaccine. These and many other actions are ones your team could quickly generate and implement and they will likely contain the issue for now. It took no root cause analysis to generate these actions. Could you sleep, confident in the knowledge that the problem won’t recur?

Without RCA, you don’t really know why the problem occurred and so you can’t fix it at the source. All you can do is put in additional checks and as these are implemented reactively, they may not be properly thought through and people may be poorly trained (or not trained at all) on the additional checks. We also know that while checks are valuable in a process they are not 100% effective when carried out by people. In this example we can be sure that the pharmacist dispensing and the nurse administering the vaccine have been trained to check the expiry date and yet we still have cases where expired vaccine has been administered. Do we really think that reminding the pharmacist and nurse is going to be enough to fix the problem forever? In a future blog, I will describe a powerful technique for RCA but for now, imagine you had managed to carry out a root cause analysis on this situation.

What you might discover in carrying out an RCA is that there is no defined process for informing sites of expired vaccine and requiring them to quarantine it. Or perhaps that the expiry date is written in American date format but being read in European format (3/8/16 being 8-Mar-2016 or 3-Aug-2016). Whatever the actual root cause, by finding what it is (or they are) you can start to consider options to try to stop recurrence. And with additional checks you could look for early signals in case these actions are not effective. By taking these actions, would you be more likely to sleep at night?
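The date-format ambiguity is easy to demonstrate. The following is a minimal Python sketch using only the standard library; the function name `interpretations` is my own and purely illustrative. It reads the same label text under both conventions.

```python
from datetime import datetime

def interpretations(label):
    """Parse one expiry label as American (month/day) and European (day/month)."""
    return {
        "american": datetime.strptime(label, "%m/%d/%y").date(),
        "european": datetime.strptime(label, "%d/%m/%y").date(),
    }

dates = interpretations("3/8/16")
print(dates["american"].isoformat())  # 2016-03-08
print(dates["european"].isoformat())  # 2016-08-03

# How long a vial could be wrongly treated as in date if a label
# written in American format is read in European format.
gap_days = (dates["european"] - dates["american"]).days
print(gap_days)
```

In this case the two readings are nearly five months apart, which is one reason an unambiguous format such as 8-Mar-2016 is worth considering for labels.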

Think you know about RCA? In my next blog I will reveal why the Five Whys method we’re always told to use is not good enough for a complex situation such as this. And later, I will provide a description of a powerful technique for RCA that seems to be seldom used. If you want to hear more, please subscribe to my blog on the left of the screen. All comments welcomed!

And did you notice that I haven’t mentioned CAPA once (until now)?

* Section 5.20.1 Addendum: “If noncompliance that significantly affects or has the potential to significantly affect human subject protection or reliability of trial results is discovered, the sponsor should perform a root cause analysis and implement appropriate corrective and preventive actions.” [my emphasis]