Root Cause Analysis

Big Data – Garbage in, garbage out?

Change of plan for this post… I visited the dentist recently. Before the consultation, I was handed an iPad with a form to complete. I was sure I had completed this form on my last visit, and when I checked with the receptionist she said it had to be completed every six months. So I had completed it before. It was a long form asking all sorts of details about medical history, medicines being taken etc. It included questions about lifestyle – how much exercise you get, whether you smoke, how much alcohol you drink etc. It all seemed rather over the top to be completing every six months. It seemed such an inefficient process and prone to error: every patient completing all these detailed questions, often in a rush, with no way to check what their previous answers were. Wouldn’t it be nice if they just pre-filled my previous answers and I could make any adjustments? All a little frustrating really. So I asked the receptionist why all this was needed.

“The government needs it,” was the reply. Really? What on earth do they do with it all, I wondered? I have to admit, that answer made me try a little experiment. I tried to see if the form would submit without me entering anything. It didn’t – it told me I had to sign the form first. So I signed it and sure enough it was accepted. So I handed the iPad back to the receptionist and she thanked me for being so quick. Off I went to my appointment and all was fine. And I felt as though I had struck a very small blow for freedom.

I wonder what does happen to all the data. Does it really go to “the government”? What would they do with it? Is it a case of gathering big data that can then be mined for trends – how the various factors affect dental health maybe? Well, one thing’s for sure, I wouldn’t trust the conclusions given how easy it seems to be to dupe the system. What guarantee is there on the accuracy of any of the data? Seems to me a case of garbage in, garbage out.

As we are all wowed by what Big Data can do and the incredible neural networks and algorithms teams can develop to help us (see previous blog), we do need to think about the source of the Big Data. Where has it come from? Could it be biased (almost certainly)? And in what way? How can we guard against the impact of that bias? There’s been a lot in the news recently about the dangers of bias – for example in Time and the Guardian. If we’re not careful, we can build bias into the algorithms and just continue with the discrimination we already have. Our best defence is scepticism. Just as when, in root cause analysis, an expert is quoted for evidence. As Edward Hodnett says: “Be sceptical of assertions of fact that start, ‘J. Irving Allerdyce, the tax expert, says…’ There are at least ten ways in which these facts may not be valid. (1) Allerdyce may not have made the statement at all. (2) He may have made an error. (3) He may be misquoted. (4) He may have been quoted only in part….”

Being sceptical and asking questions can help us avoid erroneous conclusions. Ask questions like: “how do you know that?”, “do we have evidence for that?” and “could there be bias here?”

Big Data has huge potential. But let’s not be wowed by it so that we don’t question. Be sceptical. Remember, it could be another case of garbage in, garbage out.

Image: Pixabay

To Err is Human But Human Error is Not a Root Cause

In a recent post I talked about Human Factors and different error types. You don’t necessarily need to classify human errors into these types but splitting them out this way helps us think about the different sorts of errors there are. It also helps us move on when we arrive at ‘human error’ while carrying out our root cause analysis (using DIGR® or another method). Part of the problem with having ‘human error’ as a root cause is that there isn’t much you can do with your conclusion. To err is human after all, so it is tempting to move on to something else. But people make errors for a reason and trying to understand why they made the error can lead us down a much more fruitful path to actions we can implement to try to prevent recurrence. If a pilot makes an error that leads to a near disaster or worse, we don’t just conclude that it was human error and there is nothing we can do about it. In a crash involving a self-driving car we want to go beyond “human error” as a root cause to understand why the human error might have occurred. As we get more self-driving cars on the road, we want to learn from every incident.

By getting beyond human error and considering different error types, we can start to think of what some actions are that we can implement to try to stop the errors occurring (“corrective actions”). Ideally, we want processes and systems to be easy and intuitive and the people to be well trained. When people are well trained but the process and/or system is complex, there are likely to be errors from time to time. As W. Edwards Deming once said, “A bad system will beat a good person every time.”

Below are examples of each of the error types described in my last post and example corrective actions.

| Error Type | Example | Example Corrective Action |
| --- | --- | --- |
| Action errors (slips) | Entering data into the wrong field in EDC | Error and sense checks to flag a possible error |
| Action errors (lapses) | Forgetting to check fridge temperature | Checklist that shows when fridge was last checked |
| Thinking errors (rule based) | Reading a date written in American format as if it were European (3/8/16 means 8-Mar-2016 but is read as 3-Aug-2016) | Use an unambiguous date format such as dd-mmm-yyyy |
| Thinking errors (knowledge based) | Incorrect use of a scale | Ensure proper training and testing on use of the scale. Only those trained can use it. |
| Non-compliance (routine, situational and exceptional) | Not noting down details of the drug used in the Accountability Log due to rushing | Regular checking by staff and consequences for not noting appropriately |
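
The date example in the table can be made concrete. The sketch below is only an illustration, not part of any particular EDC system: the helper name `format_unambiguous` and the hard-coded English month abbreviations are assumptions of mine. It shows the same string, 3/8/16, parsing to two different dates depending on which convention the reader assumes, and how writing the month as letters removes the ambiguity.

```python
from datetime import date, datetime

# English month abbreviations, hard-coded so the output does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_unambiguous(d: date) -> str:
    """Return a date as dd-Mmm-yyyy (hypothetical helper for illustration)."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year:04d}"


raw = "3/8/16"

# The same characters read under two different conventions.
as_american = datetime.strptime(raw, "%m/%d/%y").date()  # month first
as_european = datetime.strptime(raw, "%d/%m/%y").date()  # day first

print(format_unambiguous(as_american))  # 08-Mar-2016
print(format_unambiguous(as_european))  # 03-Aug-2016
```

Once the month is spelled out, the reader no longer has to apply a rule to interpret the date, so the rule-based thinking error cannot occur.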

These are examples and you should be able to think of additional possible corrective actions. But then which ones would you actually implement? You want the most effective and efficient ones of course. You want your actions to be focused on the root cause – or the chain of cause and effect that leads to the problem.

The most effective actions are those that eliminate the problem completely such as adding an automated calculation of BMI (Body Mass Index) from height and mass, for example, rather than expecting staff to calculate it correctly. If it can’t go wrong, it won’t go wrong (the corollary of Murphy’s Law). This is mistake-proofing.
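
As a minimal sketch of what such an automated calculation might look like (the function name `bmi` and the choice to reject non-positive inputs are my assumptions, not a description of any specific system), using the standard definition of BMI as mass in kilograms divided by height in metres squared:

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Body Mass Index: mass in kilograms divided by height in metres squared."""
    if mass_kg <= 0 or height_m <= 0:
        # A simple sense check: refuse values that cannot be real measurements.
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


print(round(bmi(70.0, 1.75), 1))  # 22.9
```

Staff still have to enter height and mass correctly, so this removes the calculation error rather than every possible error. The input check is one small example of the sense checks mentioned in the table.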

The next most effective actions are ones that help people to get it right. Drop-down lists and clear, concise instructions are examples of this, although instructions do have their limitations (as I will discuss in a future post). “No-one goes to work to do a bad job!” (W. Edwards Deming again) so let’s help them do a good job.

The least effective actions are ones that rely on a check catching an error right at the end of the process. For example, the nurse checking the expiry date on a vial before administering. That’s not to say these checks should not be there, but rather they should be thought of as the “last line of defence”.

Ideally, you also want some sort of check to make sure the revised process is working. This check is an early signal as to whether your actions are effective at fixing the problem.

Got questions or comments? Interested in training options? Contact me.

DIGR® is a registered trademark of Dorricott Metrics & Process Improvement Ltd.

“To err is human” – Alexander Pope

Root Cause Analysis – A Mechanic’s View

My car broke down recently and I was stuck by the side of the road waiting for a recovery company. It gave me an opportunity to watch a real expert in root cause analysis at work.

He started by ascertaining exactly what the problem was – the car had just been parked and would now not start. He then went into a series of questions. How much had the car been driven that day? Was there any history of the car not starting or being difficult to start? Next he was clearly thinking of the process of how a car starts up – the electrics of turning the motor, drawing fuel into the engine, spark plugs igniting the fuel, pistons moving and the engine idling. He started at the beginning of the process. Could the immobiliser be faulty? Had I dropped the key? No. Maybe the battery was not providing enough power. So he attached a booster – but to no avail. What about the fuel? Maybe it had run out? But the gauge showed ½ tank – had I filled it recently? After all the gauge might be faulty. Yes, I had filled it that day. Maybe the fuel wasn’t getting to the engine – so he tapped the fuel pipe to try to clear any blockage. No. Then he removed the fuel pipe and hey presto, no fuel was coming through. It was a faulty fuel pump. And must have just failed. This all took about 10 minutes.

The mechanic was demonstrating very effective root cause analysis. It’s what he does every day. Without thinking about how to do it. I asked him whether he had come across “Five Whys” – no he hadn’t. And as I thought about Five Whys with this problem, I wondered how he might have gone about it. Why has the car stopped? Because it will not start. Why will the car not start? Erm. Don’t know. Without gathering information about the problem he would not be able to get to root cause.

Contrast the Five Whys approach with the DIGR® method:

Define – the car will not start

Is/Is not – the problem has just happened. No evidence of a problem earlier.

Go step-by-step – Starter motor, battery, immobiliser, fuel, spark plugs.

Root cause – He went through all the DIGR® steps and it was when going through the process step-by-step that he discovered the cause. He had various ideas en route and tested them until he found the cause. He could have kept going of course – why did the fuel pump fail? But he had gone far enough, to a cause he had control over and could fix.

Of course, he hadn’t heard of DIGR® and didn’t need it. But he was following the steps. In clinical trials, there is often not a physical process we can see and testing our ideas may not be quite so easy. But we can still follow the same basic steps to get to a root cause we can act on.

If you don’t carry out root cause analysis every day like this mechanic, perhaps DIGR® can help remind you of the key steps you should take. If you’re interested in finding out more, please feel free to contact me.

Photo: Craig Sunter (License)

DIGR® is a registered trademark of Dorricott Metrics & Process Improvement Ltd.

Let’s Stop Confusing Everyone With CAPA!

I am really not a fan of the term “CAPA”. I think people’s eyes glaze over at the mention of it. It is seen as an administrative burden that the Quality Department and Regulators foist onto the people actually trying to do the real work. And I think it’s a mis-named term. CAPA stands for Corrective Action, Preventive Action. When there is a serious issue arising in a clinical trial, a CAPA is raised. This is meant to get beyond the immediate fire-fighting of the situation and to get to root cause so that corrective and/or preventive actions can be put in place. Sounds sensible. But what about when I ask you what the difference is between a corrective and a preventive action?

ISO9001:2008 defines them as:

Corrective Actions – “The organization shall take action to eliminate the causes of nonconformities in order to prevent recurrence.”

Preventive Actions – “The organization shall determine action to eliminate the causes of potential nonconformities in order to prevent their occurrence.”

Not very easy to get your head around in part because of the use of the word ‘prevent’ in both definitions. And if a Preventive Action is designed to prevent occurrence then that means the nonconformity (error) cannot have already occurred. And yet a CAPA is raised when a nonconformity (error) has occurred. So the PA part of CAPA seems wrong to me. The different definitions of Corrective and Preventive have caused no end of confusion as organisations implemented ISO9001. The good news is that in ISO9001:2015, there is a significant update in this area. When a significant issue (non-conformity) occurs you are expected to implement those immediate actions to contain the issue (termed Corrections) and also Corrective Actions to try to prevent recurrence. But the Preventive Actions are not associated with the issue. They now fit into an overall risk approach. By assessing risks in processes up-front and then continuously through their life-cycle, you are expected to develop ways to reduce the risk. These are the Preventive Actions or in risk language, the Mitigations.

Sound familiar? In clinical trials of course, we have the ICH addendum (ICH E6 R2) bringing in significant language on risk which brings it more in line with the revised ISO9001:2015 standard and is a welcome change. What is odd is that the addendum includes the following in 5.20.1:

If noncompliance that significantly affects or has the potential to significantly affect human subject protection or reliability of trial results is discovered, the sponsor should perform a root cause analysis and implement appropriate corrective and preventive actions.

This, unfortunately, mentions preventive actions next to corrective ones without any explanation of the difference and no link to the approach to risk in section 5.0. So it seems the confusion will remain in our area of work. And that confusion is compounded by our use of the CAPA terminology.

I would vote to get rid of the CAPA term altogether and talk about CAR (Corrective Action Requests) and Risk. Maybe along with that, we could rehabilitate the whole approach. Done well with good root cause analysis and corrective actions, CARs are an important part of a learning organization. They should not be seen as some tedious administration that the Quality Department is requesting.

What do you think? Perhaps it’s all clear to you and you think CAPA is a great term?

In my next post I want to go back into the root cause analysis (RCA) process itself – whether DIGR® or another method. I’ll talk more about the corrosive effect of blame on RCA and how to overcome it.

DIGR® is a registered trademark of Dorricott MPI Ltd.

Picture: ccPixs.com

Go Step-By-Step to get to Root Cause

In an earlier post, I described my DIGR® method of root cause analysis (RCA):

Define

Is – Is Not

Go Step By Step

Root Cause

In this post, I wanted to look more at Go Step By Step and why it is so powerful.

“If you can’t describe what you’re doing as a process, you don’t know what you’re doing” – a wonderful quote from W. Edwards Deming! And there is a lot of truth to it. In this blog, I’ve been using a hypothetical situation to help illustrate my ideas. Consider the situation where you are the Clinical Trial Lead on a vaccine study. Information is emerging that a number of the injections of trial vaccine have actually been administered after the expiry date of the vials. This has happened at several sites. You’ve taken actions to contain the situation for now. And have started using DIGR® to try to get to the root cause. It’s already brought lots of new information out and you’ve got to Go Step By Step. As you start to talk through the process, it becomes clear that not everyone has the same view of what each role in the process should do. A swim-lane process map for how vaccine should be quarantined shows tasks split into roles and helps the team to focus on where the failures are occurring:

In going step-by-step through the process, it becomes clear that the Clinical Research Associates (CRAs) are not all receiving the emails. Nor are they clear what they should do with them when they do receive them. The CRA role here is really a QC role however – the primary process takes place in the other two swimlanes. And it was the primary process that broke down – the email going from the Drug Management System to the Site (step highlighted in red).

So we now have a focus for our efforts to try to stop recurrence. You can probably see ways to redesign the process. That might work for future clinical trials but could lead to undesired effects in the current one. So a series of checks might be needed. For example, sending test emails from the system to confirm receipt by site and CRA or regular checks for bounced emails. Ensuring CRAs know what they should do when they receive an email would also help – perhaps the text in the email can be clearer.

By going step-by-step through the process as part of DIGR®, we bring the team back to what they have control of. We have moved away from blaming the pharmacists or the nurses at the two sites. Going down the blame route is never good in RCA as I will discuss in a future post. Reviewing the process as it should be also helps to combat cognitive bias which I’ve mentioned before.

As risk assessment, control and management is more clearly laid out in ICH GCP E6 (R2), process maps can help with risk identification and reduction too. To quote from section 5.0 “The sponsor should identify risks to critical trial processes and data”. Now we’ve discovered a process that is failing and could have significant effects on subject safety. By reviewing process maps of such critical processes, consideration can be given to the identification, prioritisation and control of risks. This might involve tools such as Failure Mode and Effects Analysis (FMEA) and redesign where possible in an effort to mistake-proof the process. This helps to show one way how RCA and risk connect – the RCA led us to understand a risk better and we can then put in controls to try to reduce the risk (by reducing the likelihood of occurrence). We can even consider how, in future trials, we might be able to modify the process to make similar errors much less likely and so reduce the risk from the start. This is true prevention of error.

In my next post I will talk about how (not) to ‘automate’ a process.

DIGR® is a registered trademark of Dorricott MPI Ltd.

Overcoming the Hidden Assumptions of Root Cause Analysis

(Photo by Lars Ploughmann, Flickr; License)

In these strange days, when facts seem to matter less, I thought the pediment above the door of London’s Kirkaldy Testing and Experimenting Works from 1874 was rather good. Of course, with root cause analysis (RCA), we are trying to use all the facts available to get to root cause and not rely on lots of guesswork and opinion. In my last post I described a method of RCA that I called DIGR® and I explained why I think it is more effective than the oft-taught “Five Whys” method. As a reminder the steps to DIGR® are:

Define

Is – Is Not

Go Step By Step

Root Cause

When you decide you are going to carry out an RCA there are a number of hidden assumptions that you make. Being aware of these might mean you don’t fall into a trap. In the comments to my previous posts, people have mentioned some of these already and I wanted to explore five of them a little further.

1. Assuming that the effects you see are all due to the same root cause. In the example I have been using in this blog where expired vaccine was administered to several patients at two different sites, we carried out an RCA using DIGR®. In doing so, we assumed that the root cause of the different incidents was the same and the evidence we gathered in the DIGR® process seems to confirm that. But it is possible that these independent incidents have no common root cause – the issue occurred for different reasons at each site. As you review the evidence in the Is-Is Not and Root Cause parts of DIGR® it is worth remembering that the effects might be from different root causes. This is likely to show up when the analysis seems to be getting stuck and facts seem to be at odds with each other.

2. Assuming there is only one root cause. Often issues happen because of more than one root cause or ‘causal factor’. Sometimes there is benefit in focusing on just one of these but other times, there may be a benefit in considering more than one. In our example, we came to the conclusion that the root cause was that ‘the process of identifying expired batches and quarantining them has not been verified’. This is something we can tackle with actions and try to stop a recurrence of the issue. But we could have gone down the path of trying to understand why the checks in the process had failed on these occasions and tried to get to root cause on those. We would have started looking at Human Factors which I will cover in a subsequent post. You have to make a judgement on how many strands of the issue you want to focus your efforts on. In our example we have assumed that by focusing on the primary process, the pharmacists and nurses will not have expired vaccine and so their check (whilst still a good one) should never show up expired vaccine.

3. Assuming you have enough information to work out the cause and effect relationships. Frustrating though it is, it is not always possible to get to root cause with the facts you have available. You always want to use facts (evidence) to check whether your root cause is sound or whether you are really just guessing. If there is no further information available you might have to put additional QC checks in place until you obtain more facts. In our example, if we had carried out an RCA using DIGR® straight after the first issue occurred, we might have focused on the root cause being at that particular site on the basis it had not happened at any others (the Is-Is Not part of DIGR®). But we might simply not have known enough about exactly what happened at that one site. Of course, following further cases at another site, we realised that there was a more fundamental, systemic issue.

4. Assuming all facts presented are true. I’ve mentioned Edward Hodnett’s book from 1955 “The Art of Problem Solving” previously. There is a chapter on ‘facts’ and in it he says: “Be sceptical of assertions of fact that start, ‘J. Irving Allerdyce, the tax expert, says…’ There are at least ten ways in which these facts may not be valid. (1) Allerdyce may not have made the statement at all. (2) He may have made an error. (3) He may be misquoted. (4) He may have been quoted only in part.” Hodnett goes on to list another six possible reasons the facts might not be valid. This is not to say you should disbelieve people – but rather that you should be sceptical. Asking follow up questions such as “how do you know that?” and “do we have evidence for that?” help avoid erroneous facts setting you off in the wrong direction on your search for root cause.

5. Assuming that because an issue appears to be the same as another issue, the root cause is the same. One of the challenges with carrying out a good RCA is the lack of time. When we are pressurized to get results now, we focus on containing the issue and getting to root cause comes lower down in the priorities. After all, if we get to root cause and put fixes in place, we will help the organization in the future but it doesn’t help us now. As RCA is often a low priority, it is also rushed. And to quote Tim Lister from Tom DeMarco’s book Slack, “people under time pressure don’t think faster.” One way of short-cutting thinking is to use a cognitive short-cut and just assume that the root cause must be the same as a similar issue you saw years ago. If you go down that route you really need to test the root cause against the available facts to make sure it stands up in this case too. Deliberate use of the DIGR® method of RCA can help combat this cognitive bias as it takes you logically through the steps of Define, Is-Is Not, Go step by step and Root Cause. People need time to think.

DIGR® can help with the focus on facts rather than opinion in RCA. It helps pull together all the available facts rather than leaving some to the side by focusing on ‘why’ too early.

In my next post I will go into some more detail on the G of DIGR®: how using process maps can really help everyone involved to Go step by step and start to see where a process might fail.

DIGR® is a registered trademark of Dorricott MPI Ltd.

Use DIGR to get to the Root Cause!

(Photo: Martin Pettitt, License)

I want to thank everyone who read, commented or liked my last post – “Root Cause Analysis: we have to do better than Five Whys”. Many seemed to agree that the Five Whys approach is really not up to the job. The defense of Five Whys seemed to fall into a number of buckets – “It’s just a tool”, “It’s a philosophy, not a tool”, “It needs someone who is trained to use it”, “It’s not meant to be literal: it’s not only about whys”, “It’s not meant to be literal: five isn’t a magic number”. No-one tried defending the Lincoln Memorial example which is so often used to teach Five Whys. I really do think it is a poor tool on its own – at the very least, it is mis-named. I think we do people a disservice by suggesting “just ask why five times” – we over-simplify and mislead. I think there is a better way. One that is still simple but, importantly, doesn’t miss out key information to help get to root cause and is more likely to lead to consistent results. This is why I came up with the DIGR® method. At the end of this post I explain the basis for DIGR®. There are many sophisticated RCA methods and they have their place but I do think we’d do well to replace Five Whys with DIGR®:

Define the problem. You need to make sure everyone is focused on the same issue for the RCA. This sounds trivial but is an important step. What is the problem you are focusing on? You would be surprised how often this simple question brings up a discussion.
Is – Is Not. Consider Is – Is Not from the perspective of Where, When and How Many. Where is the issue and where is it not? How many are affected and how many not? When did the problem start or has it always been there?
Go step-by-step. Go step-by-step through the process. What should happen – is it defined? Was the process followed? Were Quality Control (QC) steps implemented and does data from them tell you anything? If an escalation occurred earlier was the issue dealt with appropriately? This is where a process map would help.
Root cause. Use the information gathered to generate possible root causes. Then use why questions until you get to the right level of cause – you need to get back far enough in the cause-effect process that you can implement actions to address the cause but not to go back too far. This is where experience becomes invaluable. Narrow down to one or two root causes – ideally with evidence to back them up.

Of course, once you have your root cause you will want to develop actions to address the root cause and to monitor the situation. I will talk more about these in future posts. For now, I want to use an example with the DIGR® method of RCA.

Consider a hypothetical situation where you are the Clinical Trial Lead on a vaccine study. Information is emerging that a number of the injections of trial vaccine have actually been administered after the expiry date of the vials. This has happened at several sites. The first thing you should do is contain the problem. You do not need DIGR® for this. When you have the chance to carry out the RCA, what might the DIGR® approach look like?

Define. Let’s make sure everyone agrees on what the problem is. It’s not that a nurse didn’t notice that a vial that was about to be administered was past its expiry date. Rather it is that expired vaccine has been administered to multiple patients at multiple sites.

Is – Is Not (Where and When). Where is the issue? It has happened in two sites in two regions (North America and Western Europe). In one site, it has happened twice and this is where the problem was discovered by the CRA reviewing documentation. Is there anything different about the sites where it happened versus those where it did not? There is only one batch that has actually passed the expiry date and not all sites received that batch. So there are many sites where this problem could not have occurred (yet). In fact in reviewing the data we see that for the sites with the expired batch, there have only been 30 administrations of the vaccine since the expiry date. So there was the potential for 30 cases and we have three at two sites. 27 other administrations were of unexpired vaccine.
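
The arithmetic behind this Is – Is Not step is simple, but writing it down makes the size of the exposure explicit. A small sketch using only the figures from the hypothetical example (the variable names are mine):

```python
# Figures taken from the hypothetical Is - Is Not review.
administrations_after_expiry = 30  # potential cases at sites holding the expired batch
expired_doses_given = 3            # actual cases, found at two sites

unexpired_doses_given = administrations_after_expiry - expired_doses_given
failure_rate = expired_doses_given / administrations_after_expiry

print(unexpired_doses_given)   # 27
print(f"{failure_rate:.0%}")   # 10%
```

One in ten of the administrations that could have gone wrong did go wrong, which is one reason to look at the process rather than at the individuals involved.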

Go step-by-step. What should actually happen? Each batch has the same expiry date. The drug management system determines which vials are sent to which site based on the recruitment rate. The system flags when there are vials that are expiring soon at particular sites and sends an email. The email explains the action needed – to quarantine expired vials by placing them away from the non-expired ones and labelling them clearly. These are then collected to be destroyed centrally. So this process must have failed somewhere. Further investigation highlights that the two sites did not receive the email. In fact, the email addresses used to send the notification to the sites have minor errors in them – not just at the two sites where the issue occurred but at another three as well. At the two sites with the issue, the emails did not arrive and so the sites were not informed of the expired vaccine and did not specifically go in to quarantine it. There are also no checks in place to make sure the process works – test emails, checks for bounced emails, copies to the CRA to follow up with the site etc.
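
To illustrate the kind of check that was missing, here is a rough sketch of how notifications might be tracked so that a bounced or unacknowledged email is followed up rather than silently lost. Everything in it is hypothetical: the `Notification` record, the `needs_follow_up` rule, the three-day grace period, the site names and the dates are all assumptions for illustration, not features of any real drug management system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Notification:
    """One expiry notification sent to a site (hypothetical record)."""
    site: str
    sent_on: date
    bounced: bool = False
    acknowledged: bool = False


def needs_follow_up(n: Notification, today: date, grace_days: int = 3) -> bool:
    """True when we cannot assume the site received and acted on the email."""
    if n.bounced:
        return True
    overdue = today - n.sent_on > timedelta(days=grace_days)
    return overdue and not n.acknowledged


# Illustrative data only.
notifications = [
    Notification("Site 1", date(2017, 3, 1), acknowledged=True),
    Notification("Site 2", date(2017, 3, 1), bounced=True),
    Notification("Site 3", date(2017, 3, 1)),
]

today = date(2017, 3, 10)
follow_up = [n.site for n in notifications if needs_follow_up(n, today)]
print(follow_up)  # ['Site 2', 'Site 3']
```

The design choice here is to require positive confirmation from the site. A mistyped address that happens to be deliverable would not bounce, so a bounce check alone might not have caught every faulty address in this example.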

Root cause. Based on all the information brought together in this RCA, it seems that this was an issue waiting to happen. One route of enquiry is why the two sites did not check the expiry date prior to administration. This could go down the route of blame which is unlikely to lead to root cause (as I will discuss in a future post). But a more fundamental question is how the nurses at these sites were given expired vaccine in the first place. We were lucky in 27 cases – presumably good practice at sites stopped the issue from occurring. But we don’t want to rely on luck. Why did the nurses and pharmacists have expired drug available to use? Because the process of identifying expired batches and quarantining them has not been verified. I would argue this is the root cause. You could go further to trying to understand how the erroneous email addresses were entered into the drug management system but the level we have got to means we can take action – it is within our control to stop this recurring. In other words, we are at the right level to develop countermeasures.

In my next post I will expose some of the hidden assumptions of RCA.

I hope you are intrigued by the DIGR® method of root cause analysis. Could we replace Five Whys with DIGR®? Of course, I welcome your thoughts, comments and challenges to the approach!

Some background to DIGR®

Some people seem naturally good at seeking out root cause. And when you try to formulate the method it is not easy. In DIGR® I have brought together various approaches. Define comes from the D in DMAIC as part of Six Sigma. It is also part of A3 methodology. Is – Is Not comes from the approach described by Kepner and Tregoe in “The New Rational Manager”. Go Step-by-Step comes from Lean Sigma’s process and systems approach – to quote W. Edwards Deming, “If you can’t describe what you’re doing as a process, you don’t know what you’re doing”. Root Cause is, in part, the Five Whys approach – but only used after gathering critical information from the other parts of DIGR® and without a need for five. To look at DIGR® from the approach of 5WH: D=Who and What, I=When and Where, G=How, R=Why.

DIGR® is a registered trademark of Dorricott MPI Ltd.

Root Cause Analysis – We have to do better than Five Whys!

(Photo: Ad Meskens)

If you’ve ever had training on root cause analysis (RCA) you will almost certainly have learnt about Five Whys. Keep asking ‘why’ five times until you get to the root cause. The most famous example is of the Lincoln Memorial in Washington. The summary of this Five Whys example is reproduced below from an article by Joel A Gross:

Problem: The Lincoln Memorial in Washington D.C. is deteriorating.

Why #1 – Why is the monument deteriorating? Because harsh chemicals are frequently used to clean the monument.

Why #2 – Why are harsh chemicals needed? To clean off the large number of bird droppings on the monument.

Why #3 – Why are there a large number of bird droppings on the monument? Because the large population of spiders in and around the monument are a food source to the local birds

Why #4 – Why is there a large population of spiders in and around the monument? Because vast swarms of insects, on which the spiders feed, are drawn to the monument at dusk.

Why #5 – Why are swarms of insects drawn to the monument at dusk? Because the lighting of the monument in the evening attracts the local insects.

Solution: Change how the monument is illuminated in the evening to prevent attraction of swarming insects

This example is easy to understand and seems to demonstrate the benefit of the approach of Five Whys. Five Whys is simple but suffers from at least two significant flaws – i) it is not repeatable and ii) it does not use all available information.

Different people will answer the why questions differently, and their responses will take them to different conclusions. For example, to Why #2 “Why are harsh chemicals needed?”, the response might be “Because the bird droppings are difficult to remove with just soap and water”. This leads to Why #3 of “Why are bird droppings difficult to remove with just soap and water?” and you can see that the conclusion (“root cause”) will end up being very different. The approach is very dependent on the individuals involved and is not repeatable.

Other questions that would be really beneficial to ask but would not be asked using a Five Why approach are:

- When did the problem start? The answer might have helped link the start of the problem to the time when the lighting timing was changed.
- How many other monuments have this problem? If other monuments do not have this problem then what is different? If other monuments have this problem then what is the same? This line of questioning is, again, more likely to get to the lighting timing quickly and reliably because a monument without lighting and without the problem suggests the lighting might have something to do with the cause.

In my last post I described a hypothetical situation of a vaccine trial where subjects had received expired vaccine. If we use the Five Whys approach, it might go something like:

Why did subjects receive expired vaccine? Because an expired batch was administered at several sites; Why was an expired batch administered at several sites? Because the pharmacists didn’t check the expiry date; Why didn’t the pharmacists check the expiry date? Here we get stuck because we don’t know. So maybe we could try again.

Why did subjects receive expired vaccine? Because an expired batch was administered at several sites; Why was an expired batch administered at several sites? Because the expired batch wasn’t quarantined. Why wasn’t the expired batch quarantined? Because sites didn’t carry out their regular check for expired vaccine. Why didn’t sites carry out their regular check for expired vaccine? Because they forgot, maybe? Or perhaps they didn’t have a system in place? As I hope you can see, we really end up in guesswork using Five Whys because we are not using all the available information. Which sites had the problem and which didn’t? When did the problems occur? What is the process that ensures expired vaccine is not administered? How did that process fail?

Five Whys can be fitted to the problem once the cause is known but it is not a reliable method on its own to get to root cause. Why is definitely an important question in RCA. But it’s not the only question. To quote the author of ‘The Art of Problem Solving’, Edward Hodnett, “If you don’t ask the right questions, you don’t get the right answers. A question asked in the right way often points to its own answer. Asking questions is the ABC of diagnosis. Only the inquiring mind solves problems.”

Here are more of my blog posts on root cause analysis where I describe a better approach than Five Whys. Got questions or comments? Interested in training options? Contact me.

Note: it is worth reading Gross’s article as it reveals the truth behind this well-known scenario of the Lincoln Memorial.

DIGR® is a registered trademark of Dorricott MPI Ltd.

Root cause analysis can help you sleep at night

Who cares about root cause analysis (RCA)? Of course, we all do, now that it’s in the revised GCP, the ICH E6 (R2) Addendum*. But does it really matter? It’s easiest to think this through using an example. Consider a hypothetical situation where you are the Clinical Trial Lead on a vaccine study. Information is emerging that a number of the injections of trial vaccine have actually been administered after the expiry date of the vials. This has happened at several sites. Some example actions you might take immediately are – review the medical condition of the subjects affected, review stability data to try to estimate the risk, ask CRAs to check expiry dates on all vaccine at sites on their next visit, remind all sites of the need to check the expiry date prior to administering the vaccine. These and many other actions are ones your team could quickly generate and implement, and they will likely contain the issue for now. It took no root cause analysis to generate these actions. But could you sleep at night, confident in the knowledge that the problem won’t recur?

Without RCA, you don’t really know why the problem occurred and so you can’t fix it at the source. All you can do is put in additional checks. As these are implemented reactively, they may not be properly thought through, and people may be poorly trained (or not trained at all) on them. We also know that while checks are valuable in a process, they are not 100% effective when carried out by people. In this example we can be sure that the pharmacist dispensing and the nurse administering the vaccine have been trained to check the expiry date, and yet we still have cases where expired vaccine has been administered. Do we really think that reminding the pharmacist and nurse is going to be enough to fix the problem forever? In a future blog, I will describe a powerful technique for RCA but for now, imagine you had managed to carry out a root cause analysis on this situation.

What you might discover in carrying out an RCA is that there is no defined process for informing sites of expired vaccine and requiring them to quarantine it. Or perhaps that the expiry date is written in American date format but being read in European format (3/8/16 being 8-Mar-2016 or 3-Aug-2016). Whatever the actual root cause, by finding what it is (or they are) you can start to consider options to try to stop recurrence. And with additional checks you could look for early signals in case these actions are not effective. By taking these actions, would you be more likely to sleep at night?
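The date-format example is easy to demonstrate. The short sketch below is only an illustration, assuming the vial label carries the text "3/8/16"; it uses Python's standard `datetime` module to read the same label both ways.

```python
from datetime import datetime

label = "3/8/16"  # expiry date as written on the label (example from the text)

# American reading: month/day/year
us_reading = datetime.strptime(label, "%m/%d/%y").date()

# European reading: day/month/year
eu_reading = datetime.strptime(label, "%d/%m/%y").date()

# How far apart are the two readings?
gap_days = (eu_reading - us_reading).days

print("American reading:", us_reading.isoformat())
print("European reading:", eu_reading.isoformat())
print("Difference in days:", gap_days)
```

The two readings are nearly five months apart, so a site reading the label the European way could believe a vial is in date long after it has expired. An unambiguous format such as 8-Mar-2016 avoids the problem at the source.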

Think you know about RCA? In my next blog I will reveal why the Five Whys method we’re always told to use is not good enough for a complex situation such as this. And later, I will provide a description of a powerful technique for RCA that seems to be seldom used. If you want to hear more, please subscribe to my blog on the left of the screen. All comments welcomed!

And did you notice that I haven’t mentioned CAPA once (until now!)?

* Section 5.20.1 Addendum: “If noncompliance that significantly affects or has the potential to significantly affect human subject protection or reliability of trial results is discovered, the sponsor should perform a root cause analysis and implement appropriate corrective and preventive actions.” [my emphasis]