What No-one Tells You About Root Cause Analysis

When something significant goes wrong, we all know that getting to the root cause is an important first step in understanding the issue and helping to prevent it from recurring. I’ve talked many times in this blog about methods of root cause analysis and, of course, I recommend DIGR-ACT®. But there are other methods too. The assumption with all these methods is that you can actually get to the root cause(s).

I was running process and risk training for the Institute of Clinical Research recently. The training includes root cause analysis. One of the trainees gave an example of a Principal Investigator (PI) who had randomized a patient, received the randomization number, and then picked the wrong medication lot – one that did not match the randomization number. This was discovered later in the trial when the CRA visiting the site carried out Investigational Product accountability. By this time, of course, the patient had potentially been put at risk and the results could not be included in the analysis. So why had this happened? It definitely seemed to be human error. But why had that error occurred?

The PI was experienced in clinical trials. She knew what to do. This error had not occurred before. There was no indication that she was particularly rushed or under pressure on that day. The number was clear and in large type. How was it possible to misread it? The PI simply said she had made a mistake. And mistakes happen. That’s true, of course, but would we accept that from an aeroplane pilot? We’d still want to understand how it happened. Human error is not a root cause. But if human error isn’t the root cause, what is?

Sometimes, we just don’t know. Root cause analysis relies on cause and effect. If we don’t understand the cause and effect relationships, we will not be able to get to true root causes. But that doesn’t mean we just throw up our hands and hope it doesn’t happen again. That would never pass in the airline industry. So what should we do in this case?

It’s worth trying to see, first, how widespread a problem this is. Has it happened before at other sites? On other studies? What are the differences between sites and studies where this has and has not happened? This may still not be enough to lead you to root cause(s). If not, then maybe we could modify the process to make the error less likely to recur. Could we add a QC step, such as having the PI write the medication number down next to the randomization number? This should highlight a difference if there is one. Or perhaps they could enter the number into a system that checks it. Or maybe someone else at the site has to check at the point of dispensing.

A secret in root cause analysis that is rarely mentioned is that sometimes you can’t get to the root cause(s). There are occasions when you simply don’t have enough information to be able to get there. In these cases, whatever method you use, you cannot establish the root cause(s). Of course, if you do, it will help in determining effective actions to help stop recurrence. But without establishing root cause(s), there are still actions you can take to try to reduce the likelihood of recurrence.

 

Text: © 2020 Dorricott MPI Ltd. All rights reserved.

What’s Swiss Cheese got to do with Root Cause Analysis?

“There can be only one true root cause!” Let’s examine this oft-made statement with an example of a root cause analysis. Many patients in a study have been found at database lock to have been mis-stratified – causing difficulties with analysis and potentially invalidating the whole study. We discover that at randomization, the health professional is asked “Is the BMI ≤ 25? Yes/No”. In talking with CRAs and sites, we realise that at a busy site, where English may not be your first language, this is rather easy to answer incorrectly. If we wanted to make it easier for the health professional to get it right, why not simply ask for the patient’s height and weight? Once those are entered, the IXRS could calculate the BMI and determine whether it is less than or equal to 25. This would be much less likely to lead to error. So, we determine that the root cause is that “the IXRS was set up without considering how to reduce the likelihood of user error.” We missed an opportunity to prevent the error occurring. That’s definitely actionable. Unfortunately, of course, it’s too late for this study, but we can learn from the error for existing and future studies. We can look at other studies to see how they stratify patients and whether a similar error is likely to occur. We can update the standards for IXRS for future studies. Great!
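To make the design difference concrete, here is a minimal sketch of the two ways of capturing the stratum (hypothetical Python – a real IXRS would implement this within its own platform, and all names here are mine):

```python
def stratum_from_yes_no(answer: str) -> str:
    """Original design: the site answers "Is the BMI <= 25? Yes/No".
    The error-prone mental step (compute BMI, compare to 25) stays
    with a busy health professional."""
    return "BMI <= 25" if answer.strip().lower() == "yes" else "BMI > 25"

def stratum_from_measurements(height_m: float, weight_kg: float) -> str:
    """Alternative design: collect height and weight and let the system
    compute BMI, removing the step where the error crept in."""
    bmi = weight_kg / height_m ** 2
    return "BMI <= 25" if bmi <= 25 else "BMI > 25"

print(stratum_from_measurements(1.75, 76))  # BMI ~24.8 -> "BMI <= 25"
```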

But is there more to it? Were there other actions that might have helped prevent the issue? Why was this not detected earlier? Were there opportunities to save this study? As we investigate further, we find:

  1. During user acceptance testing, this same error occurred but was put down to user error.
  2. There were several occasions during the study where a CRA had noticed that the IXRS question was answered incorrectly. They modified the setting in EDC but were unable to change the stratification as this is set at randomization. No-one had realized that this was a systemic issue – it had been detected at several sites, but each occurrence was treated as a one-off.

Our one root cause definitely takes us forward. But there is more to learn from this issue. Perhaps there are some other root causes too, such as “the results of user acceptance testing were not evaluated for the potential of user error”, and “issues detected by CRAs were not recognised as systemic because there is no standard way of pulling out common issues found at sites.” These could both lead to additional actions that might help to reduce the likelihood of the issue recurring. And notice that actions on these root causes might also help reduce the likelihood of other issues occurring too.

In my experience, root cause analysis rarely leads to one root cause. In a recent training course I was running for the Institute of Clinical Research, one of the delegates reminded me of the “Swiss Cheese” model of root causes. There are typically many hazards, such as a user entering data into an IXRS. These hazards don’t normally end up as issues because we put preventive measures in place (such as standards, user acceptance testing, training). You can think of each of these preventive measures as a slice of Swiss cheese – they prevent many hazards from becoming issues but won’t prevent everything. Sometimes, a hazard can get through a hole in the cheese. We also put detection methods in place (such as source data verification, edit checks, listing review). You can think of each of these as additional slices of cheese which prevent issues from growing more significant but, again, won’t catch everything. It’s when the holes in each of the layers of prevention and detection line up that a hazard can become a significant issue that might even lead to the failure of a study. So, in our example, the IXRS was set up poorly (a prevention layer failed), the user acceptance testing wasn’t reviewed considering user error (another prevention layer failed), and CRA issues were not reviewed systematically (a detection layer failed). All these failures led to the study potentially being lost.

So if, in your root cause analysis, you have only one root cause, maybe it’s time to take another look. Are there other learnings you can gain from the issue? Are there other prevention or detection layers that failed?

Do you need help in root cause analysis? Take a look at DIGR-ACT training. Or give me a call.

 

Text: © 2019 DMPI Ltd. All rights reserved.

Oh No – Not Another Audit!

It has always intrigued me, this fear of the auditor. Note that I am separating the auditor from the (regulatory) inspector here. Our industry has had an over-reliance on auditing for quality rather than on building our processes to ensure quality right first time. The Quality Management section of ICH E6 (R2) is a much-needed change in approach. And this has been reinforced by the draft ICH E8 (R1): “Quality should rely on good design and its execution rather than overreliance on retrospective document checking, monitoring, auditing or inspection”. The fear of the auditor has led to some very odd approaches.

Trial Master File (TMF) is a case in point. I have become involved with TMF issues and improving TMF processes a number of times in CROs, and more recently have helped facilitate the Metrics Champion Consortium TMF Metrics Work Group. The idea of an inspection-ready TMF at all times comes around fairly often. But to me, that misses the point. An inspection-ready (or audit-ready) TMF is a by-product of the TMF processes working well – not an aim in itself. We should be asking: what is the TMF for? The TMF is there to help in the running of the trial (as well as to document it, to be able to demonstrate that processes, GCP etc. were followed). It should not be an archive gathering dust until an audit or inspection is announced and a mad panic ensues to make sure the TMF is inspection-ready. It should be in use all the time – a fundamental source of information for the study team. Used this way, gaps, misfiles etc. will be noticed and corrected on an ongoing basis. If the TMF is being used correctly, there shouldn’t be significant audit findings. Of course, process and monitoring (via metrics) need to be set up around this to make sure it works. This is process thinking.

And then there are those processes that I expect we have all come across. No-one quite understands why there are so many convoluted steps. Then you discover that at some point in the past there was an audit, and to close the audit finding (or CAPA), additional steps were added. No-one knows the point of the additional steps any more, but they are sure they must be needed. One example I have seen was of a large quantity of documents being photocopied prior to sending to another department. This was done because documents had got lost on one occasion and an audit had discovered this. So now someone spent 20% of their day photocopying documents in case they got lost in transit. Not a good use of time, and not good for the environment. Better to redesign the process and then consider the risk. How often do documents get lost en route? Why? What is the consequence? Are some more critical than others? And so on. Adding the additional step to the process due to an audit finding was the easiest thing to do (like adding a QC step). But it was the least efficient response.

I wonder if part of the issue is that some auditors appear to push their own solutions too hard. The process owner is the person who understands the process best. It is their responsibility to demonstrate they understand the audit findings, to challenge where necessary, and to argue for the actions they think will address the real issues. They should focus on the ‘why’ of the process.

Audit findings can be used to guide you in improving the process to take out risk and make it more efficient. Root cause analysis, of course, can help you with the why for particular parts of the process. And again, understanding the why helps you to determine much better actions to help prevent recurrence of issues.

Audits take time, and we would rather be focusing on the real work. But they also provide a valuable perspective from outside our organisation. We should welcome audits and use the input provided by people who are neutral to our processes to help us think, understand the why and make improvements in quality and efficiency. Let’s welcome the auditor!

 

Image: Pixabay

Text: © 2019 Dorricott MPI Ltd. All rights reserved.

Why Can’t You Answer My Simple Question?

Often during a root cause analysis session, it’s easy to get lost in the detail. The issues are typically complex and there are many aspects that need to be considered. Something that really doesn’t help is when people seem unable to answer a simple question. For example, you might ask “At what point would you consider escalating such an issue?” and get a response such as “I emphasised the importance of the missing data in the report and follow-up letter.” The person seems to be making a statement about something different and has side-stepped your question. Why might that be?

Of course, it might be simply that they didn’t understand the question. Maybe English isn’t their first language, or the phone line is poor. Or they were distracted by an urgent email coming in. If you think this is the reason, it’s worth asking again – perhaps re-wording and making sure you’re clear.

Or maybe they don’t know the answer but feel they need to answer anyway. A common questioning technique is to ask an open question and then be silent to try to draw out a response. People tend not to like silence and so they fill the gap. An unintended consequence of this might be that they fill the gap with something that doesn’t relate to the question you asked. They may feel embarrassed that they don’t know the answer and feel they should try to answer with something. You will need to listen carefully to the response and, if it appears they simply don’t know the answer, ask whether anyone else might. Perhaps the person who knows is not at the meeting.

Another possibility is that they are fearful. They might fear the reaction of others. Perhaps procedures weren’t followed and they know they should have been. But admitting it might bring them, or their colleagues, trouble. This is probably more difficult to ascertain. To understand whether this is going on, you’ll need to build a rapport with those involved in the root cause analysis. Can you help them by asking them to think of the factors in Gilbert’s Behavior Engineering Model that support good performance? Was the right information available at the right time to carry out the task? What about appropriate, well-functioning tools and resources? And were those involved properly trained? See if you can get them thinking about how to stop the issue recurring – as they come up with ideas, these might lead to a root cause of the actual issue. For example, if they think the escalation plan could be clearer, might an unclear escalation plan be one of the root causes?

“No-one goes to work to do a bad job!” [W. Edwards Deming] People want to help improve things for next time. If they don’t seem to be answering your question – what do you think the root cause of that might be? And how can you overcome it?

Do you need help in root cause analysis? Take a look at DIGR-ACT training. Or give me a call.

 

No Blame – Why is it so Difficult?

I have written before about the importance of removing blame when trying to get to the root causes of an issue. To quote W. Edwards Deming, “No one can put in his [/her] best performance unless he [/she] feels secure. … Secure means without fear, not afraid to express ideas, not afraid to ask questions. Fear takes on many faces.” But why is it so difficult to achieve? You can start a root cause analysis session by telling people that it’s not about blame, but there’s more to it than that.

It’s in the culture of an organization – which is not easy to change. But you can encourage “no blame” by your questioning technique and approach too. If significant issues at an investigative site have been uncovered during an audit, the easiest thing might be to “blame” the CRA. Why didn’t he/she find the problems and deal with them earlier? What were they doing? Why didn’t they do it right? If I were the CRA and this appeared to be the approach to getting to root cause, I would certainly be defensive. Yes, I got it wrong and I need to do better next time. Please don’t sack me! I would be fearful. Would it really help to get to the root causes?

Wouldn’t it be better to start by saying that QC is not 100% effective – that we all miss things? What actually happens before, during and after a monitoring visit to this site? Are the staff cooperative? Do they follow up quickly with questions and concerns? And the key question – “What could be done differently to help make it more likely that these issues would have been detected and dealt with sooner?” This really gets at Gilbert’s Behavior Engineering Model categories. Are site staff and the CRA given regular feedback? Are the tools and resources there to perform well? Do people have the right knowledge and skills?

This is where you’re likely to start making progress. Perhaps the site has not run a clinical trial before – they are research-naïve. We haven’t recognised this as a high-risk site and are using our standard monitoring approach. The CRA has limited experience. There’s been no co-monitoring visit and no-one’s been reviewing the Monitoring Visit Reports – because there’s a lack of resources due to high CRA turnover and higher-than-expected patient enrollment. And so on and so on… To quote W. Edwards Deming again, “Nobody goes to work to do a bad job.”

Don’t just tell people it’s not about blame. Show that you mean it by the questions you ask.

 

Want to find more about effective root cause analysis in clinical trials? Visit www.digract.com today.

 

Text: © 2019 DMPI Ltd. All rights reserved.

Please FDA – Retraining is NOT the Answer!

The FDA has recently issued a draft Q&A Guidance Document on “A Risk-Based Approach to Monitoring of Clinical Investigations”. It is definitely worth taking a look. There are eight questions and answers. Two caught my eye:

Q2. “Should sponsors monitor only risks that are important and likely to occur?”

The answer mentions that sponsors should also “consider monitoring risks that are less likely to occur but could have a significant impact on the investigation quality.” These are the High Impact, Low Probability events that I talked about in this post. The simple model of calculating risk by multiplying Impact and Probability essentially prioritises a High Impact, Low Probability event the same as a Low Impact, High Probability event (on 1–5 scales, 5 × 1 and 1 × 5 both score 5). But many experts in risk management think these should not be prioritised equally: High Impact, Low Probability events should be prioritised higher. So I think this is a really interesting answer.
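As a toy illustration of the arithmetic (my example, not from the guidance): a plain multiplicative score ties the two event types, so one illustrative tweak is to weight impact more heavily.

```python
def risk_score(impact: int, probability: int) -> int:
    """Plain multiplicative scoring on 1-5 scales: a High Impact / Low
    Probability event ties with its Low Impact / High Probability mirror."""
    return impact * probability

def impact_weighted_score(impact: int, probability: int) -> int:
    """Hypothetical tweak: square the impact so high-impact,
    low-probability events rank above the mirror case."""
    return impact ** 2 * probability

print(risk_score(5, 1), risk_score(1, 5))                        # 5 5   -> tied
print(impact_weighted_score(5, 1), impact_weighted_score(1, 5))  # 25 5  -> high impact wins
```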

Q7. “How should sponsors follow up on significant issues identified through monitoring, including communication of such issues?”

One part of the answer here has left me aghast. “…some examples of corrective and preventive actions that may be needed include retraining…” I have helped investigate issues in clinical trials so many times, and run root cause analysis training again and again. I always tell people that retraining is not a corrective action. Corrective actions should be based on the root cause(s). See a previous post on this and the confusing terminology. If you think someone needs retraining, ask yourself “why?” Could it be:

      • They were trained but didn’t follow the training. Why? Could it be that one or more of the Behavior Engineering Model categories was not supported – e.g. they didn’t have time, they didn’t have the right tools, or they weren’t provided with regular feedback to tell them how they were doing? If it’s one of these, then focus on that. Retraining will not be effective.
      • They haven’t ever received training. Why? Maybe they were absent when the rest of the staff was trained and there was no plan to make sure they caught up later. They don’t need retraining – they were never trained. They need training. And is it possible that there might be others in this situation? Who else might have missed training and needs training now? Maybe at other sites too.
      • There was something missing from the training (as increasingly appears to be one possible root cause in the tragic case of the Boeing 737 Max). Then the training needs to be modified. And it’s not about retraining one person or one site on training they had already received. It’s about training everyone on the revised training. Of course, later on, you might want to try to understand why an important component was missing from the training in the first place.

I firmly believe retraining is never the answer. There must be something deeper going on. If your only action is retraining, then you’ve not got to the root cause. I can accept reminding as an immediate action – but it’s not based on a root cause. It is more about providing feedback and is only going to have a short-term effect. An elephant may never forget but people do.

Got questions or comments? Interested in training options? Contact me.

 

Text: © 2019 DMPI Ltd. All rights reserved.

Beyond Human Error

One of my most frequently viewed posts is on human errors. I am intrigued by this. I’ve run training on root cause analysis a number of times and occasionally someone will question my claim that human error is not a root cause. Of course, it may be in the chain of cause and effect, but why did the error occur? And you can be sure it’s not the first time the error has occurred – so why has it occurred on other occasions? What could be done to make the error less likely to occur? Using this line of questioning is how we can make process improvements and learn from things that go wrong, rather than just blaming someone for making a mistake and “re-training” them.

There is another approach to errors which I rather like. I was introduced to it by SAM Sather of Clinical Pathways. It comes from Gilbert’s Behavior Engineering Model and provides six categories that need to be in place to support the performance of an individual in a system:

Category – Example questions

    • Expectations & Feedback – Is there a standard for the work? Is there regular feedback?
    • Tools, Resources – Is there enough time to perform well? Are the right tools in place?
    • Incentives & Disincentives – Are incentives contingent on good performance?
    • Knowledge & Skills – Is there a lack of knowledge or skill for the tasks?
    • Capacity & Readiness – Are people the right match for the tasks?
    • Motives & Preferences – Is there recognition of work well done?

 

Let’s take an example I’ve used a number of times: getting documents into the TMF. As you consider Gilbert’s Behavior Engineering Model, you might ask:

    • Do those submitting documents know what the quality standard is?
    • Do they have time to perform the task well? Does the system help them to get it right first time?
    • Are there any incentives for performing well?
    • Do they know how to submit documents accurately?
    • Are they detail-oriented and likely to get it right?
    • Does the team celebrate success?

I have seen TMF systems where the answer to most of those questions is “no”. Is it any wonder that there are rejection rates of 15%, cycle times of many weeks, and TMFs that are never truly “inspection ready”?

After all, “if you always do what you’ve always done, you will always get what you’ve always got”. Time to change approach? Let’s get beyond human error.

Got questions or comments? Interested in training options? Contact me.

 

Text: © 2019 DMPI Ltd. All rights reserved.

DIGR-ACT® is a registered trademark of Dorricott Metrics & Process Improvement Ltd.

Picture: Based on Gilbert’s Behavior Engineering Model

What My Model of eTMF Processing Taught Me (Part II)

In a previous post, I described a model I built for 100% QC of documents as part of an eTMF process. We took a look at the impact of the rejection rate for documents jumping from 10% to 15%. It was not good! So, what happens when an audit is announced and suddenly the number of documents submitted doubles? In the graph below, weeks 5 and 6 had double the number of documents. Look what it does to the inventory and cycle time:

The cycle time has shot up to around 21 days after 20 weeks. The additional documents have simply added to the backlog and that increases the cycle time because we are using First In, First Out.
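The spreadsheet itself isn’t shown here, but as a rough illustration the surge scenario can be sketched in a few lines of Python (a hypothetical re-implementation of the model’s assumptions – all names and details are mine):

```python
import random

def simulate_surge(weeks=20, capacity=1100, reject_rate=0.10,
                   double_weeks=(5, 6), seed=1):
    """FIFO model of 100% QC, with submissions doubled during an
    audit-preparation rush in the given weeks."""
    rng = random.Random(seed)
    queue, rework = [], 0      # queue entries: (week entered, document count)
    for week in range(1, weeks + 1):
        base = 2000 if week in double_weeks else 1000   # the audit surge
        queue.append((week, round(base * rng.uniform(0.95, 1.05)) + rework))
        cap = round(capacity * rng.uniform(0.95, 1.05))
        rework, done, doc_weeks, remaining = 0, 0, 0, []
        for entered, count in queue:                    # First In, First Out
            take = min(count, cap)
            cap -= take
            done += take
            doc_weeks += take * (week - entered)
            rework += round(take * reject_rate)         # rejects redone next week
            if count > take:
                remaining.append((entered, count - take))
        queue = remaining
        print(f"week {week:2d}: inventory={sum(c for _, c in queue) + rework:5d}, "
              f"avg cycle ≈ {7 * doc_weeks / done if done else 0:4.1f} days")

simulate_surge()
```

In this toy version, reworked documents lose their original timestamp when they re-enter the queue, so the cycle times it prints are, if anything, understated.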

So what do we learn overall from the model? In a system like this, with 100% QC, it is very easy to turn a potential bottleneck into an actual bottleneck. And when that happens, the inventory and cycle time will quickly shoot upwards unless additional resource is added (e.g. overtime). But, you might ask, do we really care about cycle time? We definitely should: if the study team can’t access documents until they have gone through the QC, those documents are now not available for 21 days on average. That’s not going to encourage everyday use of the TMF to review documents (as the regulators expect). And might members of the study team send in duplicates because they can’t see the documents that are awaiting processing – adding further documents and pushing inventory and cycle time up still further? And this is not a worst-case scenario, as I’m only modelling one TMF here – typically a Central Files group will be managing many TMFs and may be prioritizing one over another (i.e. not First In, First Out). This spreads out the distribution of cycle times and will lead to many more documents that are severely delayed through processing.

“But we need 100% QC of documents because the TMF is important!” I hear you shout. But do you really? As the great W. Edwards Deming said, “Inspection is too late. The quality, good or bad, is already in the product.” Let’s get quality built in from the start. You should begin by looking at that 15% rejection rate. What on earth is going on to get a rejection rate like that? What are those rejections? Are those carrying out the QC doing so consistently? Do those uploading documents know the criteria? Is there anyone uploading documents who gets it right every time? If so, what is it that they do differently from others?

What if you could get the rejection rate down to less than 1%? At what point would you be comfortable taking a risk-based approach – one that assumes those uploading documents do it right the first time – and carrying out a random QC to look for systemic issues that could then be tackled? How much more efficient this would be. See the diagram in this post. And you’d remove that self-imposed bottleneck. You’d get documents in much quicker, at lower cost and with improved quality. ICH E6 (R2) asks us not to treat quality as 100% checking but to concern ourselves with the errors that matter. Are we brave enough as an industry to apply this to the TMF?
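A random QC of that kind might look something like this sketch (hypothetical – the names and the 5% rate are mine): sample a small fraction of submissions and investigate any systemic patterns the sample reveals.

```python
import random

def qc_sample(doc_ids, sample_rate=0.05, seed=None):
    """Instead of 100% QC, pull a random sample of submitted documents.
    Findings in the sample point at systemic issues to fix upstream."""
    rng = random.Random(seed)
    return [doc for doc in doc_ids if rng.random() < sample_rate]

# e.g. of 1000 documents submitted this week, roughly 50 would be checked
this_week = [f"DOC-{i:04d}" for i in range(1, 1001)]
print(len(qc_sample(this_week, sample_rate=0.05, seed=7)))
```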

 

Text: © 2019 DMPI Ltd. All rights reserved.

Picture: CC BY 2.0 Remko Van Dokkum

What My Model of eTMF Processing Taught Me

On a recent long-haul flight, I got to thinking about the processing of TMF documents. Many organisations and eTMF systems seem to approach TMF documents with the idea that every one must be checked by someone other than the document owner. Sometimes, the document owner doesn’t even upload their own documents but provides them, along with metadata, to someone else to upload and index. And then their work is checked. There are an awful lot of documents in the TMF, and going through multiple steps of QC (or inspection, as W. Edwards Deming would call it) seems rather inefficient – see my previous posts. But we are a risk-averse industry – even having been given the guidance to use risk-based approaches in ICH E6 (R2) – and so many organizations seem to use this check-everything approach.

So what is the implication of 100% QC? I decided to model it in an Excel spreadsheet. My assumptions are that there are 1000 documents submitted per week. Each document requires one round of QC. The staff in Central Files can process up to 1100 documents per week. I’ve included a random +/-5% on these numbers for each week (real variation is much greater than this, I realise). I assume 10% of documents are rejected at QC, and that when rejected, the updated documents are processed the next week. I’ve assumed First In, First Out for processing. My model looks at the inventory at the end of each week and the average cycle time for processing. It looks like this:

It’s looking reasonably well in control. The cycle time hovers around 3 days after 20 weeks, which seems pretty good. If you had a process for TMF like this, you’d probably be feeling pretty pleased.
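As in the previous post, the spreadsheet isn’t shown, but a minimal Python sketch of the same assumptions might look like this (all names and implementation details are mine, not the original Excel model):

```python
import random

def simulate(weeks=20, submitted=1000, capacity=1100, reject_rate=0.10, seed=1):
    """Minimal FIFO model of 100% QC: weekly submissions, limited QC
    capacity, rejected documents re-entering the queue the next week."""
    rng = random.Random(seed)
    queue = []   # (week the documents entered the queue, document count)
    rework = 0   # rejected documents waiting to re-enter next week
    for week in range(1, weeks + 1):
        vary = lambda n: round(n * rng.uniform(0.95, 1.05))  # +/-5% per week
        queue.append((week, vary(submitted) + rework))
        cap = vary(capacity)
        rework, done, doc_weeks = 0, 0, 0
        remaining = []
        for entered, count in queue:             # First In, First Out
            take = min(count, cap)
            cap -= take
            done += take
            doc_weeks += take * (week - entered)
            rework += round(take * reject_rate)  # rejects redone next week
            if count > take:
                remaining.append((entered, count - take))
        queue = remaining
        inventory = sum(c for _, c in queue) + rework
        cycle = 7 * doc_weeks / done if done else 0.0
        print(f"week {week:2d}: inventory={inventory:5d}, avg cycle ≈ {cycle:4.1f} days")

simulate(reject_rate=0.10)  # roughly in control
# Re-run with reject_rate=0.15 to see the next scenario.
```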

So what happens if the rejection rate is 15% rather than 10%?

Not so good! It’s interesting just how sensitive the system is to the rejection rate. This is clearly not a process in control any more and both inventory and cycle time are heading upwards. After 20 weeks, the average cycle time sits around 10 days.

Having every document go through a QC like this forms a real constraint on the system – a potential bottleneck in terms of the Theory of Constraints. And it’s really easy to turn this potential bottleneck into a real bottleneck. And a bottleneck in a process leads to regular urgent requests, frustration and burn-out. Sound familiar?

In my next post, I’ll take a look at what happens when an audit is announced and the volume of documents to be processed jumps for a couple of weeks.

 

Text: © 2019 DMPI Ltd. All rights reserved.

Picture: CC BY 2.0 Remko Van Dokkum

Deliver Us From Delivery Errors

I returned home recently to find two packages waiting for me. They had been delivered while I was out. One was something I was expecting. The other was not – it was addressed to someone else, at a completely different address (except the house number). How did that happen, I wondered? I called the courier company. After I had waited 15 minutes to get through, the representative listened to the problem and was clearly perplexed, as the item had been signed for on the system. Eventually he started “Here’s what I can do for you…” and went on to explain how they could pick it up and deliver it to the right address. Problem solved.

Except that it wasn’t. It caused me inconvenience (e.g. a 20-minute call) for which no apology ever came. Their customer did not receive the service they paid for (the package would now be late). The package was put at risk – I could have kept it and no-one would have known. There was no effort to understand how the error was made. They seemed too busy for that. It has damaged their reputation – I would certainly not use that delivery firm. It was simply seen as a problem to resolve. Not an opportunity to improve.

The next day, a neighbour came round to hand over a mis-delivered parcel. You guessed it: the same courier company had delivered a separate package meant for us to a neighbour. It’s great our neighbour brought it round. But the company will never hear of that error.

So many learnings from this! If the company was customer-focused, they would really want to understand how such errors occur (by carrying out root cause analysis). And they would want to learn from the problems rather than just resolving each one individually. They should take a systemic approach. They should also recognise that the data they hold on the number of errors (mis-deliveries in this case) is incomplete. Helpful people sort out mis-deliveries for them every day without them even knowing. When they review data on the scale of the problem, they should be aware that their data is an underestimate. And as for customer service, I can’t believe I didn’t even get a “sorry for the inconvenience”. According to a recent UK survey, 20% of people have had a parcel lost during delivery in the last 12 months. This is, after all, a critical error. Any decent company would want to really understand the issue and put systems in place to try to prevent future issues.

To me, this smacks of a culture of cost-cutting and lack of customer focus. Without a culture of continuous improvement, they will lose ground against their competitors. I have dealt with other courier companies and some of them are really on the ball. Let’s hope their management realises they need to change sooner rather than later…

 

Text: © 2018 Dorricott MPI Ltd. All rights reserved.