What No-one Tells You About Root Cause Analysis

When something significant goes wrong, we all know that getting to the root cause is an important first step in understanding the issue and helping to prevent it from recurring. I’ve talked many times in this blog about methods of root cause analysis and, of course, I recommend DIGR-ACT®. But there are other methods too. The assumption with all these methods is that you can actually get to the root cause(s).

I was running process and risk training for the Institute of Clinical Research recently. The training includes root cause analysis. And one of the trainees gave an example of a Principal Investigator (PI) who had randomized a patient, received the randomization number and then picked the wrong medication lot for the patient – she should have selected the medication lot that matched the randomization number. This was discovered later in the trial when the CRA carried out Investigational Product accountability during a site visit. By this time, of course, the patient had potentially been put at risk and the results could not be included in the analysis. So why had this happened? It definitely seemed to be human error. But why had that error occurred?

The PI was experienced in clinical trials. She knew what to do. This error had not occurred before. There was no indication that she was particularly rushed or under pressure on that day. The number was clear and in large type. How was it possible to misread the number? The PI simply said she made a mistake. And mistakes happen. That’s true, of course, but would we accept that of an aeroplane pilot? We’d still want to understand how it happened. Human error is not a root cause. But if human error isn’t the root cause, what is?

Sometimes, we just don’t know. Root cause analysis relies on cause and effect. If we don’t understand the cause and effect relationships, we will not be able to get to true root causes. But that doesn’t mean we just hold up our hands and hope it doesn’t happen again. That would never pass in the airline industry. So what should we do in this case?

It’s worth trying to see, first, how widespread a problem this is. Has it happened before at other sites? On other studies? What are the differences between sites/studies where this has and has not happened? This may still not be enough to lead you to root cause(s). If not, then maybe we could modify the process to make a recurrence less likely. Could we add a QC step, such as having the PI write the medication number down next to the randomization number? This should highlight a difference if there is one. Or perhaps they could enter the number into a system that can check it. Or maybe someone else at the site has to check at the point of dispensing.
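To illustrate the “enter the number into a system so that it can check” idea, here is a minimal sketch of such a check at the point of dispensing. It is purely illustrative – the function, kit numbers and alert message are hypothetical and not taken from any particular IRT/IWRS system:

```python
def check_dispensing(assigned_kit: str, scanned_kit: str) -> bool:
    """Return True if the medication kit/lot picked matches the randomization assignment.

    assigned_kit: kit/lot number issued by the randomization system (hypothetical)
    scanned_kit:  kit/lot number scanned or typed at the point of dispensing
    """
    if scanned_kit.strip().upper() != assigned_kit.strip().upper():
        # Block dispensing and alert site staff before the patient receives the kit
        print(f"MISMATCH: assigned {assigned_kit}, picked {scanned_kit} - do not dispense")
        return False
    return True

# Hypothetical example: randomization assigned kit 'A1023', but 'A1032' was picked
check_dispensing("A1023", "A1032")
```

The point of a check like this is that it catches the mismatch at the moment of dispensing, rather than weeks later at drug accountability.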

A secret in root cause analysis that is rarely mentioned is that sometimes you can’t get to the root cause(s). There are occasions when you simply don’t have enough information to be able to get there. In these cases, whatever method you use, you cannot establish the root cause(s). Of course, when you can establish root cause(s), it helps in determining effective actions to stop recurrence. But even without establishing root cause(s), there are still actions you can take to try to reduce the likelihood of recurrence.

 

Text: © 2020 Dorricott MPI Ltd. All rights reserved.

Oh No – Not Another Audit!

It has always intrigued me, this fear of the auditor. Note that I am separating out the auditor from the (regulatory) inspector here. Our industry has had an over-reliance on auditing for quality rather than on building our processes to ensure quality right the first time. The Quality Management section of ICH E6 (R2) is a much-needed change in approach. And this has been reinforced by the draft ICH E8 (R1): “Quality should rely on good design and its execution rather than overreliance on retrospective document checking, monitoring, auditing or inspection”. The fear of the auditor has led to some very odd approaches.

Trial Master File (TMF) is a case in point. I seem to have become involved with TMF issues and improving TMF processes a number of times in CROs, and more recently have helped facilitate the Metrics Champion Consortium TMF Metrics Work Group. The idea of an inspection-ready TMF at all times comes around fairly often. But to me, that misses the point. An inspection-ready (or audit-ready) TMF is a by-product of the TMF processes working well – not an aim in itself. We should be asking – what is the TMF for? The TMF is there to help in the running of the trial (as well as to document it, to be able to demonstrate that processes, GCP etc. were followed). It should not be an archive gathering dust until an audit or inspection is announced and a mad panic ensues to make sure the TMF is inspection ready. It should be in use all the time – a fundamental source of information for the study team. Used this way, gaps, misfiles etc. will be noticed and corrected on an ongoing basis. If the TMF is being used correctly, there shouldn’t be significant audit findings. Of course, process and monitoring (via metrics) need to be set up around this to make sure it works. This is process thinking.

And then there are those processes that I expect we have all come across. No-one quite understands why there are so many convoluted steps. Then you discover that at some point in the past there was an audit and, to close the audit finding (or CAPA), additional steps were added. No-one knows the point of the additional steps any more but they are sure they must be needed. One example I have seen was a large quantity of documents being photocopied prior to sending to another department. This was done because documents had got lost on one occasion and an audit had discovered this. So now someone spent 20% of their day photocopying documents in case they got lost in transit. Not a good use of time, and not good for the environment. Better to redesign the process and then consider the risk. How often do documents get lost en route? Why? What is the consequence? Are some more critical than others? And so on. Adding the additional step to the process due to an audit finding was the easiest thing to do (like adding a QC step). But it was the least efficient response.

I wonder if part of the issue is that some auditors appear to push their own solution too hard. The process owner is the person who understands the process best. It is their responsibility to demonstrate they understand the audit findings, to challenge where necessary, and to argue for the actions they think will address the real issues. They should focus on the ‘why’ of the process.

Audit findings can be used to guide you in improving the process to take out risk and make it more efficient. Root cause analysis, of course, can help you with the why for particular parts of the process. And again, understanding the why helps you to determine much better actions to help prevent recurrence of issues.

Audits take time, and we would rather be focusing on the real work. But they also provide a valuable perspective from outside our organisation. We should welcome audits and use the input provided by people who are neutral to our processes to help us think, understand the why and make improvements in quality and efficiency. Let’s welcome the auditor!

 

Image: Pixabay

Text: © 2019 Dorricott MPI Ltd. All rights reserved.

No Blame – Why is it so Difficult?

I have written before about the importance of removing blame when trying to get to the root causes of an issue. To quote W Edwards Deming, “No one can put in his [/her] best performance unless he [/she] feels secure. … Secure means without fear, not afraid to express ideas, not afraid to ask questions. Fear takes on many faces.” But why is it so difficult to achieve? You can start a root cause analysis session by telling people that it’s not about blame, but there’s more to it than that.

It’s in the culture of an organization – which is not easy to change. But you can encourage “no blame” by your questioning technique and approach too. If significant issues at an investigative site have been uncovered during an audit, the easiest thing might be to “blame” the CRA. Why didn’t he/she find the problems and deal with them earlier? What were they doing? Why didn’t they do it right? If I were the CRA and this appeared to be the approach to getting to root cause, I would certainly be defensive. Yes, I got it wrong and I need to do better next time. Please don’t sack me! I would be fearful. Would it really help to get to the root causes?

Would it be better to start by saying that QC is not 100% effective – that we all miss things? What actually happens before, during and after a monitoring visit to this site? Are the staff cooperative? Do they follow up quickly with questions and concerns? And the key question – “What could be done differently to help make it more likely that these issues would have been detected and dealt with sooner?” This is really getting at Gilbert’s Behavior Engineering Model categories. Are site staff and CRA given regular feedback? Are the tools and resources there to perform well? Do people have the right knowledge and skills?

This is where you’re likely to start making progress. Perhaps the site has not run a clinical trial before – they are research-naïve. We haven’t recognised this as a high-risk site and are using our standard monitoring approach. The CRA has limited experience. There’s been no co-monitoring visit and no-one’s been reviewing the Monitoring Visit Reports – because there’s a lack of resources due to high CRA turnover and higher than expected patient enrollment. And so on and so on… To quote W. Edwards Deming again, “Nobody goes to work to do a bad job.”

Don’t just tell people it’s not about blame. Show that you mean it by the questions you ask.

 

Want to find more about effective root cause analysis in clinical trials? Visit www.digract.com today.

 

Text: © 2019 DMPI Ltd. All rights reserved.

What My Model of eTMF Processing Taught Me (Part II)

In a previous post, I described a model I built for 100% QC of documents as part of an eTMF process. We took a look at the impact of the rejection rate for documents jumping from 10% to 15%. It was not good! So, what happens when an audit is announced and suddenly the number of documents submitted doubles? In the graph below, weeks 5 and 6 had double the number of documents. Look what it does to the inventory and cycle time:

The cycle time has shot up to around 21 days after 20 weeks. The additional documents have simply added to the backlog and that increases the cycle time because we are using First In, First Out.
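One way to see why the growing backlog drives the cycle time is Little’s Law: average cycle time ≈ inventory ÷ throughput. Here is a minimal sketch of that calculation – the inventory figure is an assumption read off the graph, not an exact number from the model:

```python
# Little's Law: average cycle time = work in process / throughput
inventory_docs = 3300        # assumed backlog after the audit-driven spike (read off the graph)
throughput_per_week = 1100   # processing capacity per week, from the model in the previous post

cycle_time_weeks = inventory_docs / throughput_per_week
print(f"Average cycle time: {cycle_time_weeks:.1f} weeks (~{7 * cycle_time_weeks:.0f} days)")
# ~3 weeks, i.e. roughly the 21 days the model shows
```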

So what do we learn overall from the model? In a system like this, with 100% QC, it is very easy to turn a potential bottleneck into an actual bottleneck. And when that happens, the inventory and cycle time will quickly shoot upwards unless additional resource is added (e.g. overtime). But, you might ask, do we really care about cycle time? We definitely should: if the study team can’t access documents until they have gone through the QC, those documents are not available for 21 days on average. That’s not going to encourage everyday use of the TMF to review documents (as the regulators expect). And might members of the study team send in duplicates because they can’t see the documents that are awaiting processing – adding further documents and pushing inventory and cycle time up still further? And this is not a worst-case scenario, as I’m only modelling one TMF here – typically a Central Files group will be managing many TMFs and may be prioritizing one over another (i.e. not First In, First Out). This spreads out the distribution of cycle times and will lead to many more documents that are severely delayed in processing.

“But we need 100% QC of documents because the TMF is important!” I hear you shout. But do you really? As the great W Edwards Deming said, “Inspection is too late. The quality, good or bad, is already in the product.” Let’s get quality built in in the first place. You should start by looking at that 15% rejection rate. What on earth is going on to get a rejection rate like that? What are those rejections? Are those carrying out the QC doing so consistently? Do those uploading documents know the criteria? Is there anyone uploading documents who gets it right every time? If so, what is it that they do differently to others?

What if you could get the rejection rate down to less than 1%? At what point would you be comfortable taking a risk-based approach – one that assumes those uploading documents get it right the first time? And carrying out a random QC to look for systemic issues that could then be tackled? How much more efficient this would be. See the diagram in this post. And you’d remove that self-imposed bottleneck. You’d get documents in much quicker, costing less and with improved quality. ICH E6 (R2) asks us to consider quality not as 100% checking but as concerning ourselves with the errors that matter. Are we brave enough as an industry to apply this to the TMF?
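As a rough illustration of the kind of random, risk-based QC described above, here is a minimal sketch. The sampling rate and escalation threshold are assumptions chosen purely for illustration, not a validated procedure:

```python
import random

def sample_for_qc(doc_ids, sampling_rate=0.05, seed=None):
    """Randomly select a fraction of uploaded documents for QC review."""
    rng = random.Random(seed)
    return [d for d in doc_ids if rng.random() < sampling_rate]

def needs_escalation(n_checked, n_rejected, threshold=0.01):
    """Flag a potential systemic issue if the observed rejection rate exceeds the threshold."""
    return n_checked > 0 and (n_rejected / n_checked) > threshold

# Hypothetical week: 1000 documents uploaded, 5% sampled for QC, 2 rejections found
week_docs = [f"DOC-{i:04d}" for i in range(1000)]
sampled = sample_for_qc(week_docs, sampling_rate=0.05, seed=1)
rejected = 2
print(f"Sampled {len(sampled)} documents; escalate: {needs_escalation(len(sampled), rejected)}")
```

The escalation step is what keeps this risk-based: most weeks nothing happens, but a spike in the sampled rejection rate triggers containment and root cause analysis rather than more routine checking.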

 

Text: © 2019 DMPI Ltd. All rights reserved.

Picture: CC BY 2.0 Remko Van Dokkum

What My Model of eTMF Processing Taught Me

On a recent long-haul flight, I got to thinking about the processing of TMF documents. Many organisations and eTMF systems seem to approach TMF documents with the idea that every one must be checked by someone other than the document owner. Sometimes, the document owner doesn’t even upload their own documents but provides them, along with metadata, to someone else to upload and index. And then their work is checked. There are an awful lot of documents in the TMF, and going through multiple steps of QC (or inspection, as W Edwards Deming would call it) seems rather inefficient – see my previous posts. But we are a risk-averse industry – even having been given the guidance to use risk-based approaches in ICH E6 (R2) – and so many organizations still seem to use this approach.

So what is the implication of 100% QC? I decided I would model it in an Excel spreadsheet. My assumptions are that there are 1000 documents submitted per week. Each document requires one round of QC. The staff in Central Files can process up to 1100 documents per week. I’ve included a random +/-5% on these numbers for each week (real variation is much greater than this, I realise). I assume 10% of documents are rejected at QC, and that the rejected documents are updated and processed the next week. I’ve assumed First In, First Out for processing. My model looks at the inventory at the end of each week and the average cycle time for processing. It looks like this:

It’s looking reasonably well in control. The cycle time hovers around 3 days after 20 weeks which seems pretty good. If you had a process for TMF like this, you’d probably be feeling pretty pleased.
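For anyone who wants to play with the numbers, here is a minimal Python sketch along the same lines as my spreadsheet model. It is only an approximation – the random variation, rounding and cycle-time estimate will not match the spreadsheet exactly:

```python
import random

def simulate(weeks=20, arrivals=1000, capacity=1100, reject_rate=0.10,
             noise=0.05, seed=42):
    """Weekly FIFO model: documents arrive, up to 'capacity' are QC'd each week,
    and rejected documents come back for processing the following week."""
    rng = random.Random(seed)
    backlog = 0          # documents still waiting at the end of the previous week
    rejected_carry = 0   # rejected documents returning this week
    for week in range(1, weeks + 1):
        new_docs = round(arrivals * (1 + rng.uniform(-noise, noise)))
        available = backlog + rejected_carry + new_docs
        this_capacity = round(capacity * (1 + rng.uniform(-noise, noise)))
        processed = min(available, this_capacity)
        rejected_carry = round(processed * reject_rate)
        backlog = available - processed
        # Rough Little's-Law-style estimate of average cycle time, in days
        cycle_time_days = 7 * (backlog + 0.5 * processed) / max(processed, 1)
        print(f"week {week:2d}: inventory={backlog:5d}  cycle_time={cycle_time_days:4.1f} days")

simulate()                    # roughly in control: cycle time of a few days
# simulate(reject_rate=0.15)  # 15% rejection rate: inventory and cycle time climb week on week
```

Changing the reject_rate parameter (or doubling arrivals for a couple of weeks, as in the next post) is all it takes to see the system tip from in control to a growing backlog.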

So what happens if the rejection rate is 15% rather than 10%?

Not so good! It’s interesting just how sensitive the system is to the rejection rate. This is clearly not a process in control any more and both inventory and cycle time are heading upwards. After 20 weeks, the average cycle time sits around 10 days.

Having every document go through a QC like this forms a real constraint on the system – a potential bottleneck in terms of the Theory of Constraints. And it’s really easy to turn this potential bottleneck into a real bottleneck. And a bottleneck in a process leads to regular urgent requests, frustration and burn-out. Sound familiar?

In my next post, I’ll take a look at what happens when an audit is announced and the volume of documents to be processed jumps for a couple of weeks.

 

Text: © 2019 DMPI Ltd. All rights reserved.

Picture: CC BY 2.0 Remko Van Dokkum

Is more QC ever the right answer? Part II

In part I of this post, I described how adding a QC step can leave a process as the worst of all worlds – it takes longer, costs more and gives quality that is the same as (or worse than) a one-step process. So why would anyone implement a process like this? Because “two sets of eyes are better than one!”

What might a learning approach with better quality and improved efficiency look like? I would suggest this:

In this process, we have a QC role, and the person performing that role takes a risk-based approach to sampling the work and works together with the Specialist to improve the process by revising definitions, training etc. The sampling might be 100% for a Specialist who has not carried out the task previously, but would then reduce to low levels as the Specialist demonstrates competence. The Specialist is now accountable for their work – all outputs come from them. If a high level of errors is found, then an escalation process is needed to contain the issue and get to root cause (see previous posts). You would also want to gather data about the typical errors seen during QC and plot them (Pareto charts are ideal for this) to help focus on where to develop the process further.
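As a simple illustration of the Pareto idea, here is a minimal sketch that tallies QC findings by category and orders them by frequency. The error categories and counts are made up for illustration:

```python
from collections import Counter

# Hypothetical QC findings logged over a month (categories and counts are invented)
qc_findings = (["wrong document type"] * 24 + ["missing metadata"] * 15 +
               ["wrong country"] * 6 + ["illegible scan"] * 3 + ["duplicate"] * 2)

counts = Counter(qc_findings)
total = sum(counts.values())
cumulative = 0
print(f"{'Error category':<22}{'Count':>6}{'Cum %':>8}")
for category, count in counts.most_common():
    cumulative += count
    print(f"{category:<22}{count:>6}{100 * cumulative / total:>7.0f}%")
# The top one or two categories typically account for most rejections -
# that is where to focus process improvement first.
```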

This may remind you of the move away from 100% Source Document Verification (SDV) at sites. The challenge with a change like this is that the process is not as simple – it requires more “thinking”. What do you do if you find a certain level of errors? This is where the reviewer (or the CRA in the case of SDV) needs a different approach. It can be a challenge to implement properly. But it should actually make the job more interesting.

So, back to the original question: Is more QC ever the right answer? Sometimes – but make sure you think through the consequences and look for other options first.

In my next post, I’ll talk about a problem I come across again and again. People don’t seem to have enough time to think! How can you carry out effective root cause analysis or improve processes without the time to think?

Text: © 2018 Dorricott MPI Ltd. All rights reserved.

Is More QC Ever the Right Answer? Part I

In a previous post, I discussed whether retraining is ever a good answer to an issue. Short answer – NO! So what about that other common one of adding more QC?

An easy corrective action to put in place is to add more QC. Get someone else to check. In reality, this is often a band-aid because you haven’t got to the root cause and are not able to tackle it directly. So you’re relying on catching errors rather than stopping them from happening in the first place. You’re not trying for “right first time” or “quality by design”.

“Two sets of eyes are better than one!” is the common defence of multiple layers of QC. After all, if someone misses an error, someone else might find it. Sounds plausible. And it does make sense for processes that occur infrequently and have unique outputs (like a Clinical Study Report). But for processes that repeat rapidly this approach becomes highly inefficient and ineffective. Consider a process like that below:

Specialist I carries out work in the process – perhaps entering metadata relating to a scanned document (investigator, country, document type etc.). They check their work and modify it if they see errors. Then they pass it on to Specialist II, who checks it and modifies it if they see any errors. Then Specialist II passes it on to the next step. Two sets of eyes. What are the problems with this approach?

  1. It takes a long time. The two steps have to be carried out in series, i.e. Specialist II can’t QC the same item at the same time as Specialist I. Everything goes through two steps and a backlog forms between the Specialists. This means it takes much longer to get to the output.
  2. It is expensive. A whole process develops around managing the workflow, with some items fast-tracked due to an impending audit. It takes the time of two people (plus management) to carry out the task. More resources mean more money.
  3. The quality is not improved. This may seem odd, but think it through. There is no feedback loop in the process for Specialist I to learn from any errors that escape to Specialist II, so Specialist I continues to let those errors pass. And Specialist II will also make errors – in fact the rework they do might actually add more errors. They may not agree on what is an error. This is not a learning process. And what if the process is under stress due to lack of resources and tight timelines? With people rushing, do they check properly? Specialist I knows that Specialist II will pick up any errors so doesn’t check thoroughly. And Specialist II knows that Specialist I always checks their work so doesn’t check thoroughly. And so more errors come out than if Specialist II had not been there at all (see the rough illustration after this list). Having everything go through a second QC as part of the process takes away accountability from the primary worker (Specialist I).
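To make that last point concrete, here is a rough illustration of how “two sets of eyes” with diluted responsibility can end up no better than one accountable checker. The catch rates are assumptions purely for illustration:

```python
def escape_rate(*catch_rates):
    """Probability an error slips past every check, assuming the checks are independent."""
    p = 1.0
    for rate in catch_rates:
        p *= (1 - rate)
    return p

# Assumed figures for illustration only
one_diligent_check = escape_rate(0.90)        # a single accountable checker catching 90%
two_casual_checks = escape_rate(0.70, 0.70)   # two checkers, each relying on the other

print(f"One diligent check : {one_diligent_check:.0%} of errors escape")   # 10%
print(f"Two casual checks  : {two_casual_checks:.0%} of errors escape")    # 9%
# Barely better - and without a feedback loop, neither checker ever improves.
```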

So let’s recap. A process like this takes longer, costs more and gives quality that is the same as (or worse than) a one-step process. So why would anyone implement a process like this? Because “two sets of eyes are better than one!”

What might a learning approach with better quality and improved efficiency look like? I will propose an approach in my next post. As a hint, it’s risk-based!

Text: © 2018 Dorricott MPI Ltd. All rights reserved.