Big Data – Garbage in, garbage out?

Change of plan for this post… I visited the dentist recently, and before the consultation I was handed an iPad with a form to complete. I was sure I had completed this form on my last visit, and when I checked, the receptionist said it had to be completed every six months. So I had completed it before. It was a long form asking for all sorts of details about medical history, medicines being taken etc. It included questions about lifestyle – how much exercise you get, whether you smoke, how much alcohol you drink etc. It all seemed rather over the top to be completing every six months. It also seemed an inefficient process and prone to error, with every patient answering all these detailed questions (often in a rush). And there was no way to check what my previous answers were – wouldn’t it be nice if they just pre-filled my previous answers so I could make any adjustments? All a little frustrating really. So I asked the receptionist why all this was needed.

“The government needs it,” was the reply. Really? What on earth do they do with it all, I wondered. I have to admit, that answer made me try a little experiment. I tried to see if the form would submit without me entering anything. It didn’t – it told me I had to sign the form first. So I signed it and, sure enough, it was accepted. I handed the iPad back to the receptionist and she thanked me for being so quick. Off I went to my appointment and all was fine. And I felt as though I had struck a very small blow for freedom.

I wonder what does happen to all the data. Does it really go to “the government”? What would they do with it? Is it a case of gathering big data that can then be mined for trends – how the various factors affect dental health maybe? Well, one thing’s for sure, I wouldn’t trust the conclusions given how easy it seems to be to dupe the system. What guarantee is there on the accuracy of any of the data? Seems to me a case of garbage in, garbage out.
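For anyone who does end up analysing data like this, one modest safeguard is to check how complete each record is before it goes anywhere near an analysis. The sketch below is only an illustration in Python: the field names, the `completeness` and `split_by_quality` helpers and the 80% threshold are all my own assumptions, not anything I know about the real form or what happens to its data.

```python
# A minimal sketch of a completeness check on submitted forms.
# All field names are hypothetical; the real form's fields are not known.
QUESTION_FIELDS = ["medical_history", "medicines", "exercise", "smoker", "alcohol_units"]


def is_blank(value):
    """Treat None and empty or whitespace-only strings as unanswered."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(record):
    """Return the fraction of question fields that were answered (0.0 to 1.0)."""
    answered = sum(1 for field in QUESTION_FIELDS if not is_blank(record.get(field)))
    return answered / len(QUESTION_FIELDS)


def split_by_quality(records, min_completeness=0.8):
    """Separate records into (usable, suspect) using an assumed threshold."""
    usable, suspect = [], []
    for record in records:
        if completeness(record) >= min_completeness:
            usable.append(record)
        else:
            suspect.append(record)
    return usable, suspect


if __name__ == "__main__":
    forms = [
        # A fully answered form.
        {"signed": True, "medical_history": "none", "medicines": "none",
         "exercise": "3x per week", "smoker": "no", "alcohol_units": "4"},
        # A signed form with every question left empty, like my experiment.
        {"signed": True, "medical_history": "", "medicines": "",
         "exercise": "", "smoker": "", "alcohol_units": ""},
    ]
    usable, suspect = split_by_quality(forms)
    print(len(usable), "usable,", len(suspect), "suspect")
```

A check like this only catches blanks. It does nothing about answers that were filled in but rushed or wrong, so the scepticism is still needed.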

As we are all wowed by what Big Data can do and the incredible neural networks and algorithms that teams can develop to help us (see previous blog), we do need to think about the source of the Big Data. Where has it come from? Could it be biased (almost certainly)? And in what way? How can we guard against the impact of that bias? There’s been a lot in the news recently about the dangers of bias – for example in Time and the Guardian. If we’re not careful, we can build bias into the algorithms and just continue with the discrimination we already have. Our best defence is scepticism. The same applies in root cause analysis when an expert is quoted as evidence. As Edward Hodnett says: “Be sceptical of assertions of fact that start, ‘J. Irving Allerdyce, the tax expert, says…’ There are at least ten ways in which these facts may not be valid. (1) Allerdyce may not have made the statement at all. (2) He may have made an error. (3) He may be misquoted. (4) He may have been quoted only in part….”

Being sceptical and asking questions can help us avoid erroneous conclusions. Ask questions like: “how do you know that?”, “do we have evidence for that?” and “could there be bias here?”

Big Data has huge potential. But let’s not be so wowed by it that we stop asking questions. Be sceptical. Remember, it could be another case of garbage in, garbage out.

Image: Pixabay

Text: © 2017 Dorricott MPI Ltd. All rights reserved.