I attended the ‘Digital Health Re-Wired’ conference at Birmingham’s NEC last week. There was a lot of talk about AI – in fact I think the term featured on pretty much every stand and in every stage presentation at the conference. People are excited about AI, and wherever you work in healthcare, AI is coming to a clinical information system near you…

At this point I need to declare an interest – I absolutely hate the term Artificial Intelligence – I think it is totally misleading. In fact I’m pretty sure that there is no such thing as artificial intelligence – it is a term used to glamorise what are, without doubt, very sophisticated data-processing tools, but also to obscure what those tools are doing and to what data. In medical research, hiding your methods and data sources is tantamount to a crime…

An Intelligent Definition

So what is artificial intelligence? It refers to a class of technologies that pair certain types of algorithm with very large amounts of data. The algorithms used in AI are variously called machine learning algorithms, adaptive algorithms, neural networks, clustering algorithms, decision trees, and many variations and sub-types of the same. Fundamentally, however, they are all statistical tools used to analyse and seek out patterns in data – much like the statistical tools we are more familiar with, such as linear and logistic regression. In fact the mathematics underpinning a learning algorithm such as a neural network was worked out in the 18th century by an English Presbyterian minister, philosopher and mathematician – the Reverend Thomas Bayes. Bayes’ Theorem gives a statistical model a way to update itself and adapt its probabilistic outputs as it is presented with new data. It is the original adaptive algorithm – one that has ultimately evolved into today’s machine learning algorithms, which are given their power by being hosted on very powerful computers and being fed very, very large amounts of data.
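To make the idea concrete, here is a toy sketch in Python of that kind of Bayesian update – a prior belief about a diagnosis being revised each time a new piece of evidence arrives. The prevalence, sensitivity and specificity figures are invented purely for illustration.

```python
# A toy illustration of Bayes' Theorem as an 'adaptive' update:
# P(disease | test+) = P(test+ | disease) * P(disease) / P(test+)
# All numbers below are invented purely for illustration.

def bayes_update(prior, sensitivity, specificity):
    """Posterior probability of disease after one positive test result."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

belief = 0.01  # prior: assume 1% of patients have the disease
for test_number in range(1, 4):
    belief = bayes_update(belief, sensitivity=0.90, specificity=0.95)
    print(f"After positive test {test_number}: P(disease) = {belief:.3f}")
```

Each new positive result nudges the model’s belief upwards – the same basic mechanism, scaled up enormously, that sits underneath modern machine learning.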

The other ingredient that has given modern machine learning tools their compelling illusion of ‘intelligence’ is the development of a technology called large language models (LLMs). These models are able to present the outputs of the statistical learning tools in natural, flowing, human-readable (or listenable) narrative language – i.e. they write and talk like a human – ChatGPT being the most celebrated example. I wrote about them about 5 years ago (The Story of Digital Medicine) – at that point they were an emerging technology, but they have since become mainstream and extremely effective and powerful.

Danger Ahead!

Here lies the risk in the hype – and the root cause of some of the anxiety about AI articulated in the press. Just because something talks a good talk and can spin a compelling narrative doesn’t mean it is telling the truth. In fact quite often ChatGPT will produce a well-crafted, beautifully constructed narrative that is complete nonsense. We shouldn’t really be surprised by this – because the source of ChatGPT’s ‘knowledge’ is ‘The Internet’ – and we have all learned that just because it’s on the internet doesn’t mean it’s true. Most of us have learnt to be somewhat sceptical and a bit choosy about what we believe when we do a Google search – we’ve learnt to sift out the ads, not necessarily pick the first thing that Google gives us, and to examine the sources and their credentials. Fortunately Google gives us quite a lot of the contextual information around its search results, which enables us to be choosy. ChatGPT, on the other hand, hides its sources behind a slick and compelling human-understandable narrative – a bit like a politician.

The Power of Data

In 2011 Peter Sondergaard – senior vice president at Gartner, a global technology research and consulting company – declared that “data eats algorithms for breakfast”. This was in response to the observation that a disproportionate amount of research effort and spending was being directed at refining complex machine learning algorithms, yielding only marginal gains in performance compared with the leaps in performance achieved by feeding the same algorithms more, and better quality, data. See ‘The Unreasonable Effectiveness of Data’.

I have experienced the data effect myself. Back in 1998/99 I was a research fellow in the Birmingham School of Anaesthesia and also the proud owner of an Apple PowerBook laptop with (what was then novel) a connection to the burgeoning internet. I came across a piece of software that allowed me to build a simple 4-layer neural network, and I decided to experiment with it to see if it was capable of predicting outcomes from coronary bypass surgery using only data available pre-operatively. I had access to a dataset of 800 patients, the majority of whom had had uncomplicated surgery and a ‘good’ outcome, while a couple of dozen had had a ‘bad’ outcome – experiencing disabling complications (such as stroke or renal failure) or dying. I randomly split the dataset into a ‘training set’ of 700 patients and a ‘testing set’ of 100. Using the training set I ‘trained’ the neural network – giving it all the pre-op data I had on the patients and then telling it whether each patient had a good or a bad outcome. I then tested what the neural network had ‘learned’ on the remaining 100 patients. The results were OK – I was quite pleased but not stunned: the predictive algorithm had an area under the ROC curve of about 0.7 – better than a coin toss, but only just. I never published, partly because the software I used was unlicensed, free and unattributable, but mainly because at the same time a research group from MIT in Boston published a paper doing more or less exactly what I had done but with a dataset of 40,000 patients – their ROC area was something like 0.84, almost useful and a result I couldn’t come close to competing with.
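For readers who want to see what that sort of experiment looks like in code, here is a minimal sketch using today’s tools (Python and scikit-learn) rather than the 1998 software, with synthetic data standing in for the original 800-patient dataset. The feature count, network size and random seeds are assumptions made purely for illustration.

```python
# An illustrative reconstruction of the experiment described above, with
# synthetic data in place of the original patient dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

# Synthetic stand-in: 800 'patients', mostly good outcomes, a minority bad.
X, y = make_classification(n_samples=800, n_features=20,
                           weights=[0.95, 0.05], random_state=42)

# Hold back a test set, as in the 700/100 split described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, stratify=y, random_state=42)

# A small feed-forward neural network (two hidden layers).
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=42))
model.fit(X_train, y_train)

# Evaluate discrimination with the area under the ROC curve.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC AUC: {auc:.2f}")
```

The point of the Sondergaard quote stands: the same few lines of code fed 40,000 real patients instead of 800 would almost certainly beat any amount of tinkering with the network itself.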

Using AI Intelligently

So what does this tell us? As practising clinicians, if you haven’t already been, you are very likely in the near future to be approached by a tech company selling an ‘AI’ solution for your area of practice. There are some probing questions you should be asking before adopting such a solution, and they are remarkably similar to the questions you would ask of any research output or drug company recommending that you change practice:

  1. What is the purpose of the tool?
    • Predicting an outcome
    • Classifying a condition
    • Recommending actions
  2. What type of algorithm is being used to process the data?
    • Supervised / Unsupervised
    • Classification / Logistic regression
    • Decision Tree / Random Forest
    • Clustering
  3. Is the model fixed or dynamic? i.e. has it been trained and calibrated using training and testing datasets and is now fixed or will it continue to learn with the data that you provide to it?
  4. What were the learning criteria used in training? i.e. against what standard was it trained?
  5. What was the training methodology? Value based, policy based or model based? What was the reward / reinforcement method?
  6. What was the nature of the data it was trained with? Was it an organised, labelled dataset or a disorganised, unlabelled one?
  7. How was the training dataset generated? How clean is the data? Is it representative? How have structural biases been accounted for (Age, Gender, Ethnicity, Disability, Neurodiversity)?
  8. How has the model been tested? On what population, in how many settings? How have they avoided cross contamination of the testing and training data sets?
  9. How good was the model in real-world testing? How sensitive? How specific? (See the sketch after this list.)
  10. How have they detected and managed anomalous outcomes – false positives / false negatives?
  11. How do you report anomalous outcomes once the tool is in use?
  12. What will the tool do with data that you put into it? Where is it stored? Where is it processed? Who has access to it once it is submitted to the tool? Who is the data controller? Are they GDPR and Caldicott compliant?
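For questions 9 to 11, it is worth being clear about what sensitivity, specificity and false positives/negatives actually mean. The sketch below shows how they fall out of a confusion matrix of a tool’s predictions against known outcomes – the labels and predictions are invented for the example.

```python
# Illustrative only: deriving sensitivity and specificity from a confusion
# matrix of predictions against known outcomes. Data invented for the example.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # actual outcomes (1 = event)
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]   # the tool's predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # proportion of real events the tool caught
specificity = tn / (tn + fp)   # proportion of non-events correctly cleared
print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}, "
      f"False positives: {fp}, False negatives: {fn}")
```

A vendor who cannot tell you these figures – or how they were obtained, and on what population – has not really tested their tool.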

Getting the answers to these questions is an essential prerequisite to deploying these tools into clinical practice. If you are told that the answers cannot be divulged for reasons of commercial sensitivity – or the person selling it to you just doesn’t know the answers – then politely decline and walk away. The danger we face is being seduced into adopting tools which are ‘black box’ decision-making systems – it is incumbent on us to understand why they make the decisions they do, how much we should trust them and how we can contribute to making them better and safer tools for our patients.

An Intelligent Future

To be clear, I am very excited about what this technology will offer us as a profession and our patients. It promises to democratise medical knowledge and put the power of that knowledge into the hands of our patients, empowering them to self-care and to advocate for themselves within the machinery of modern healthcare. It will profoundly change the role we play in the delivery of medical care – undermining the current medical model, which relies on the knowledge hierarchy between technocrat doctor and submissive patient – and turn that relationship into the partnership it should be. For that to happen we must grasp these tools – understand them and use them intelligently – because if we don’t, they will consume us and render us obsolete.

I have read two stories this week.

The first was written in an interesting, contemporary literary style – you know the sort – short sparse sentences almost factual, leaving lots of ‘space’ for your own imaginative inference, not making explicit links between facts and events but leaving you to do that for yourself.  It was a love story, rather charming and quite short, describing a familiar narrative of boy meets girl, invites her to the cinema and they fall in love (probably).  It could be described as Chandleresque in style – though it isn’t that good – in fact it could have been written by an 11+ student.  It wasn’t though – it was in fact written by a computer using a form of artificial intelligence called natural language generation with genuinely no human input.  You can read how it was done here.

The second story I read is a description of a falling out of love – of the medical profession with the IT industry and the electronic patient record. This one is very well written by Robert Wachter and is a warts-and-all recounting of the somewhat faltering start of the digital revolution in healthcare. It is called ‘The Digital Doctor’ and I would highly recommend you read it if you have any interest in the future of medicine. It is not the manifesto of a starry-eyed digital optimist, nor is it the rantings of a frustrated digital sceptic – he manages to artfully balance both world views with a studied and comprehensive analysis of the state of modern health IT systems. His realism, though, extends to understanding and articulating the trajectory of the health IT narrative and where it is taking us – which is a radically different way of delivering medical care. I won’t use this blog to précis his book – it’s probably better if you go and read it yourself.

From Data to Information to Understanding

The falling out that Dr Wachter describes really is quite dramatic – this is the United States, the most advanced healthcare system in the world – yet there are hospitals in the US that advertise their lack of an EPR as a selling point to attract high-quality doctors to work for them. Where has it gone wrong? Why is the instant availability not only of comprehensive and detailed information about our patients, but also of a myriad of decision support systems designed to make our jobs easier and safer, not setting us alight with enthusiasm? In fact it is overwhelming and oppressing us – turning history taking into a data collection chore and treatment decisions into a series of nag screens.

The problem is that there is just too much information. The healthcare industry is a prolific producer of information – an average patient over the age of 65 with one or more long-term conditions will see their GP (or one of her partners) 3-4 times a year, have a similar number of outpatient visits with at least 2 different specialists and attend A&E at least once. That doesn’t include the lab tests, x-rays, visits to the pharmacy, and nursing and therapy episodes. Each contact with the system generates notes, letters, results, reports, images, charts and forms – it all goes into the record – which, if it is a well-organised integrated electronic record, will be available in its entirety at the point of care.

Point of care being the point – most healthcare episodes are conducted over a very short time span. A patient visiting his GP will, if he’s lucky, get 10 minutes with her – it doesn’t make for a very satisfactory consultation if 4 or 5 of those minutes are spent with the doctor staring at a screen, navigating through pages of data and attempting to stitch together a meaningful interpretation of the myriad past and recent events in the patient’s medical history.

How it used to be (in the good old days)

So what is it that the above-mentioned hospitals in the US are harking back to in order to attract their doctors?  What is the appeal of how it used to be done, when a consultation consisted of a doctor, a patient and a few scrappy bits of paper in a cardboard folder?  Well, for a start, at least the patient got the full 10 minutes of the doctor’s attention.  But what information was the doctor relying on?  What the patient tells them, what the last doctor to see them chose to write in the notes, and whatever other events might have made it into their particular version of this patient’s health record.  This gives rise to what I call a ‘goldfish’ consultation (limited view of the whole picture, very short memory, starting from scratch each time).  We get away with it most of the time – mainly because most consultations concern relatively short-term issues – but too often we don’t get away with it, and patients experience a merry-go-round of disconnected episodes of reactive care.


As a practitioner of intensive care medicine, one of the things that occupies quite a lot of my time as ‘consultant on duty for ICU’ is the ward referral.  As gatekeeper of the precious resource that is an intensive care bed, my role is to assess a patient’s suitability for ICU care as well as to advise on appropriate measures that could avert the need for ICU.  My first port of call is the patient’s notes, where I go through the patient’s entire hospital stay – for some, particularly medical patients, this might be many days or even weeks of inpatient care.  What I invariably find is that the patient has been under the care of several different teams, and that the notes consist of a series of ‘contacts’ (ward rounds, referrals, escalations), few of which relate to each other – lots of goldfish medicine, even over the course of a single admission.  I have ceased to be surprised by the fact that I, at the point of escalation to critical care, am the first person to actually review the entire narrative of the patient’s stay in hospital.  Once that narrative is put together, very often the trajectory of the patient’s illness becomes self-evident – and the question of whether they would benefit from a period of brutal, invasive, intensive medicine usually answers itself.

Patient Stories

The defence against goldfish medicine in the ‘old days’ was physician continuity – back then you could expect to be treated for most of your life by the same GP or, when you came into hospital, by one consultant and his ‘firm’ (the small team of doctors that worked just for him – for in the good old days it was almost invariably a him) for the whole admission.  They would carry your story – every now and then summarising it in a clerking or a well-crafted letter.  But physician continuity has gone – and it isn’t likely ever to come back.

The EPR promised to solve the continuity problem by ensuring that even if you had never met the patient in front of you before (nor were likely ever to meet them again), you at least had instant access to everything that had ever happened to them – including the results of every test they had ever had.  But it doesn’t work – data has no meaning until it is turned into a story – and the more data you have, the harder it is, and the longer it takes, to turn it into a story.

And stories matter in medicine.  They matter to patients and their relatives, who use them to understand the random injustice of disease – a story tells them where they have come from and where they are going.  They matter to doctors as well – medical narratives are complex things, played out in individual patients over different timescales, from a life span to just a few minutes, each narrative having implications for the others.  Whilst we don’t necessarily think of it as such, it is precisely the complex interplay between chronic and acute disease, social and psychological context, genetics and pathology that we narrate when summarising a case history.  When it is done well it can be a joy to read – and of course it creates the opportunity for that sudden moment when you get the diagnostic insight that changes the course of a patient’s treatment.

Natural Language Generation

Turning the undifferentiated information that is a patient’s medical record – whether paper or digital – into a meaningful story has always been a doctor’s task.  What has changed is the amount of information available as source material, and the way it is presented.  A good story always benefits from good editing – leaving out the superfluous, immaterial or irrelevant detail is an expert task, and one that requires experience and intelligence.  You see it when comparing the admission record taken by a foundation year doctor with that taken by an experienced registrar or consultant – the former will be a verbatim record of an exchange between doctor and patient, the latter a concise inquisition that homes in on the diagnosis through a series of precise, intelligent questions.

So is the AI technology that is able to spontaneously generate a love story sufficiently mature to be turned to the task of intelligently summarising the electronic patient record into a meaningful narrative? It’s certainly been used to that effect in a number of other information tasks – weather forecasts and financial reports are now routinely published that were drafted using NLG technology.  The answer, of course, is maybe – there have been some brave attempts – but I don’t think we are there yet.  What I do know is that the progress of AI technology is moving apace, and it won’t be very long before NLG applied to a comprehensive EPR is doing a better job than your average foundation year doctor at telling the patient’s story – maybe then we will fall back in love with the EPR? Maybe…
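By way of illustration only, the sketch below shows the simplest possible flavour of NLG – a template that turns a few structured record entries into narrative sentences. Real NLG systems are vastly more sophisticated; the encounters and phrasing here are invented for the example.

```python
# A toy, template-based sketch of natural language generation: rendering a few
# structured record entries as narrative sentences. Data invented for illustration.
from datetime import date

encounters = [
    {"date": date(2023, 1, 10), "setting": "the GP surgery",
     "finding": "breathlessness on exertion", "action": "started a diuretic"},
    {"date": date(2023, 3, 15), "setting": "A&E",
     "finding": "decompensated heart failure", "action": "admitted under cardiology"},
]

def narrate(entries):
    """Render structured encounters as a short narrative paragraph."""
    sentences = []
    for e in sorted(entries, key=lambda e: e["date"]):
        sentences.append(
            f"On {e['date']:%d %B %Y}, seen in {e['setting']} with "
            f"{e['finding']}; {e['action']}."
        )
    return " ".join(sentences)

print(narrate(encounters))
```

The hard part, of course, is not stringing the sentences together – it is deciding, as a good registrar does, which of the thousands of entries in the record deserve a sentence at all.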