Can AI Outperform Human Doctors?
By Crystal Lindell
It may not be long before a trip to the emergency room means telling your symptoms to an AI robot, potentially before you even talk to a human doctor.
New research published in the journal Science highlights the potential for artificial intelligence to bring about such a future in healthcare.
The study, conducted by researchers at Harvard and Stanford, tested OpenAI’s experimental “o1 preview” models against human physicians. OpenAI is the maker of ChatGPT.
The researchers asked the o1 models to diagnose patients and create a diagnostic testing plan, then compared their clinical reasoning skills to those of expert and generalist physicians.
They also tested the AI on the cases of 76 real emergency room patients at a Boston hospital at three stages: initial triage on arrival; first contact with a physician; and admission to the hospital.
The results showed that the new AI model outperformed human physicians and improved on earlier generations of AI.
“Our findings suggest the urgent need for prospective trials to evaluate these technologies in real-world patient care settings and for health care systems to prepare for investments for computing infrastructure and design for clinician-AI interaction that can facilitate the safe integration of AI tools into patient-care workflows,” wrote lead authors Arjun Manrai, PhD, Assistant Professor of Biomedical Informatics at Harvard University, and Adam Rodman, MD, Director of AI Programs at Beth Israel Deaconess Medical Center.
In the emergency department cases, the o1 model made the correct diagnosis 67.1% of the time during the initial triage, outperforming two expert attending physicians (55.3% and 50.0%).
Physicians who reviewed the diagnostic results – without knowing if they were made by AI or human doctors – were unable to distinguish between the two.
“AI models are evolving from static question-and-answer tools into agents that can, for example, analyze patient records, monitor clinical encounters through ambient listening, and interact in real time with predictive models built on patient data,” Ashley Hopkins, PhD, and Erik Cornelisse, a PhD candidate, both at the College of Medicine and Public Health at Flinders University in Australia, wrote in an op-ed on the study.
“This advance sets a new evaluation benchmark — testing AI against physician performance, and ideally alongside physicians, on authentic clinical tasks.”
Interestingly, Hopkins and Cornelisse pushed back on the idea that physicians collaborating with AI is necessarily the ideal way to evaluate patients. They think AI may perform better on its own for some tasks.
“That collaborative configuration itself must be tested,” they wrote. “It has been argued that for certain well-defined tasks across health care, AI may operate more effectively independently.”
They also wrote that since many doctors are already using AI in their practices, sometimes without institutional oversight, further studies are urgently needed to determine when AI improves patient care and when it does not.
In an article about the AI study published in Harvard Magazine, Arjun Manrai, the senior co-author of the study, said the results do not show that “AI replaces doctors, despite what some (AI) companies are likely to say.”
“I think it does mean that we’re witnessing a really profound change in technology that will reshape medicine,” Manrai said. “We need to evaluate this technology now and rigorously conduct prospective clinical trials.”
Manrai also made an important point: the AI study was based entirely on text-based inputs, while practicing physicians evaluate many other forms of information and communication, such as listening to a patient, observing how a patient behaves, examining images and X-rays, and evaluating other test results.
AI can’t do all those things – at least not yet.
Manrai’s co-author, Adam Rodman, also thinks it’s premature for AI to replace doctors in clinical settings. AI might prove useful in providing second opinions and finding diagnostic mistakes, but Rodman doesn’t want to see “AI doctor companies” replacing human physicians.
“I do not think that these results support that,” Rodman said. “What these results support is a robust and ambitious research agenda to try to figure out how we can use these technologies to make patients’ lives better.”
