UCSF-led study finds artificial intelligence is as good as a physician at prioritizing which patients need to be seen first.
Emergency departments nationwide are overcrowded and overtaxed, but a new study suggests artificial intelligence (AI) could one day help prioritize which patients need treatment most urgently.
Using anonymized records of 251,000 adult emergency department (ED) visits, researchers at UC San Francisco evaluated how well an AI model was able to extract symptoms from patients’ clinical notes to determine their need to be treated immediately. They then compared the AI analysis with the patients’ scores on the Emergency Severity Index, a 1-5 scale that ED nurses use when patients arrive to allocate care and resources by highest need, a process known as triage.
The patients’ data were de-identified (separated from their actual identities) for the study, published May 7, 2024, in JAMA Network Open. The researchers evaluated the data using the GPT-4 large language model (LLM), accessing it via UCSF’s secure generative AI platform, which has broad privacy protections.
The researchers tested the LLM’s performance with a sample of 10,000 matched pairs (20,000 patients in total). Each pair included one patient with a serious condition, such as stroke, and another with a less urgent condition, such as a broken wrist. Given only the patients’ symptoms, the AI was able to identify which ED patient in the pair had the more serious condition 89% of the time.
In a sub-sample of 500 pairs that were evaluated by a physician as well as the LLM, the AI was correct 88% of the time, compared to 86% for the physician.
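For readers curious about how a paired comparison like this is scored, the following is a minimal sketch, not the study's actual code. It assumes each pair is two short symptom descriptions plus a known answer for which patient was sicker. The names `pairwise_accuracy` and `toy_chooser`, the keyword list, and the four example pairs are all hypothetical; in the study, the choice was made by the LLM or a physician, not a keyword rule.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (symptoms of patient A, symptoms of patient B)


def pairwise_accuracy(
    pairs: List[Pair],
    more_serious: List[int],
    choose: Callable[[str, str], int],
) -> float:
    """Fraction of pairs where `choose` picks the sicker patient.

    more_serious[i] is 0 if patient A in pair i is the more serious case,
    1 if patient B is. `choose` returns 0 or 1 in the same convention.
    """
    if not pairs or len(pairs) != len(more_serious):
        raise ValueError("pairs and more_serious must be non-empty and equal length")
    correct = sum(
        1 for (a, b), truth in zip(pairs, more_serious) if choose(a, b) == truth
    )
    return correct / len(pairs)


def toy_chooser(a: str, b: str) -> int:
    """Hypothetical stand-in for the model: counts a few red-flag phrases."""
    red_flags = ("slurred speech", "chest pain", "facial droop")

    def score(text: str) -> int:
        return sum(flag in text.lower() for flag in red_flags)

    # Ties default to patient A, which is one way a simple rule can be wrong.
    return 0 if score(a) >= score(b) else 1


# Invented examples for illustration only; not data from the study.
toy_pairs = [
    ("Sudden facial droop and slurred speech", "Wrist pain after a fall"),
    ("Twisted ankle while jogging", "Crushing chest pain radiating to left arm"),
    ("Severe headache and vomiting", "Mild sore throat"),
    ("Itchy rash on forearm", "Shortness of breath at rest"),
]
toy_truth = [0, 1, 0, 1]

accuracy = pairwise_accuracy(toy_pairs, toy_truth, toy_chooser)
print(f"Pairwise accuracy: {accuracy:.0%}")
```

The same scoring function could be run once with the model's choices and once with a physician's choices on the same pairs, which is the kind of side-by-side comparison the 88% and 86% figures describe.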
Having AI assist in the triage process could free up critical physician time to treat patients with the most serious conditions, while offering backup decision-making tools for clinicians who are juggling multiple urgent requests.
“Imagine two patients who need to be transported to the hospital but there is only one ambulance. Or a physician is on call and there are three people paging her at the same time, and she has to determine who to respond to first,” said lead author Christopher Williams, MB, BChir, a UCSF postdoctoral scholar at the Bakar Computational Health Sciences Institute.
Not quite ready for prime time
The study is one of only a few to evaluate an LLM using real-world clinical data, rather than simulated scenarios, and is the first to use more than 1,000 clinical cases for this purpose. It’s also the first study to use data from visits to the emergency department, where there is a wide array of possible medical conditions.
Despite its success within this study, Williams cautioned that AI is not ready to use responsibly in the ED without further validation and clinical trials.
“It’s great to show that AI can do cool stuff, but it’s most important to consider who is being helped and who is being hindered by this technology,” said Williams. “Is just being able to do something the bar for using AI, or is it being able to do something well, for all types of patients?”
One important issue to untangle is how to eliminate bias from the model. Previous research has shown these models may perpetuate racial and gender biases in health care, due to biases within the data used to train them. Williams said that before these models can be used, they will need to be modified to strip out that bias.
“First we need to know if it works and understand how it works, and then be careful and deliberate in how it is applied,” Williams said. “Upcoming work will address how best to deploy this technology in a clinical setting.”