How well did ChatGPT Health triage emergencies?

Question

Hans Steiner · Accepted Answer

Key findings from recent testing

A structured evaluation found that ChatGPT Health, the health‑focused version of the large language model, frequently underestimated the seriousness of emergency scenarios, under‑triaging roughly half of the cases in the test. In those cases the tool recommended a lower level of care than clinicians would have, even though the scenarios may have required urgent or emergency attention.

What that implies for patients and clinicians

- Under‑triage can delay necessary treatment, increasing the risk for people with rapidly progressing conditions.
- Tools that give overly reassuring advice may discourage users from seeking in‑person care when it is needed.
- AI chatbots can still be useful for basic information and signposting, but their limitations must be made explicit to users, and their use should be integrated with clinician oversight.

Why the results matter and next steps

This evaluation underscores that consumer‑facing AI is not yet ready to replace human triage. Regulators, health systems and developers should focus on:

- Transparent performance metrics, so clinicians and patients understand the tools' strengths and limits.
- Clear safeguards that steer users toward urgent care when red‑flag symptoms appear.
- Ongoing clinical validation across diverse scenarios and populations.

Until those measures are in place, health systems and public‑health agencies should treat such chatbots as an adjunct to, rather than a substitute for, professional triage, and patients should be advised to seek immediate medical help for concerning symptoms.