Do AI health chatbots miss medical emergencies?

Question

Hans Steiner · Accepted Answer

New studies raise safety concerns about triage accuracy

Recent clinical evaluations show that health‑focused large language models can underestimate the urgency of many medical presentations. In controlled tests, a health‑trained chatbot failed to flag a substantial share of cases that clinicians would consider emergencies. The under‑triaged scenarios, meaning those assigned a lower urgency than clinicians would give, ranged from chest pain patterns to signs of severe infection. Researchers warn that these systems may not reliably recognise red‑flag symptoms or the need for immediate in‑person care.

Why these tools struggle

They are trained on broad text data and may not prioritise clinical nuance the way experienced clinicians do.
Presentations that depend on subtle combinations of symptoms, context or vital‑sign thresholds can confuse model‑based triage.
Some chatbots lack up‑to‑date clinical protocols or the ability to integrate real‑time measurements such as heart rate or oxygen saturation.

Practical guidance for patients and providers

Treat chatbots as informational aids, not replacements for clinical judgment or emergency services.
If symptoms are severe, rapidly worsening, or include signs such as difficulty breathing, chest pain, sudden weakness, altered consciousness, or severe bleeding, seek emergency care immediately.
Clinicians should be cautious about relying on AI triage outputs, and should advocate for systems that transparently document their limitations and error rates.
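
For readers wondering what a documented error rate could look like in practice, here is a minimal sketch in Python. It assumes each test case carries a clinician reference label and a chatbot label on a four-level urgency scale. The scale, the names `Case`, `under_triage_rate` and `missed_emergency_rate`, and the sample cases are all hypothetical and made up for illustration. They are not taken from any study or clinical standard.

```python
from dataclasses import dataclass

# Hypothetical urgency scale, for illustration only (not a clinical standard).
URGENCY = {"self_care": 0, "routine": 1, "urgent": 2, "emergency": 3}


@dataclass
class Case:
    clinician: str  # reference triage level assigned by clinicians
    chatbot: str    # triage level the chatbot recommended


def under_triage_rate(cases):
    """Share of all cases where the chatbot rated urgency lower than clinicians."""
    if not cases:
        raise ValueError("no cases to evaluate")
    under = sum(1 for c in cases if URGENCY[c.chatbot] < URGENCY[c.clinician])
    return under / len(cases)


def missed_emergency_rate(cases):
    """Share of clinician-labelled emergencies the chatbot did not label as emergencies."""
    emergencies = [c for c in cases if c.clinician == "emergency"]
    if not emergencies:
        raise ValueError("no emergency cases to evaluate")
    missed = sum(1 for c in emergencies if c.chatbot != "emergency")
    return missed / len(emergencies)


# Made-up example data, not results from any study.
example_cases = [
    Case("emergency", "emergency"),
    Case("emergency", "urgent"),     # under-triaged emergency
    Case("emergency", "routine"),    # under-triaged emergency
    Case("urgent", "urgent"),
    Case("routine", "urgent"),       # over-triage, not counted as under-triage
    Case("self_care", "self_care"),
]

print(f"under-triage rate: {under_triage_rate(example_cases):.2f}")
print(f"missed-emergency rate: {missed_emergency_rate(example_cases):.2f}")
```

The two figures answer different questions. The first is diluted by low-acuity cases, while the second looks only at emergencies, which is the failure this answer is concerned with. A system that reports only an overall accuracy figure could hide a high missed-emergency rate.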

Researchers and regulators are calling for more rigorous clinical testing, clearer safety standards and better integration of objective clinical data before these systems are used for autonomous triage. For now, the safest approach is to use them as one piece of information while defaulting to established emergency pathways when risk is uncertain.