A Simple Test Exposes the Limits of AI in Medical Diagnosis

I wasn’t actually sick. I invented two symptoms — hand pain and a severe headache — and asked ChatGPT what to do. It was an experiment.
I wanted to know what happens when a real person, with what feels like a real problem, turns to AI for help.
In both cases, I got a list. What I never got was a question. The AI never once asked my age, how the symptom started (my hand pain began after hitting it with a sledgehammer), or whether I’d experienced anything like it before. I had to volunteer all of that myself. When I did, the answers improved considerably. When I didn’t, one response suggested I lie down in a dark room — for what might have been an intracerebral hemorrhage.
AI Knows Medicine. That’s Not the Problem.
The first question to ask is whether AI actually has sufficient medical knowledge to be useful. The evidence strongly suggests that the answer is yes. A 2023 study evaluated ChatGPT across all three steps of the USMLE, the standard physician licensing exam, and found it performed at or near the passing threshold, scoring between 52 and 75 percent (the threshold is about 60 percent). More recent testing of eight current AI models put diagnostic accuracy at nearly 90 percent.
So, the raw knowledge is there. The problem is what happens when that knowledge meets a real person describing a real symptom in incomplete, ambiguous language.
A doctor who aces the boards can still make wrong clinical decisions when a patient sits in front of them with a muddled history. The same is true — perhaps more so — for AI.
When Real People Use AI: The Evidence Is Sobering
A study published in Nature Medicine this year tested exactly this. Researchers created 10 clinical scenarios and had about 1,300 people interact with AI platforms to determine what to do. The result: AI provided the correct triage guidance only about 43 percent of the time. It also performed no better than a standard Google search.
Why the failure? Users provided only partial information. They didn't know what to include, and the AI didn't ask. In 16 of 30 sampled interactions, the initial message omitted key details. The AI filled in the gaps, but not reliably.
A second study, published in February 2026, tested ChatGPT Health on 60 clinical vignettes across 21 medical areas. Unlike the Nature Medicine study, this one gave the AI complete information: researchers entered the vignettes with no missing details. Even so, the AI correctly triaged only 35 percent of non-urgent cases and 48 percent of urgent ones. For diabetic ketoacidosis, a condition that requires emergency care regardless of severity, the AI recommended outpatient follow-up.
The AI knew the facts. It failed on the judgment.
There is one more finding worth noting. The vignettes sometimes included what a friend or family member thought about the situation, for example, "my husband thinks it's probably a muscle strain." That contextual information influenced what the AI recommended: it down-weighted urgency based on a layperson's offhand opinion. This is a well-known AI tendency called sycophancy, the inclination to agree with whatever the user seems to believe. A skilled clinician learns to filter out that kind of clinical noise; this AI did not.
We're all familiar with cognitive biases distorting human judgment: confirmation bias, the halo effect, recency bias. It turns out the same problem can live inside a computer. AI carries its own versions of these biases; in this study, it drifted toward whatever the user seemed to believe rather than reasoning independently from the evidence.
Where AI Actually Helps
None of this means AI has no place in healthcare. There are specific situations where it performs reliably and adds real value.
Post-visit synthesis. A family member of mine recently saw a specialist who delivered more information than he could absorb. He recorded the visit with permission, converted it to a transcript, and asked AI to organize it into a readable summary. It worked well, producing a personal synthesis that the official clinical note rarely offers.
A caution on documentation: a recent study found that physician-written notes outperformed AI-generated ones on accuracy, thoroughness, and usefulness. If your doctor uses AI for documentation, ask whether they review it carefully. AI-generated notes should be treated as drafts, not final versions.
Translation and comprehension. AI is genuinely useful for converting clinical language into plain English. It can help translate lab results, imaging reports, discharge summaries, and visit notes. This is a low-stakes, high-value area where it performs consistently well.
Pre-visit preparation. Using AI to research a diagnosis or generate questions before an appointment is a legitimate and often helpful use.
Better symptom evaluation. If you do use AI to evaluate a symptom, there is a simple intervention that seems to meaningfully improve the interaction. Instead of describing your symptoms and waiting for a response, start with this:
“Before you respond, please ask me all the questions you need to give me accurate information about my situation.”
In my own testing, this changed the interaction. The AI stopped offering generic responses and started gathering the kind of information a clinician would need. The knowledge to ask the right questions is there; the model simply doesn't use it by default.
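For readers who reach these models through code rather than the chat window, the same intervention can be set as a standing instruction so it applies to every conversation. Here is a minimal sketch, assuming OpenAI's official Python client; the model name and the wording of the system prompt are illustrative choices of mine, not a validated clinical tool:

```python
# Minimal sketch: front-load the "ask me questions first" instruction
# as a system prompt. Assumes the openai package (pip install openai)
# and an OPENAI_API_KEY environment variable. Model name is illustrative.
from openai import OpenAI

client = OpenAI()

# Standing instruction: gather a history before offering any guidance.
SYSTEM_PROMPT = (
    "Before you respond to any symptom I describe, ask me all the "
    "questions you need (age, onset, mechanism of injury, prior "
    "episodes, severity) to give me accurate information about my situation."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "I have a severe headache."},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)

# With the standing instruction in place, the first reply should be a set
# of clarifying questions rather than a generic list of possible causes.
print(response.choices[0].message.content)
```

In the ordinary chat interface, pasting the sentence above at the start of the conversation accomplishes the same thing.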
Conclusion
AI is genuinely useful for understanding medical information, preparing for appointments, and synthesizing what you’ve learned. It is not yet reliable for determining what’s wrong with you or whether you should go to the emergency room. The clinical knowledge is there. The reasoning, especially under uncertainty and with incomplete information, is not.
About the author: Dr. Bobby Dubois is a physician and scientist with publications on evidence-based medicine, appropriateness of care, and the value of health care interventions. He writes a Substack and podcasts at Live Long and Well With Dr. Bobby.
Source: www.sensible-med.com
