Top AI models fail spectacularly when faced with slightly altered medical questions

TLDR

According to a study in JAMA Network Open, AI models that excel on medical exams struggle with slightly altered questions, suggesting they may not truly understand medical content.

AI's high scores on medical exams may be misleading. A new study shows that when the correct answer is replaced with "None of the other answers," model accuracy plummets. This suggests the models rely on pattern recognition rather than genuine clinical reasoning.