Advancing conversational diagnostic AI with multimodal reasoning

Authors: Khaled Saab, Chunjong Park, Tim Strother, Jan Freyberg, David G T Barrett, Yong Cheng, Wei-Hung Weng, David Stutz, Nenad Tomasev, Anil Palepu, Valentin Liévin, Yash Sharma, Roma Ruparel, Abdullah Ahmed, Elahe Vedadi, Kimberly Kanada, Cian Hughes, Yun Liu, Geoff Brown, Yang Gao, Sean Li, S Sara Mahdavi, James Manyika, Katherine Chou, Yossi Matias, Avinatan Hassidim, Dale R Webster, Joëlle Barral, S M Ali Eslami, Pushmeet Kohli, Adam Rodman, Vivek Natarajan, Mike Schaekermann, Tao Tu, Alan Karthikesalingam, Ryutaro Tanno

PMID: 42135531

Journal: Nat Med

Published: 2026-05

DOI: 10.1038/s41591-026-04371-0

Abstract

Real-world clinical practice is inherently multimodal, relying on the synthesis of patient history with visual information such as medical imagery and clinical documents. Although large language models (LLMs) have shown promise in diagnostic dialogue, their evaluation has been largely restricted to text-only interactions, failing to capture the complexity of modern remote care delivery. Here we introduce a multimodal extension of the Articulate Medical Intelligence Explorer (multimodal AMIE), capable of gathering, interpreting and reasoning about multimodal data within a diagnostic conversation. To achieve this, we developed a state-aware dialogue framework that dynamically guides history-taking based on diagnostic uncertainty and evolving patient states, emulating the structured reasoning of experienced clinicians. We evaluated this updated, state-aware version of multimodal AMIE against primary care physicians (PCPs) in a randomized, blinded exploratory study comprising 105 simulated telehealth consultations, which included dermatology photographs, electrocardiograms and clinical documents. As assessed by 18 specialist physicians, multimodal AMIE outperformed PCPs not only in diagnostic accuracy but also in conversation quality, including history-taking and empathy. Specifically, multimodal AMIE demonstrated superior performance on 29 of 32 evaluation axes, including seven of nine metrics that assess multimodal reasoning. These results validate the efficacy of state-aware reasoning in bridging the gap between text and visual information and demonstrate the potential for artificial intelligence (AI) systems to augment clinicians in complex, multimodal diagnostic settings.

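Experimental methods

State-aware dialogue phase transition framework, OSCE-style methods, mixed-effects methods, cumulative ordinal model, Bernoulli model, monotonic regression, Mann-Whitney U test, false discovery rate, Benjamini-Hochberg method, one-sided chi-squared test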
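
Of the methods listed, the Benjamini-Hochberg procedure is the one that controls the false discovery rate when many comparisons are tested together, as would be the case with a set of evaluation axes. The following is a minimal sketch of the standard step-up procedure, not the authors' analysis code. The function name `benjamini_hochberg`, the `alpha` level of 0.05 and the p-values in the usage lines are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level alpha.

    Standard Benjamini-Hochberg step-up procedure (hypothetical helper,
    written for illustration only).
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank, including
    # those that did not meet their own threshold (the "step-up" property).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Illustrative p-values only; these are not results from the study.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up inputs, only the two smallest p-values fall under their rank-scaled thresholds, so only those two hypotheses are rejected.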
