Jan 14, 2024

Towards Conversational Diagnostic AI

A short explainer on AMIE, Google's diagnostic dialogue system for medical interviews and clinical reasoning.

AI doctors may be closer to reality than we think. A group of scientists and engineers from Google Research and Google DeepMind introduced AMIE (Articulate Medical Intelligence Explorer), an LLM-based AI system optimized for diagnostic dialogue. Its key goal is to hold naturalistic medical conversations: taking a patient's history, reasoning about diagnoses, and communicating with empathy, as a clinician would.

The medical interview

The purpose of the model is to gather information from the patient and deduce the likely cause of their symptoms. It is believed that 60-80% of diagnoses are made through clinical history-taking alone, which is why the medical interview is often called "the most powerful, sensitive, and most versatile instrument available to the physician". AI systems capable of diagnostic dialogue could increase the accessibility, consistency, and quality of healthcare. AMIE is therefore an experiment towards a conversational medical AI system for clinical history-taking and diagnostic reasoning.

Methodology

The researchers created a simulated self-play environment in which AMIE practices diagnostic conversations across thousands of medical conditions. This allows AMIE's skills to improve through automated feedback. AMIE is also fine-tuned on real-world medical dialogues and QA data.

During inference, AMIE adopts a three-step chain of reasoning: summarizing the patient's history, formulating differential diagnoses, and refining its response for clarity, empathy, and accuracy. This iterative strategy is intended to produce more nuanced and precise medical consultations.
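The three inference steps can be pictured as three sequential model calls, each feeding the next. The sketch below is a minimal illustration and not the authors' implementation: the `LLM` callable, the prompt wording, and the function name `chain_of_reasoning_reply` are all assumptions made for this example.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


def chain_of_reasoning_reply(llm: LLM, dialogue: List[str]) -> str:
    """Produce the next doctor turn using three sequential model calls."""
    history = "\n".join(dialogue)

    # Step 1: summarize the patient's history so far.
    summary = llm(f"Summarize the patient's history:\n{history}")

    # Step 2: formulate a differential diagnosis and a draft reply from the summary.
    draft = llm(
        "List a differential diagnosis and draft the next reply, given:\n" + summary
    )

    # Step 3: revise the draft for clarity, empathy, and accuracy.
    return llm(f"Revise for clarity, empathy, and accuracy:\n{draft}")


def echo_llm(prompt: str) -> str:
    """Stand-in model for demonstration: echoes the instruction line in brackets."""
    return "[" + prompt.splitlines()[0] + "]"


if __name__ == "__main__":
    print(chain_of_reasoning_reply(echo_llm, ["Patient: I have had a cough for two weeks."]))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM API; a real system would replace `echo_llm` with an actual model call.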

AMIE is an instruction-tuned model based on the PaLM 2 LLM, trained on a mixture of tasks and data:

Medical QA, reasoning, and summarization datasets - long-form responses to medical questions, 65 summaries of EHR notes, USMLE QA, and expert reasoning chains.
Real-world medical dialogues - 98,919 transcripts of doctor-patient dialogues collected over 10 years.
Simulated dialogues - conversations generated in a self-play environment.

The self-play environment comprises three elements: a Vignette Generator that creates patient scenarios, a Dialogue Generator that simulates patient-doctor conversations (with both roles played by AMIE), and a Self-play Critic that provides feedback for improvement. AMIE's learning process includes an "inner loop" that refines its behavior based on the critic's feedback and an "outer loop" that incorporates the refined dialogues into further training. When in use, AMIE follows the three-step "chain of reasoning" strategy: summarizing the medical history, formulating responses and recommendations, and revising these for accuracy, clarity, and empathy.
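The inner and outer loops can be sketched in a few lines of Python. This is only an illustration of the control flow described above, under stated assumptions: the `SelfPlayEnv` class, its four callables, and the loop parameters are hypothetical names invented for this example, and the stubs stand in for what would be LLM calls and a real fine-tuning step.

```python
from dataclasses import dataclass
from typing import Callable, List

Dialogue = List[str]


@dataclass
class SelfPlayEnv:
    """Hypothetical container for the three elements plus a training step."""
    generate_vignette: Callable[[], str]        # Vignette Generator
    simulate: Callable[[str, str], Dialogue]    # Dialogue Generator: (vignette, feedback) -> dialogue
    critique: Callable[[Dialogue], str]         # Self-play Critic
    fine_tune: Callable[[List[Dialogue]], None]  # assumed training hook


def inner_loop(env: SelfPlayEnv, vignette: str, rounds: int) -> Dialogue:
    """Refine one simulated dialogue using critic feedback."""
    dialogue = env.simulate(vignette, "")  # first attempt, no feedback yet
    for _ in range(rounds):
        feedback = env.critique(dialogue)
        dialogue = env.simulate(vignette, feedback)  # retry with the feedback in context
    return dialogue


def outer_loop(env: SelfPlayEnv, n_vignettes: int, inner_rounds: int, iterations: int) -> None:
    """Collect refined dialogues and feed them back into training."""
    for _ in range(iterations):
        refined = [
            inner_loop(env, env.generate_vignette(), inner_rounds)
            for _ in range(n_vignettes)
        ]
        env.fine_tune(refined)
```

In this sketch the same model would sit behind `simulate` for both the patient and the doctor role, matching the description that AMIE plays both sides of the conversation.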

Results

The researchers evaluated AMIE in a randomized study against real primary care doctors, with both conducting text-based interviews with simulated patients. The study design mimicked an Objective Structured Clinical Examination (OSCE), a format used to evaluate medical students, and covered 149 patient case scenarios. AMIE and the doctors were scored by specialist physicians and patient actors on history-taking, diagnostic accuracy, treatment recommendations, and communication criteria.

AMIE surpassed the doctors on 28 out of 32 criteria as rated by specialists, and on 24 out of 26 criteria as rated by patient actors. Specialist evaluators rated AMIE's differential diagnoses as more accurate, complete, and appropriate than the doctors'. Interestingly, the LLM also received higher scores for empathy, shared decision-making, and maintaining patient welfare. However, the text interface was unfamiliar to the doctors, the scenarios covered a limited set of conditions, and further real-world evaluation is needed before deployment. Still, the results represent promising progress for conversational diagnostic AI.
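To put the headline counts in proportion, the short calculation below converts them to percentages. The counts are the ones quoted above; the variable and function names are illustrative only.

```python
# Criteria on which AMIE was rated higher, as (won, total), taken from the text.
results = {
    "specialist physicians": (28, 32),
    "patient actors": (24, 26),
}


def share(won: int, total: int) -> float:
    """Fraction of criteria on which AMIE was rated higher."""
    return won / total


for rater, (won, total) in results.items():
    print(f"{rater}: {won}/{total} = {share(won, total):.1%}")
# specialist physicians: 28/32 = 87.5%
# patient actors: 24/26 = 92.3%
```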
