Stanford's Merlin puts vision-language AI on full 3D CT scans

The Nature study tested Merlin across 752 tasks and released code, weights and a de-identified abdominal CT dataset.

Why it matters

Merlin shows how radiology AI is moving from single-purpose classifiers toward reusable 3D foundation models trained on routine hospital data, with open code and a gated dataset that make the work easier to test than most clinical AI papers.

A Stanford-led team including Louis Blankemeier, Ashwin Kumar and Akshay S. Chaudhari published Merlin, a 3D vision-language foundation model for computed tomography, in a March 4 Nature paper. The work takes aim at one of radiology AI's practical gaps: most medical vision-language models have been built around 2D images and shorter text, while CT interpretation is volumetric, text-heavy and tied to patient history.

Merlin was trained on paired abdominal CT scans, diagnosis codes and radiology reports. The training set comprised more than 6 million images from 15,331 CT scans, more than 1.8 million diagnosis codes and more than 6 million report tokens. The researchers evaluated the model on six task types and 752 individual tasks, including zero-shot findings classification, phenotype classification, image-report retrieval, 5-year chronic disease prediction, radiology report generation and 3D organ segmentation.
