DeepBD: an evidence-first workflow that ranks genetic variants for fetal and infant birth defects
This paper describes DeepBD, a new computer system to help narrow down which genetic changes might explain birth defects seen before or soon after birth. The system is not a final diagnosis tool. It is a workflow that combines a trained scoring engine, specialist analysis tools, and a language model to organize evidence and produce a ranked list of candidate variants for clinician review.
DeepBD has four main parts. First, a large language model (LLM) helps turn clinical notes and imaging findings into structured phenotype terms, such as Human Phenotype Ontology (HPO) terms. Second, a pretrained evidence engine scores each candidate variant by combining explicit clinical rules; sequence-based features and measures of predicted molecular effect; and phenotype-conditioned biological context (cell types and pathways). Third, specialist modules add targeted external evidence and refinements, such as protein-structure checks and database lookups. Finally, a grounded diagnostic agent organizes the evidence with its provenance preserved, lets users refine the top candidates, and generates a human-readable synthesis.
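To make the second and fourth parts concrete, here is a minimal sketch of combining several evidence sources into one score per variant while keeping a record of where each piece of evidence came from. It is an illustration only, not DeepBD's actual engine: the summary describes a trained scoring engine, whereas this sketch uses a fixed weighted sum, and the class names, source labels, weights, and variant IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    source: str   # hypothetical label for where the evidence came from
    score: float  # assumed to be normalized to [0, 1]
    note: str = ""  # free-text provenance, kept for later review


@dataclass
class Candidate:
    variant_id: str
    evidence: List[Evidence] = field(default_factory=list)


# Hypothetical fixed weights; a trained engine would learn how to combine these.
WEIGHTS = {
    "clinical_rule": 0.5,
    "sequence_model": 0.3,
    "phenotype_context": 0.2,
}


def combined_score(candidate: Candidate) -> float:
    """Weighted sum of evidence scores; unknown sources contribute nothing."""
    return sum(WEIGHTS.get(e.source, 0.0) * e.score for e in candidate.evidence)


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates ordered from highest to lowest combined score."""
    return sorted(candidates, key=combined_score, reverse=True)


# Toy candidates with made-up scores.
candidates = [
    Candidate("var_A", [
        Evidence("clinical_rule", 0.2, "weak rule match"),
        Evidence("sequence_model", 0.9, "predicted damaging"),
        Evidence("phenotype_context", 0.4, "partial pathway overlap"),
    ]),
    Candidate("var_B", [
        Evidence("clinical_rule", 0.9, "strong rule match"),
        Evidence("sequence_model", 0.6, "moderate predicted effect"),
        Evidence("phenotype_context", 0.8, "relevant cell type"),
    ]),
    Candidate("var_C", [
        Evidence("sequence_model", 0.5, "uncertain effect"),
    ]),
]

ranked = rank(candidates)
for c in ranked:
    sources = ", ".join(e.source for e in c.evidence)
    print(f"{c.variant_id}: {combined_score(c):.2f} (evidence from: {sources})")
```

Because each score stays attached to its source and note, a reviewer can trace a variant's position in the ranking back to the individual pieces of evidence behind it.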
The team developed DeepBD using a large in-house fetal and infant cohort of 18,622 cases with sequencing and phenotype data. On an internal held-out set of solved cases, DeepBD ranked the correct causal variant first in 65.8% of cases and within the top 10 in 92.9% of cases (Recall@1/3/5/10 = 0.658 / 0.882 / 0.912 / 0.929). The authors report that DeepBD outperformed other tools they tested, including Exomiser, DeepRare and a setup using prompted LLM reranking on Exomiser-derived top-20 candidate lists. Tests that removed parts of the system (ablation analyses) suggest that rule-based evidence, mechanistic biological context, and specialist refinement each add complementary value.
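Recall@k here means the fraction of solved cases in which the correct causal variant appears within the top k ranked candidates. The short sketch below shows how such a metric can be computed from per-case ranks. The function name and the toy ranks are made up for illustration and are not data from the study.

```python
from typing import List, Optional


def recall_at_k(causal_ranks: List[Optional[int]], k: int) -> float:
    """Fraction of cases whose causal variant is ranked at position k or better.

    causal_ranks holds the 1-based rank of the causal variant for each case,
    or None when the causal variant was not in the ranked list at all.
    """
    if not causal_ranks:
        raise ValueError("need at least one case")
    hits = sum(1 for r in causal_ranks if r is not None and r <= k)
    return hits / len(causal_ranks)


# Toy ranks for ten hypothetical solved cases (not the paper's data).
toy_ranks = [1, 1, 3, 2, 7, None, 1, 12, 4, 1]

for k in (1, 3, 5, 10):
    print(f"Recall@{k} = {recall_at_k(toy_ranks, k):.3f}")
```

Recall@k can only stay the same or grow as k increases, which matches the pattern in the reported figures.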
This work matters because modern exome and genome tests often leave clinicians with many candidate variants to judge. Prenatal and early-infant presentations are especially hard to interpret because imaging gives an incomplete view of development. By building a trainable evidence engine and separating automated evidence computation from LLM-supported review and tool-based refinement, DeepBD aims to make the post-sequencing decision process more systematic, auditable, and tailored to the patient.