A generalizable machine learning framework for classifying DNA repair defects using ctDNA exomes
Abstract
Specific classes of DNA damage repair (DDR) defect can drive sensitivity to emerging therapies for metastatic prostate cancer. However, biomarker approaches based on DDR gene sequencing do not accurately predict DDR deficiency or treatment benefit. Somatic alteration signatures may identify DDR deficiency but historically require whole-genome sequencing of tumour tissue. We assembled whole-exome sequencing data for 155 high ctDNA fraction plasma cell-free DNA and matched leukocyte DNA samples from patients with metastatic prostate or bladder cancer. Labels for DDR gene alterations were established using deep targeted sequencing. Per-sample mutation and copy number features were used to train XGBoost ensemble models. Naive somatic features and trinucleotide signatures were associated with specific DDR gene alterations but were insufficient to resolve each class. Conversely, XGBoost-derived models showed strong performance, including areas under the curve of 0.99, 0.99 and 1.00 for identifying BRCA2, CDK12, and mismatch repair deficiency, respectively, in metastatic prostate cancer. Our machine learning approach re-classified several samples exhibiting genomic features inconsistent with original labels, identified a metastatic bladder cancer sample with a homozygous BRCA2 copy loss, and outperformed an existing exome-based classifier for BRCA2 deficiency. We present DARC Sign (DnA Repair Classification SIGNatures), a public machine learning tool leveraging clinically practical liquid biopsy specimens for simultaneously identifying multiple types of metastatic prostate cancer DDR deficiencies. We posit that it will be useful for understanding differential responses to DDR-directed therapies in ongoing clinical trials and may ultimately enable prospective identification of prostate cancers with phenotypic evidence of DDR deficiency.
Introduction
Alterations in DNA damage repair (DDR) genes are common in metastatic castration-resistant prostate cancer (mCRPC)1,2. Deleterious germline and/or somatic mutations in homologous recombination repair (HRR) related genes including BRCA2, ATM, and CDK12 are present in 15–20% of patients1,3,4,5. A further 3–5% exhibit alterations in mismatch repair (MMR) genes MSH2, MSH6 or MLH13,6,7. Collectively, these gene alterations play a critical role in patient management, directly influencing systemic therapy selection. Poly (ADP-ribose) polymerase (PARP) inhibitors are approved for HRR gene-mutated mCRPC2. Platinum chemotherapy also has activity in mCRPC with HRR gene defects8,9,10,11. MMR deficient (MMRd) mCRPC responds to immune checkpoint inhibition, and CDK12 alterations have been linked to sensitivity to immunotherapy5,6. Unfortunately, even among biomarker-selected patients, clinical response rates to each class of treatment are sub-optimal12,13,14.
Since DDR is proficient in most mCRPC4, the utility of PARP inhibitors and other emerging therapies depends on accurate identification of vulnerable tumours2. The preferred clinical approach is to perform targeted sequencing across the exons of DDR genes in archival prostate biopsy tissue. However, gene alteration status from targeted sequencing is an incomplete predictor of DDR proficiency15. Firstly, targeted approaches may miss complex structural rearrangements, resulting in false negatives7. Secondly, evaluation of pathogenicity is imperfect, especially for missense mutations and non-BRCA genes16. Thirdly, biallelic loss can be difficult to discriminate from monoallelic loss17. Because most DDR genes are presumed haplosufficient, durability of response to targeted therapies is most strongly correlated with biallelic gene inactivation5,18. Finally, the clinical relevance of mutations in rarer DDR genes is unclear due to the anecdotal nature of any observed therapy response.
The most commonly altered DDR genes are associated with distinct patterns of genomic alterations. Defective MSH2 drives microsatellite instability and high tumour mutational burden7,19. CDK12-altered mCRPC exhibits genome-wide focal tandem duplications17,20. BRCA2 (though not ATM) defects are associated with mutational signatures of defective HRR, as in breast, ovarian, and pancreatic cancer21,22. In other cancers, innovative models have been developed to accurately identify defective HRR using mutational features from whole-genome sequencing15,23,24. However, different cancer types exhibit distinct mutational rates and processes, which influence model attributes and overall performance, especially in different clinical contexts and/or cancers not considered during model development15,23. Few models have been specifically developed for prostate cancer, which is characterised by widespread copy number alterations, complex structural rearrangements and comparatively low mutational burden, independent of DDR status22,25. No tools account for prostate cancer-specific features or can simultaneously identify BRCA2 deficient (BRCA2d), CDK12 deficient (CDK12d), and MMRd mCRPC from individual patient samples.
Routine whole-genome sequencing of tumour tissue biopsy is clinically unfeasible in mCRPC26,27. However, plasma circulating tumour DNA (ctDNA) is abundant in a large proportion of clinically-progressing mCRPC28, enabling identification of genomic features including copy number changes and mutations29. We recently demonstrated that published trinucleotide signatures of defective MMR can be inferred from whole-exome sequencing (WES) of ctDNA19. Here, we exploit algorithmic advances in boosted ensemble models30,31 to develop DARC Sign (DnA Repair Classification Signature) (Fig. 1a): a set of models and accompanying software for classifying clinically-actionable DDR deficiencies in prostate cancer using ctDNA WES.
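For readers unfamiliar with trinucleotide signatures, the following minimal Python sketch illustrates how single-base substitutions are commonly tallied into 96 pyrimidine-normalised trinucleotide channels, the count vector from which such signatures are typically inferred. It is an illustration of the general convention only, not the DARC Sign implementation; the function names (`sbs96_channel`, `count_channels`) and the input format (a three-base reference context plus the alternate base) are assumptions made for this example.

```python
from collections import Counter

# Watson-Crick complements; only unambiguous bases are accepted.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channel(context, alt):
    """Map one substitution to its pyrimidine-normalised channel label.

    context: three reference bases centred on the mutated position
             (hypothetical input format chosen for this sketch).
    alt:     the substituted base.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in COMPLEMENT for b in context + alt):
        raise ValueError("expected a 3-base A/C/G/T context and a single A/C/G/T alt")
    if alt == context[1]:
        raise ValueError("alt base equals reference base")
    # By convention the mutated base is reported as C or T; a purine
    # reference is converted by switching to the opposite strand.
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_channels(mutations):
    """Tally (context, alt) pairs into trinucleotide channel counts."""
    return Counter(sbs96_channel(context, alt) for context, alt in mutations)


if __name__ == "__main__":
    # Toy input: the first two substitutions are the same event seen on
    # opposite strands, so they fall into one channel.
    toy = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]
    for channel, n in sorted(count_channels(toy).items()):
        print(channel, n)
```

In this convention, a per-sample vector of channel counts is the kind of mutation feature that can then be compared against published signatures or passed to a classifier.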