Cell-free DNA TAPS provides multimodal information for early cancer detection

Description

(H) Methylation variance in 1 Mb genomic windows in non-cancer controls, HCC and PDAC. (I) PCA plots of cfDNA methylation in 1 kb genomic windows for non-cancer controls and HCC, and for non-cancer controls and PDAC (Crohn's disease and colitis are coloured in green and yellow, respectively).

Fig. S3. HCC and PDAC prediction based on cfDNA DMRs. (A) Overview of the LOO model training and validation approach. The total number of samples is labelled n. At each iteration, the model training set consists of n - 1 samples. Differentially methylated enhancers (for HCC) or promoters (for PDAC) were selected for model building. The predictive model was evaluated on the held-out test sample in each fold. Cirrhosis and pancreatitis samples were not included in DMR identification and model building. (B) HCC cancer prediction scores for cirrhosis samples. Each blue dot represents the predicted score for an individual LOO model. The black dot shows the average probability score for a particular sample. The dashed line represents the probability score threshold. Samples with an average probability score above this threshold were predicted as HCC. (C) Gene Ontology analysis of genes linked to differentially methylated enhancers in HCC cfDNA (P value < 0.002) using Enrichr (53) against NCI-Nature Pathway Interaction. The top 10 categories by P value are shown in the graph. Gene-enhancer interactions were assigned using the GeneHancer reference database (52). (D) Methylation of a representative differentially methylated enhancer in HCC cfDNA for the DLC1 gene (two-tailed t-test P value = 8.765e-06). (E) PDAC cancer prediction scores for pancreatitis samples. Each yellow dot represents the predicted score for an individual LOO model. The black dot shows the average probability score for a particular sample. The dashed line represents the probability score threshold. Samples with an average probability score above this threshold were predicted as PDAC. (F) Gene Ontology analysis of the genes nearest to the differentially methylated promoters in PDAC cfDNA (P value < 0.002) using Enrichr (53) against NCI-Nature Pathway Interaction. The top 10 categories by P value are shown in the graph. (G) Methylation of a representative differentially methylated promoter in PDAC cfDNA for the RB1 gene (two-tailed t-test P value = 0.0017).
(H) HCC cancer prediction scores for the independent cfDNA WGBS dataset (EGAD00001004317) (24). Each dot represents the predicted score for an individual LOO model. Grey dots represent non-cancer controls and red dots represent HCC. The black dot shows the average probability score for a particular sample. The dashed line represents the probability score threshold. Samples with an average probability score above this threshold were predicted as HCC.

and PDAC cfDNA samples. (D) ROC curve of model performance using tissue contribution to classify HCC vs. non-cancer. (E) LOO cancer prediction scores for HCC and non-cancer controls using classifiers trained on tissue contribution. The dashed line represents the probability score threshold. Samples with a probability score above this threshold were predicted as HCC. (F) Cancer scores for cirrhosis samples using HCC vs. non-cancer classifiers. Each blue dot represents the predicted score for an individual model. The black dot shows the average probability score for a particular sample. The dashed line represents the probability score threshold. Samples with an average probability score above this threshold were predicted as HCC. (G) ROC curve of model performance using tissue contribution to classify PDAC vs. control. (H) LOO cancer prediction scores for PDAC and non-cancer controls using classifiers built on tissue contribution. The dashed line represents the probability score threshold. Samples with a probability score above this threshold were predicted as PDAC. (I) PDAC cancer scores for pancreatitis samples using PDAC vs. non-cancer classifiers. Each yellow dot represents the predicted score for an individual model. The black dot shows the average probability score for a particular sample. The dashed line represents the probability score threshold. Samples with an average probability score above this threshold were predicted as PDAC.

Frequency was calculated as the number of fragments of a particular length divided by the total number of fragments.
(B) ROC curve of HCC and non-cancer control prediction scores from a generalized linear model using the proportion of long cfDNA fragments (300-500 bp) in 10 bp bins as features. (C) Cancer prediction scores for HCC and non-cancer controls in classifiers trained using LOO cross-validation. The dashed line represents the probability score threshold. Samples with a probability score above this threshold were predicted as HCC. (D) HCC cancer prediction scores for cirrhosis samples in these classifiers. Each blue dot represents the predicted score for an individual model. Black dots show the average prediction score. The dashed line represents the probability score threshold: samples with an average probability score above this threshold were predicted as HCC. (E) ROC curve of PDAC and non-cancer control prediction scores from a generalized linear model using the proportion of long cfDNA fragments (300-500 bp) in 10 bp bins as features. (F) LOO cancer prediction scores for PDAC and non-cancer controls in classifiers built on cfDNA fragment frequencies in 10 bp length bins. The dashed line represents the probability score threshold. Samples with a probability score above this threshold were predicted as PDAC. (G) PDAC cancer prediction scores for pancreatitis samples in classifiers built on cfDNA fragment frequencies in 10 bp length bins. Each yellow dot represents the predicted score for an individual model. Black dots show the average prediction score. The dashed line represents the probability score threshold: samples with an average probability score above this threshold were predicted as PDAC.

Tables

Table S1. (Microsoft Excel spreadsheet)
TAPS sequencing statistics.

Table S2. (Microsoft Excel spreadsheet)
Clinical details of cfTAPS study cohort.

Table S3. (Microsoft Excel spreadsheet)
Differentially methylated enhancers used for HCC vs. control prediction.

Table S5. (Microsoft Excel spreadsheet)
Sources of public WGBS methylation data used to generate the tissue map.

Table S6. (Microsoft Excel spreadsheet)
cfDNA tissue contribution for each patient in cfTAPS cohort.

Table S7. (Microsoft Excel spreadsheet)
Fragment length distribution in each individual.
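
The fragment-length legends above describe two simple calculations: frequency as the number of fragments of a given length divided by the total number of fragments, with long fragments (300-500 bp) grouped into 10 bp bins, and a call made by comparing the average score across individual models with a threshold. The Python sketch below illustrates both under stated assumptions and is not the authors' code. The function names, the half-open interval [300, 500), and the threshold value used in the example are hypothetical, since the legends do not specify them.

```python
from collections import Counter


def long_fragment_bin_proportions(lengths, lo=300, hi=500, bin_width=10):
    """Return one feature per bin: fragments in that bin / all fragments.

    Assumption: bins are half-open, [lo, lo + bin_width), ..., up to hi
    (exclusive). The denominator is the total number of fragments of any
    length, following the frequency definition in the legend.
    """
    total = len(lengths)
    if total == 0:
        raise ValueError("at least one fragment length is required")
    counts = Counter(
        (length - lo) // bin_width for length in lengths if lo <= length < hi
    )
    n_bins = (hi - lo) // bin_width
    return [counts.get(i, 0) / total for i in range(n_bins)]


def average_score_call(model_scores, threshold):
    """Average the scores from the individual models for one sample and
    report whether the average lies above the threshold."""
    if not model_scores:
        raise ValueError("at least one model score is required")
    mean_score = sum(model_scores) / len(model_scores)
    return mean_score, mean_score > threshold


# Toy fragment lengths (bp) for one hypothetical sample.
example_lengths = [150, 160, 166, 167, 170, 305, 309, 310, 499, 500]
features = long_fragment_bin_proportions(example_lengths)

# Toy scores from three hypothetical models; 0.4 is an illustrative threshold.
mean_score, predicted_cancer = average_score_call([0.25, 0.5, 0.75], threshold=0.4)
```

In this sketch the 20 bin proportions would serve as the feature vector for a classifier such as a generalized linear model, and the averaging step mirrors how the black dots in the cirrhosis and pancreatitis panels summarise the per-model scores.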