Introduction
Earlier studies have shown that lymphomatous effusions in patients with diffuse large B-cell lymphoma (DLBCL) are associated with a very poor prognosis, even worse than that of patients with stage IV disease without effusions. We hypothesized that certain genetic abnormalities are associated with lymphomatous effusions, and that identifying them would help to reveal related pathways, oncogenic mechanisms, and therapeutic targets.
Methods
We compared whole-exome sequencing data from DLBCL samples involving solid organs (n = 22) with data from samples involving effusions (n = 9). We designed a mutational accumulation-based approach to score each gene and used mutation interpreters to identify candidate pathogenic genes associated with lymphomatous effusions. In addition, we performed gene-set enrichment analysis on a microarray comparison of effusion-associated versus non-effusion-associated DLBCL cases to extract the related pathways.
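The abstract does not specify how the accumulation score is computed, so the following is only a minimal sketch of one plausible gene-level score, not the authors' method. It assumes the score is the fraction of effusion samples carrying at least one mutation in a gene minus the same fraction in solid-organ samples. The function name `accumulation_scores`, the input layout, and the gene labels `GENE_A` and `GENE_B` are hypothetical; only the group sizes (9 and 22) come from the text.

```python
from collections import defaultdict

def accumulation_scores(mutations, group_sizes):
    """Hypothetical gene-level score (an assumption, not the published method).

    mutations:   iterable of (sample_id, group, gene) tuples, where group is
                 "effusion" or "solid".
    group_sizes: dict mapping each group name to its number of samples.
    Returns a dict gene -> (fraction of effusion samples mutated)
                         - (fraction of solid-organ samples mutated).
    """
    # For each gene and group, collect the distinct samples carrying a mutation,
    # so that several mutations in one sample are counted once.
    mutated = defaultdict(lambda: defaultdict(set))
    for sample_id, group, gene in mutations:
        mutated[gene][group].add(sample_id)

    scores = {}
    for gene, by_group in mutated.items():
        eff = len(by_group.get("effusion", ())) / group_sizes["effusion"]
        sol = len(by_group.get("solid", ())) / group_sizes["solid"]
        scores[gene] = eff - sol
    return scores

# Toy input with placeholder gene labels; these are not data from the study.
toy_mutations = [
    ("e1", "effusion", "GENE_A"),
    ("e2", "effusion", "GENE_A"),
    ("s1", "solid", "GENE_A"),
    ("s1", "solid", "GENE_B"),
]
toy_scores = accumulation_scores(toy_mutations, {"effusion": 9, "solid": 22})
# Genes with the largest positive score are enriched in the effusion group.
ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

A positive score marks a gene mutated more often in effusion samples under this assumed definition; a real analysis would also need to account for gene length, background mutation rate, and the small sample sizes.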
Results
We found that genes involved in the identified pathways, or with high accumulation scores in the effusion-based DLBCL cases, were associated with migration/invasion. We validated the expression of 8 selected genes (MUC4, SLC35G6, TP53BP2, ARAP3, IL13RA1, PDIA4, HDAC1 and MDM2) in DLBCL cell lines and clinical samples, and validated the expression of 3 proteins (MUC4, HDAC1 and MDM2) in an independent cohort of DLBCL cases with (n = 31) and without (n = 20) lymphomatous effusions. Overexpression of HDAC1 and MDM2 correlated with the presence of lymphomatous effusions, and HDAC1 overexpression was associated with the poorest prognosis.
Conclusion
Our findings suggest that DLBCL with lymphomatous effusions may be mechanistically linked to the TP53-MDM2 pathway and to HDAC-related chromatin remodeling mechanisms.
Several studies have proposed interpreters that measure the degree of pathogenicity of variants. However, predicting disease type and disease–gene associations faces two essential challenges: the vast number of existing variants, and the many variants classified as variants of uncertain significance (VUS). To tackle these challenges, we propose algorithms that assign a significance score to each gene rather than to each variant, describing the gene's degree of pathogenicity. Because the interpreters classified most variants as VUS, most gene scores were likewise of uncertain significance. To predict these uncertain scores, we design two matrix factorization-based models: the common latent space model uses genomic variant data together with heterogeneous clinical data, while the single-matrix factorization model can be used when heterogeneous clinical data are unavailable. We show that the models predict the uncertain significance scores with low error and high accuracy. Moreover, to evaluate the effectiveness of our novel input features, we train five different multi-label classifiers, including a feedforward neural network, on the same feature set and show that all of them achieve high accuracy, indicating that the main impact of our approach comes from the features. Availability: The source code is freely available at https://github.com/sabdollahi/CoLaSpSMFM.
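To illustrate the general idea behind predicting unknown gene scores with a single matrix factorization, here is a minimal NumPy sketch. It is not the code from the repository above: it assumes a samples-by-genes score matrix in which only some entries are known, and fits two low-rank factors by gradient descent on the known entries only. The function name `factorize`, the hyperparameters, and the synthetic data are all assumptions made for illustration; the common latent space model, which also uses clinical data, is not covered.

```python
import numpy as np

def factorize(scores, mask, rank=2, lr=0.01, reg=0.01, epochs=3000, seed=0):
    """Fit scores ~ P @ Q.T using only entries where mask == 1.

    scores: (n_samples, n_genes) array; values where mask == 0 are ignored.
    mask:   (n_samples, n_genes) array of 0/1 flags marking known scores.
    Returns the factor matrices P (n_samples, rank) and Q (n_genes, rank).
    """
    rng = np.random.default_rng(seed)
    n, m = scores.shape
    P = 0.1 * rng.standard_normal((n, rank))
    Q = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        # Residual on known entries only; unknown entries contribute zero.
        err = mask * (scores - P @ Q.T)
        # Gradient steps on the squared error with L2 regularization.
        P_step = err @ Q - reg * P
        Q_step = err.T @ P - reg * Q
        P += lr * P_step
        Q += lr * Q_step
    return P, Q

# Synthetic low-rank data standing in for gene scores (not real data).
rng = np.random.default_rng(1)
true_scores = rng.random((6, 2)) @ rng.random((5, 2)).T
mask = (rng.random(true_scores.shape) < 0.8).astype(float)  # ~80% "known"

P, Q = factorize(true_scores, mask)
pred = P @ Q.T  # predictions for every entry, including the unknown ones

known = mask == 1
rmse_known = np.sqrt(np.mean((pred[known] - true_scores[known]) ** 2))
baseline = np.sqrt(np.mean(true_scores[known] ** 2))  # error of predicting 0
```

Entries of `pred` where `mask` is 0 play the role of predicted scores for the uncertain cases; in practice the rank, learning rate, and regularization would be tuned on held-out known entries.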