Nasopharyngeal cancer (NPC), a cancer derived from epithelial cells in the nasopharynx, is a cancer common in China, Southeast Asia, and Africa. The three-dimensional (3D) genome organization of nasopharyngeal cancer is poorly understood. A major challenge in understanding the 3D genome organization of cancer samples is the lack of a method for the characterization of chromatin interactions in solid cancer needle biopsy samples. Here, we developed Biop-C, a modified in situ Hi-C method using solid cancer needle biopsy samples. We applied Biop-C to characterize three nasopharyngeal cancer solid cancer needle biopsy patient samples. We identified topologically associated domains (TADs), chromatin interaction loops, and frequently interacting regions (FIREs) at key oncogenes in nasopharyngeal cancer from the Biop-C heatmaps. We observed that the genomic features are shared at some important oncogenes, but the patients also display extensive heterogeneity at certain genomic loci. On analyzing the super enhancer landscape in nasopharyngeal cancer cell lines, we found that the super enhancers are associated with FIREs and can be linked to distal genes via chromatin loops in NPC. Taken together, our results demonstrate the utility of our Biop-C method in investigating 3D genome organization in solid cancers.
Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression dataset. Our results demonstrate that ProJect consistently performs better than other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, scoring 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the Procrustes sum of squared error (Procrustes SS) (exhibits 79.71% less error in RC_C, 38.36% in RC full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also leads with the highest correlation coefficient among all types of MV combinations (0.64% higher in RC_C, 0.24% in RC full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect’s key strength is its ability to handle different types of MVs commonly found in real-world data. Unlike most MVI methods that are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines if an MV is missing at random or missing not at random. It then employs targeted imputation strategies for each MV type, resulting in more accurate and reliable imputation outcomes. An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
Despite technological advances in proteomics, incomplete coverage and inconsistency issues persist, resulting in “data holes”. These data holes cause the missing protein problem (MPP), where relevant proteins are persistently unobserved, or sporadically observed across samples, hindering biomarker discovery and proper functional characterization. Network-based approaches can provide powerful solutions for resolving these issues. Functional Class Scoring (FCS) is one such method that uses protein complex information to recover missing proteins with weak support. However, FCS has not been evaluated on more recent proteomic technologies with higher coverage, and there is no clear way to evaluate its performance. To address these issues, we devised a more rigorous evaluation schema based on cross-verification between technical replicates and evaluated its performance on data acquired under recent Data-Independent Acquisition (DIA) technologies (viz. SWATH). Although cross-replicate examination reveals some inconsistencies amongst same-class samples, tissue-differentiating signal is nonetheless strongly conserved, confirming that FCS selects for biologically meaningful networks. We also report that predicted missing proteins are statistically significant based on FCS p values. Despite limited cross-replicate verification rates, the predicted missing proteins as a whole have higher peptide support than non-predicted proteins. FCS also predicts missing proteins that are often lost due to weak specific peptide support.
Proteomic studies characterize the protein composition of complex biological samples. Despite recent advancements in mass spectrometry instrumentation and computational tools, low proteome coverage and interpretability remains a challenge. To address this, we developed Proteome Support Vector Enrichment (PROSE), a fast, scalable and lightweight pipeline for scoring proteins based on orthogonal gene co-expression network matrices. PROSE utilizes simple protein lists as input, generating a standard enrichment score for all proteins, including undetected ones. In our benchmark with 7 other candidate prioritization techniques, PROSE shows high accuracy in missing protein prediction, with scores correlating strongly to corresponding gene expression data. As a further proof-of-concept, we applied PROSE to a reanalysis of the Cancer Cell Line Encyclopedia proteomics dataset, where it captures key phenotypic features, including gene dependency. We lastly demonstrated its applicability on a breast cancer clinical dataset, showing clustering by annotated molecular subtype and identification of putative drivers of triple-negative breast cancer. PROSE is available as a user-friendly Python module from https://github.com/bwbio/PROSE.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.