Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression–based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems, which coalesce into four consensus molecular subtypes (CMS) with the following distinguishing features: CMS1 (MSI Immune, 14%), hypermutated, microsatellite unstable, strong immune activation; CMS2 (Canonical, 37%), epithelial, chromosomally unstable, marked WNT and MYC signaling activation; CMS3 (Metabolic, 13%), epithelial, evident metabolic dysregulation; and CMS4 (Mesenchymal, 23%), prominent transforming growth factor β activation, stromal invasion, and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intra-tumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC – with clear biological interpretability – and the basis for future clinical stratification and subtype-based targeted interventions.
Colorectal carcinoma side is associated with differences in key molecular features, some of them immediately druggable, and with important prognostic effects that are maintained in metastatic lesions. Although significant molecular heterogeneity remains within each side, our findings justify stratifying patients by side in retrospective and prospective analyses of drug efficacy and prognosis.
Successful honey bee breeding programmes require traits that can be genetically improved by selection. Heritabilities for production, behaviour, and health traits, as well as their phenotypic correlations, were estimated in two distinct Swiss Apis mellifera mellifera and Apis mellifera carnica populations based on 9 years of performance records and more than two decades of pedigree information. Breeding values were estimated by a best linear unbiased prediction (BLUP) approach, taking either queen or worker effects into account. In A. m. mellifera, the highest heritabilities were obtained for defensive behaviour, calmness during inspection, and hygienic behaviour, while in A. m. carnica, honey yield and hygienic behaviour were the most heritable traits. In contrast, estimates for infestation rates by Varroa destructor suggest that the phenotypic variation cannot be attributed to an additive genetic origin in either population. The highest phenotypic correlations were found between defensive behaviour and calmness during inspection. The implications of these findings for testing methods and the management of the breeding programme are discussed.
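For readers unfamiliar with BLUP, the sketch below solves Henderson's mixed model equations for a toy single-trait animal model. It is not the model used in the study: queen and worker effects, further fixed effects (e.g. apiary or year) and pedigree-derived relationship matrices of real size are all omitted, and the function names (`animal_blup`, `solve`, `inverse`), data and heritability values are hypothetical. Heritability enters only through the variance ratio λ = σ_e²/σ_a² = (1 − h²)/h², where h² = σ_a²/(σ_a² + σ_e²).

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    a = [list(map(float, row)) + [float(v)] for row, v in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))  # pivot row
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def inverse(A):
    """Invert A column by column (adequate only for toy pedigrees)."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def animal_blup(y, A, h2):
    """Hypothetical toy model y = 1*mu + u + e, one record per animal.

    Assumes Var(u) = A * sigma_a^2 and Var(e) = I * sigma_e^2, so that
    lambda = sigma_e^2 / sigma_a^2 = (1 - h2) / h2.
    Returns (estimate of mu, list of predicted breeding values u).
    """
    n = len(y)
    lam = (1.0 - h2) / h2
    A_inv = inverse(A)
    # Henderson's mixed model equations with X = 1 (overall mean) and Z = I:
    # [ n   1'              ] [mu]   [ sum(y) ]
    # [ 1   I + lam * A^-1  ] [u ] = [ y      ]
    lhs = [[float(n)] + [1.0] * n]
    for i in range(n):
        lhs.append([1.0] + [(1.0 if i == j else 0.0) + lam * A_inv[i][j]
                            for j in range(n)])
    sol = solve(lhs, [sum(y)] + list(y))
    return sol[0], sol[1:]
```

With `A` set to the identity (unrelated animals) and `h2 = 0.5`, `animal_blup([10.0, 12.0, 14.0], A, 0.5)` gives μ = 12 and breeding values −1, 0 and 1: each deviation from the mean is shrunk by 1/(1 + λ) = 0.5. A non-diagonal `A` (for example a sire and two half-sib offspring) lets relatives borrow information from each other's records.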
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides be mapped back to the proteins from which they were derived. This process is known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions that addresses the problem of protein and gene model inference for shotgun proteomics data. In particular, we handle dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We also address the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that MIPGEM is competitive with existing tools for protein inference.
Proteomics, the comprehensive and quantitative analysis of the proteins expressed in a given organ, tissue, or cell line, provides unique insights into biological systems that cannot be obtained from genomics or transcriptomics approaches (1). With the advent of shotgun proteomics [gel-free liquid chromatography tandem mass spectrometry (LC-MS/MS)] (2), the number of distinct proteins that can be identified from complex samples has increased significantly compared with more traditional gel-based approaches. Shotgun proteomics has become the method of choice for the analysis of complex protein mixtures (1). Briefly, proteins are extracted from their biological source and enzymatically digested into peptides (usually with trypsin). The peptides are then separated by liquid chromatography and analyzed by tandem mass spectrometry.
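The graph structure mentioned above can be illustrated with a small sketch that builds a peptide, protein and gene-model graph (a 3-partite graph), splits it into connected components, which can be scored as independent sub-problems, and flags shared peptides. This is only the bookkeeping around protein inference, not MIPGEM's Markovian scoring; the input dictionaries, identifiers and function names are hypothetical.

```python
from collections import defaultdict


def build_graph(pep2prot, prot2gene):
    """Undirected peptide-protein-gene graph.

    Hypothetical inputs: pep2prot maps a peptide sequence to the set of
    protein IDs containing it; prot2gene maps a protein ID to its gene model.
    """
    adj = defaultdict(set)
    for pep, prots in pep2prot.items():
        for prot in prots:
            gene = prot2gene[prot]
            adj[("pep", pep)].add(("prot", prot))
            adj[("prot", prot)].update({("pep", pep), ("gene", gene)})
            adj[("gene", gene)].add(("prot", prot))
    return adj


def components(adj):
    """Connected components via iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps


def shared_peptides(pep2prot, prot2gene):
    """Peptides shared by >1 protein, and peptides shared by >1 gene model."""
    by_protein = {p for p, prots in pep2prot.items() if len(prots) > 1}
    by_gene = {p for p, prots in pep2prot.items()
               if len({prot2gene[q] for q in prots}) > 1}
    return by_protein, by_gene


# Toy example: PEPAB occurs in proteins A and B, both encoded by gene model g1.
pep2prot = {"PEPA": {"A"}, "PEPAB": {"A", "B"}, "PEPC": {"C"}}
prot2gene = {"A": "g1", "B": "g1", "C": "g2"}
```

In the toy example the graph splits into two components, and `PEPAB` is ambiguous at the protein level but not at the gene-model level, because both of its proteins come from `g1`. This is the kind of ambiguity that scoring the encoding gene models is meant to resolve.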
Peptides are thus the elementary unit of measurement in LC-MS/MS (from now on, "protein" means protein sequence and "peptide" means peptide sequence). In this paper, we focus on a probabilistic model for protein inference. The peptide identifications, i.e., the (posterior) probabilities that a given peptide is present in a sample of interest (or a corresponding discriminant score), are the input to our statistical model and algorithm, which infer the posterior probability that each individual protein is present in the sample. One important difference from previous solutions is that Markovian Inference of Proteins and Gene Models (MIPGEM) can also infer the presence or absence of gene models rather than being restricted to proteins. This extension is useful for integrating proteomics and transcriptomics data. Earlier protein inference models include refs. 3–14; a brief description of some of these methods can be found in ref. 11. The main elements characterizing our approach include the following: (i) We take uncertainties related to the peptide-spectrum matching process into accou...
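To make the input/output relationship concrete (peptide probabilities in, protein and gene-model probabilities out), here is a deliberately naive stand-in. It uses a noisy-OR rule: a protein or gene model is scored as present if at least one of its peptides is, treating peptides as independent. This is a common simplification and not MIPGEM's Markovian model; it ignores dependencies from peptide-spectrum matching and lets a shared peptide count fully for every protein that contains it. All names and values are hypothetical.

```python
import math


def noisy_or(probs):
    """P(at least one event occurs) under independence: 1 - prod(1 - p)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def infer(pep_prob, pep2prot, prot2gene):
    """Hypothetical noisy-OR scoring of proteins and gene models.

    pep_prob maps peptide -> posterior probability of presence;
    pep2prot maps peptide -> set of protein IDs; prot2gene maps protein -> gene model.
    """
    prot_peps, gene_peps = {}, {}
    for pep, prots in pep2prot.items():
        for prot in prots:
            prot_peps.setdefault(prot, set()).add(pep)
            # A gene model collects the distinct peptides of all its proteins,
            # so a peptide shared by two isoforms is counted only once.
            gene_peps.setdefault(prot2gene[prot], set()).add(pep)
    prot_post = {q: noisy_or(pep_prob[p] for p in peps) for q, peps in prot_peps.items()}
    gene_post = {g: noisy_or(pep_prob[p] for p in peps) for g, peps in gene_peps.items()}
    return prot_post, gene_post


pep_prob = {"PEPA": 0.9, "PEPAB": 0.5, "PEPC": 0.2}
pep2prot = {"PEPA": {"A"}, "PEPAB": {"A", "B"}, "PEPC": {"C"}}
prot2gene = {"A": "g1", "B": "g1", "C": "g2"}
```

Here protein A scores 1 − 0.1 × 0.5 = 0.95 and protein B scores 0.5 from the same shared peptide, which illustrates why naive per-protein scoring over-credits shared peptides. Gene model `g1` scores 0.95 because its distinct peptides are counted once.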
Background: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and, potentially, the power of their analyses. However, technical differences (“batch effects”) as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
Focus: The current study focuses on the construction of classifiers and the use of cross-validation to estimate their performance. In particular, we investigate how batch effects, and differences in sample composition between batches, affect the accuracy of the classification performance estimate obtained via cross-validation. This focus on estimation bias is a main difference from previous studies, which have mostly examined predictive performance and how it relates to the presence of batch effects.
Data: We work on simulated data sets. To obtain realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., ‘control’) or group 2 (e.g., ‘treated’). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios, most importantly different levels of confounding between groups and batches.
Methods: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared with the performance obtained when applying the classifier to independent data.
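The simulation and nested cross-validation described above can be sketched in pure Python. Several assumptions are not taken from the study: Gaussian noise stands in for the real expression matrix, groups are labelled 0 and 1, a difference-of-means ranking replaces the Wilcoxon test or lasso, and kNN (tuning only the number of neighbours k) is the sole classifier. The names `simulate`, `select_features`, `nested_cv` and their parameters are hypothetical. The key point is that feature selection and tuning only ever see the training part of each outer fold.

```python
import random
import statistics


def simulate(n_per_group=20, n_features=50, n_de=5, de_shift=1.0,
             batch_shift=1.5, confounding=0.8, seed=0):
    """Hypothetical simulator. `confounding` is the fraction of each group
    placed in its "own" batch (group g -> batch g); 0.5 means no confounding."""
    rng = random.Random(seed)
    X, y, batch = [], [], []
    for group in (0, 1):
        n_own = round(confounding * n_per_group)
        for i in range(n_per_group):
            b = group if i < n_own else 1 - group
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_de):          # first n_de features are differentially expressed
                row[j] += de_shift * group
            if b == 1:                     # additive batch effect on every feature
                row = [v + batch_shift for v in row]
            X.append(row)
            y.append(group)
            batch.append(b)
    return X, y, batch


def select_features(X, y, m):
    """Stand-in for the Wilcoxon test: rank features by |difference of group means|."""
    g0 = [r for r, g in zip(X, y) if g == 0]
    g1 = [r for r, g in zip(X, y) if g == 1]

    def score(j):
        return abs(statistics.fmean(r[j] for r in g1) - statistics.fmean(r[j] for r in g0))

    return sorted(range(len(X[0])), key=score, reverse=True)[:m]


def knn_predict(X_train, y_train, x, k, feats):
    """Majority vote among the k nearest training samples (odd k avoids ties)."""
    nearest = sorted((sum((r[j] - x[j]) ** 2 for j in feats), label)
                     for r, label in zip(X_train, y_train))[:k]
    return int(2 * sum(label for _, label in nearest) > k)


def fold_accuracy(X, y, train, test, k, m):
    """Select features and fit kNN on `train` indices, score on `test` indices."""
    Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
    feats = select_features(Xtr, ytr, m)   # selection sees training data only
    hits = sum(knn_predict(Xtr, ytr, X[i], k, feats) == y[i] for i in test)
    return hits / len(test)


def split(indices, n_folds, rng):
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def nested_cv(X, y, ks=(1, 3, 5), m=5, outer=5, inner=4, seed=0):
    """Outer loop estimates performance; inner loop tunes k on outer-training data."""
    rng = random.Random(seed)
    outer_scores = []
    for test in split(range(len(y)), outer, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        inner_folds = split(train, inner, rng)  # same inner folds for every candidate k

        def inner_score(k):
            accs = []
            for val in inner_folds:
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                accs.append(fold_accuracy(X, y, fit, val, k, m))
            return statistics.fmean(accs)

        best_k = max(ks, key=inner_score)
        outer_scores.append(fold_accuracy(X, y, train, test, best_k, m))
    return statistics.fmean(outer_scores)
```

To mimic the comparison with independent data, one could compute `nested_cv(X, y)` on a confounded set from `simulate(confounding=0.8)` and then fit on all of it and score on a second set, e.g. `simulate(confounding=0.5, seed=1)`, by calling `fold_accuracy` on the concatenated lists with the first set's indices as `train` and the second set's as `test`.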