BackgroundHigh-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible.ResultsWe present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-017-1565-4) contains supplementary material, which is available to authorized users.
We present a numerical method to model dynamical systems from data. We use the recently introduced method Scalable Probabilistic Approximation (SPA) to project points from a Euclidean space to convex polytopes and represent these projected states of a system in new, lower-dimensional coordinates denoting their position in the polytope. We then introduce a specific nonlinear transformation to construct a model of the dynamics in the polytope and to transform back into the original state space. To overcome the potential loss of information from the projection to a lower-dimensional polytope, we use memory in the sense of the delayembedding theorem of Takens. By construction, our method produces stable models. We illustrate the capacity of the method to reproduce even chaotic dynamics and attractors with multiple connected components on various examples.
The reactivity ratios of acrylic acid (AA, M1) and its dimer β‐acroyloxypropionic acid (diAA, M2) are determined from cumulative copolymerization data by two different methods: classical parameter estimation (PE) by minimizing the objective function and a Bayesian analysis. Classical PE gives r1=0.74 and r2=1.23 at the minimum of the residual. From the Bayesian analysis, the probability distribution of the parameter sets is obtained, revealing the existence of parameter sets with rather the same probability. The influence of the number of data and the size of the measurement error are discussed.
Two different approaches to parameter estimation (PE) in the context of polymerization are introduced, refined, combined, and applied. The first is classical PE where one is interested in finding parameters which minimize the distance between the output of a chemical model and experimental data. The second is Bayesian PE allowing for quantifying parameter uncertainty caused by experimental measurement error and model imperfection. Based on detailed descriptions of motivation, theoretical background, and methodological aspects for both approaches, their relation are outlined. The main aim of this article is to show how the two approaches complement each other and can be used together to generate strong information gain regarding the model and its parameters. Both approaches and their interplay in application to polymerization reaction systems are illustrated. This is the first part in a two‐article series on parameter estimation for polymer reaction kinetics with a focus on theory and methodology while in the second part a more complex example will be considered.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.