Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions that are designed primarily to identify the best peptide for a given spectrum. Consequently, when comparing identifications across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator's heuristic optimization is replaced with a clear objective function, with intuitive reasons behind its choice. (2) Tractable nonlinear models are used instead of linear models, leading to improved accuracy over the original Percolator. (3) A method, Q-ranker, for directly optimizing the number of identified spectra at a specified q value is proposed, which achieves further gains.
Abstract. Determining the three-dimensional (3D) structure of proteins and protein complexes at atomic resolution is a fundamental task in structural biology. Over the last decade, remarkable progress has been made using "single particle" cryo-electron microscopy (cryo-EM) for this purpose. In cryo-EM, hundreds of thousands of two-dimensional (2D) images are obtained of individual copies of the same particle, each held in a thin sheet of ice at some unknown orientation. Each image corresponds to the noisy projection of the particle's electron-scattering density. The reconstruction of a high-resolution image from this data is typically formulated as a nonlinear, nonconvex optimization problem for unknowns which encode the angular pose and lateral offset of each particle. Since there are hundreds of thousands of such parameters, this leads to a very CPU-intensive task-limiting both the number of particle images which can be processed and the number of independent reconstructions which can be carried out for the purpose of statistical validation. Moreover, existing reconstruction methods typically require a good initial guess to converge. Here, we propose a deterministic method for high-resolution reconstruction that operates in an ab initio manner-that is, without the need for an initial guess. It requires a predictable and relatively modest amount of computational effort, by marching out radially in the Fourier domain from low to high frequency, increasing the resolution by a fixed increment at each step.
The fast Gauss transform allows for the calculation of the sum of N Gaussians at M points in O(N + M) time. Here, we extend the algorithm to a wider class of kernels, motivated by quadrature issues that arise in using integral equation methods for solving the heat equation on moving domains. In particular, robust high-order product integration methods require convolution with O(q) distinct Gaussian-type kernels in order to obtain q th order accuracy in time. The generalized Gauss transform permits the computation of each of these kernels and, thus, the construction of fast solvers with optimal computational complexity. We also develop plane-wave representations of these Gaussian-type fields, permitting the "diagonal translation" version of the Gauss transform to be applied. When the sources and targets lie on a uniform grid, or a hierarchy of uniform grids, we show that the curse of dimensionality (the exponential growth of complexity constants with dimension) can be avoided. Under these conditions, our implementation has a lower operation count than the fast Fourier transform (FFT).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.