The sheer volume of data anticipated to be captured by future radio telescopes, such as, The Square Kilometer Array (SKA) and its precursors present new data challenges, including the cost and technical feasibility of data transport and storage. Image and data compression are going to be important techniques to reduce the data size. We provide a quantitative analysis of the effects of JPEG2000's lossy wavelet image compression algorithm on the quality of the radio astronomy imagery data. This analysis is completed by evaluating the completeness, soundness and source parameterisation of the Duchamp source finder using compressed data. Here we found the JPEG2000 image compression has the potential to denoise image cubes, however this effect is only significant at high compression rates where the accuracy of source parameterisation is decreased.
In the current study, we show how ProCan90, a curated data set of HEK293 technical replicates, can be used to optimize the configuration options for algorithms in the OpenSWATH pipeline. Furthermore, we use this case study as a proof of concept for horizontal scaling of such a pipeline to allow 45 810 computational analysis runs of OpenSWATH to be completed within four and a half days on a budget of US $10 000. Through the use of Amazon Web Services (AWS), we have successfully processed each of the ProCan 90 files with 506 combinations of input parameters. In total, the project consumed more than 340 000 core hours of compute and generated in excess of 26 TB of data. Using the resulting data and a set of quantitative metrics, we show an analysis pathway that allows the calculation of two optimal parameter sets, one for a compute rich environment (where run time is not a constraint), and another for a compute poor environment (where run time is optimized). For the same input files and the compute rich parameter set, we show a 29.8% improvement in the number of quality protein (>2 peptide) identifications found compared to the current OpenSWATH defaults, with negligible adverse effects on quantification reproducibility or drop in identification confidence, and a median run time of 75 min (103% increase). For the compute poor parameter set, we find a 55% improvement in the run time from the default parameter set, at the expense of a 3.4% decrease in the number of quality protein identifications, and an intensity CV decrease from 14.0% to 13.7%.
Motivation The output of electrospray ionisation - liquid chromatography mass spectrometry (ESI-LC-MS) is influenced by multiple sources of noise and major contributors can be broadly categorised as baseline, random and chemical noise. Noise has a negative impact on the identification and quantification of peptides, which influences the reliability and reproducibility of MS-based proteomics data. Most attempts at denoising have been made on either spectra or chromatograms independently, thus important two-dimensional information is lost because the mass-to-charge ratio and retention time dimensions are not considered jointly. Results This paper presents a novel technique for denoising raw ESI-LC-MS data via two-dimensional undecimated wavelet transform, which is applied to proteomics data acquired by data-independent acquisition MS (DIA-MS). We demonstrate that denoising DIA-MS data results in the improvement of peptide identification and quantification in complex biological samples. Availability The software is available on Github (https://github.com/CMRI-ProCan/CRANE). The datasets were obtained from ProteomeXchange (Identifiers—PXD002952 and PXD008651). Preliminary data and intermediate files are available via ProteomeXchange (Identifiers—PXD020529 and PXD025103). Supplementary information Supplementary information is available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.