Accurate protein identification in large-scale proteomics experiments relies on a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation) and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that searching high mass accuracy peptide measurements against non-assembled reads from DNA sequencing of the same samples significantly increases the number of identifiable proteins without sacrificing accuracy.
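As a minimal sketch of the high mass accuracy filtering referred to above (the residue and water masses are standard monoisotopic values, but the candidate peptide, the observed precursor mass, and the 10 ppm tolerance are illustrative assumptions rather than values from this study), the snippet below keeps a peptide-spectrum match only if the observed precursor mass agrees with the theoretical peptide mass within a tight ppm window.

```python
# Sketch of a high-mass-accuracy precursor filter for peptide-spectrum matches.
# Residue masses are standard monoisotopic values; the example peptide and
# observed mass are illustrative assumptions.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of water added to the residue sum for the intact peptide

def peptide_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def passes_mass_filter(observed_mass: float, sequence: str, tol_ppm: float = 10.0) -> bool:
    """Keep a PSM only if the precursor mass matches the candidate peptide
    within a tight ppm tolerance, as high-resolution instruments allow."""
    return abs(ppm_error(observed_mass, peptide_mass(sequence))) <= tol_ppm

# Example: an observed precursor mass checked against a candidate peptide.
print(passes_mass_filter(841.5025, "VATVSLPR"))  # True: within 10 ppm of 841.50213
```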
Background: High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as the LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results: In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of the inferred consensus sequence tags was 84%. In our comparison, Vonode outperformed the PepNovo v2.0 algorithm in both the number of de novo sequenced spectra and the sequencing accuracy. Conclusions: Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.
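The sketch below illustrates the general idea of inferring a sequence tag from the mass differences between consecutive fragment ions and scoring it by mass accuracy. It is not the Vonode scoring function itself; the fragment-ion ladder, the 10 ppm tolerance, and the reduced residue table are assumptions chosen purely for illustration.

```python
# Sketch of mass-accuracy-driven sequence-tag inference: consecutive fragment-ion
# mass differences are matched to amino acid residue masses, and the tag is
# scored by how small the mass errors are. Illustration only, not Vonode's scoring.
from math import inf

RESIDUE_MASS = {  # subset of monoisotopic residue masses (Da); Leu/Ile shown as "L"
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}

def infer_tag(fragment_masses, tol_ppm=10.0):
    """Walk an ordered ladder of fragment-ion masses and read off residues
    whose masses explain consecutive gaps within the ppm tolerance."""
    tag, total_error = [], 0.0
    for lo, hi in zip(fragment_masses, fragment_masses[1:]):
        gap = hi - lo
        best_aa, best_err = None, inf
        for aa, mass in RESIDUE_MASS.items():
            err_ppm = abs(gap - mass) / hi * 1e6  # error relative to fragment mass
            if err_ppm < best_err:
                best_aa, best_err = aa, err_ppm
        if best_err > tol_ppm:
            break                    # gap unexplained within tolerance: stop the tag
        tag.append(best_aa)
        total_error += best_err
    score = -total_error / max(len(tag), 1)  # higher (less negative) is better
    return "".join(tag), score

# Example ladder of fragment-ion masses whose gaps spell out "PEK".
print(infer_tag([300.156, 397.209, 526.252, 654.347]))
```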
The issue of exactly what is measured by different types of reading items has been a matter of interest in the field of reading research for many years. Language teaching and testing specialists have raised the question of whether a reading test for foreign students wishing to enter university in the United States should include questions testing abilities beyond linguistic and very general discourse competencies, or indeed whether it is possible to separate these language competencies from other competencies. The purpose of this study was to investigate the dimensionality of the TOEFL® reading test, based on the specifications in use as of April 1991. Of particular interest was whether four item types identified in the test specifications as “reasoning items” could be shown to measure, in addition to general reading ability, any abilities not measured by the other item types in the TOEFL reading test. Two techniques, Stout's procedure and NOHARM analyses, were employed to investigate the hypothesized two‐factor model. In both cases the data failed to fit the model, indicating that TOEFL “reasoning items” cannot be shown to measure a unique construct. However, the follow‐up exploratory analyses indicated that all 10 test forms used in the study violated the assumption of essential unidimensionality, and all of the forms appeared to fit a two‐factor model where the second factor may be related to passage content or position.
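Neither Stout's procedure nor NOHARM is reproduced here, but the sketch below gives a crude illustration of the underlying question: whether a single dominant dimension accounts for the inter-item correlations. The simulated responses, the two-factor generating model, and the eigenvalue-ratio heuristic are all assumptions made purely for illustration.

```python
# Crude dimensionality check on simulated dichotomous item responses.
# This is NOT Stout's procedure or NOHARM; it only illustrates the idea of
# asking whether a secondary factor underlies the item scores.
import numpy as np

rng = np.random.default_rng(0)

n_persons, n_items = 2000, 30
theta = rng.normal(size=n_persons)                         # single latent ability
passage = rng.normal(size=n_persons)                       # nuisance "passage" factor
loadings = np.linspace(0.8, 1.2, n_items)                  # item discriminations
nuisance = np.where(np.arange(n_items) >= 20, 0.6, 0.0)    # last 10 items share the nuisance

# Generate 0/1 responses from a simple two-factor logistic model.
logits = np.outer(theta, loadings) + np.outer(passage, nuisance) - 0.2
responses = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

# Eigenvalues of the inter-item correlation matrix: a large first-to-second
# ratio suggests essential unidimensionality; a sizable second eigenvalue
# hints at a secondary factor (here, the shared "passage" component).
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(responses.T)))[::-1]
print("first three eigenvalues:", np.round(eigvals[:3], 2))
print("ratio of first to second:", round(eigvals[0] / eigvals[1], 2))
```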
The present study compared the performance of LOGIST and BILOG on TOEFL IRT‐based scaling and equating using both real and simulated data and two calibration structures. Applications of IRT for the TOEFL program are based on the three‐parameter logistic (3PL) model. The results of the study show that item parameter estimates obtained from the smaller real data sample sizes were more consistent with the larger sample estimates when based on BILOG than when based on LOGIST. In addition, the root mean squared error statistics suggest that the BILOG estimates for the item parameters and item characteristic curves were closer in magnitude to the “true” parameter values than were the LOGIST estimates. The equating results based on the parameter estimates suggest that the rule of thumb recommendation that pretest sample sizes be at least 1000 for LOGIST should be retained if at all possible.
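For reference, the sketch below writes out the 3PL item response function and the kind of root mean squared error comparison between an estimated item characteristic curve and its "true" counterpart; the parameter values and ability grid are illustrative assumptions, not statistics from the study.

```python
# The three-parameter logistic (3PL) item response function and an RMSE
# comparison of an estimated item characteristic curve (ICC) against a "true"
# one. Parameter values and the ability grid are illustrative only.
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model:
    P(theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def icc_rmse(theta_grid, true_params, est_params):
    """Root mean squared difference between two ICCs over an ability grid."""
    diff = p_3pl(theta_grid, *true_params) - p_3pl(theta_grid, *est_params)
    return float(np.sqrt(np.mean(diff ** 2)))

theta = np.linspace(-3, 3, 61)
true_item = (1.0, 0.0, 0.20)   # (a, b, c) generating values
est_item = (0.9, 0.1, 0.22)    # estimates from a hypothetical calibration run
print("ICC RMSE:", round(icc_rmse(theta, true_item, est_item), 4))
```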
IRT equating methods have been used successfully with the TOEFL® test for many years, and for the most part the observed properties of items have been consistent with model predictions. However, items that do not appear to hold their IRT pretest estimates do exist. If relationships can be found between features of TOEFL items in pretest calibrations and subsequent lack of model‐data fit when these items are used in final forms, steps to eliminate the use of such items in TOEFL final forms can be taken. The purpose of this study was to provide an exploratory investigation of item features that may contribute to a lack of invariance of TOEFL item parameters. The results of the study indicated the following: (1) subjective and quantitative measures developed for the study provided consistent information related to the model‐data fit of TOEFL test items, (2) for Sections 1 and 2, items that were pretested before 1986 exhibited poorer model‐data fit than items that were pretested after 1986, and (3) for Section 3 reading comprehension, model‐data fit appeared to be related to changes in the relative position of items within the sections from the pretest to the final form administrations. Based on the results of the study, it was recommended that (1) the TOEFL program investigate the feasibility of not using pretest IRT statistics for items pretested before 1986 for Sections 1 and 2 and (2) that guidelines be developed for test developers to use with reading comprehension items to limit the change in relative positions of items in the test from pretest to final form administrations.
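A minimal illustration of this kind of invariance check is sketched below: pretest and final-form difficulty estimates are compared, and items whose estimates drift beyond a cutoff are flagged. All values, including the drift pattern and the 0.3 cutoff, are simulated assumptions rather than TOEFL item statistics.

```python
# Sketch of an item-parameter invariance check: compare item difficulty (b)
# estimates from a pretest calibration with final-form estimates and flag
# items whose estimates drift by more than a cutoff. Simulated values only.
import numpy as np

rng = np.random.default_rng(1)

n_items = 15
b_pretest = rng.normal(0.0, 1.0, n_items)            # pretest difficulty estimates
drift = np.where(np.arange(n_items) < 3, 0.5, 0.0)   # first 3 items drift (e.g., position change)
b_final = b_pretest + drift + rng.normal(0.0, 0.05, n_items)  # final-form estimates

cutoff = 0.3  # illustrative threshold for flagging a possible lack of invariance
flagged = np.flatnonzero(np.abs(b_final - b_pretest) > cutoff)

print("correlation of pretest and final b:", round(np.corrcoef(b_pretest, b_final)[0, 1], 3))
print("RMSD of b estimates:", round(float(np.sqrt(np.mean((b_final - b_pretest) ** 2))), 3))
print("items flagged for possible lack of invariance:", flagged.tolist())
```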