FiD: a software for ab initio structural identification of product ions from tandem mass spectrometric data

Heinonen, Markus; Rantanen, Ari; Mielikäinen, Taneli; Kokkonen, Juha; Kiuru, Jari; Ketola, Raimo A.; Rousu, Juho

doi:10.1002/rcm.3701

Cited by 124 publications (113 citation statements, 112 of them mentioning)

References 41 publications

“…Particular progress has been made for restricted metabolite classes such as lipids (5), but as with peptides, results cannot be generalized to other metabolite classes. For the general case, several strategies have been proposed during recent years, including simulation of mass spectra from molecular structure (10,11), combinatorial fragmentation (12–17), and prediction of molecular fingerprints (18,19).…”

mentioning

confidence: 99%

Searching molecular structure databases with tandem mass spectra using CSI:FingerID

Dührkop, Shen, Meusel et al. 2015

Proc. Natl. Acad. Sci. U.S.A.

Self Cite


Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.

mass spectrometry | small compound identification | metabolomics | bioinformatics | machine learning

Metabolites, small molecules that are involved in cellular reactions, can provide detailed information about cellular state. Untargeted metabolomic studies may use NMR or MS technologies, but liquid chromatography followed by MS (LC/MS) can detect the highest number of metabolites from minimal amounts of sample (1, 2). Untargeted metabolomics comprehensively compares the mass spectral intensities of metabolite signals (peaks) between two or more samples (3, 4). Advances in MS instrumentation allow us to simultaneously detect thousands of metabolites in a biological sample. Identification of these compounds relies on tandem MS (MS/MS) data, produced by fragmenting the compound and recording the masses of the fragments. Structural elucidation remains a challenging problem, in particular for compounds that cannot be found in any spectral library (1): In total, all available spectral MS/MS libraries of pure chemical standards cover fewer than 20,000 compounds (5). Growth of spectral libraries is limited by the unavailability of pure reference standards for many compounds.

In contrast, molecular structure databases such as PubChem (6) and ChemSpider (7) contain millions of compounds, with PubChem alone having surpassed 50 million entries. Searching in molecular structure databases using MS/MS data is therefore considered a powerful tool for assisting an expert in the elucidation of a compound. This problem is considerably harder than the fundamental analysis step in the shotgun proteomics workflow, namely, searching peptide MS/MS data in a peptide sequence database (8): Unlike proteins and peptides, metabolites show a large structural variability and, consequently, also large variations in MS/MS fragmentation. Computational approaches for interpreting and predicting MS/MS data of small molecules date back to the 1960s (9): Due to the unavailability of molecular structure databases at that time, structure libraries were combinatorially generated and then "searched" using the experimental MS/MS data. "Modern" methods for this question have been developed since the mid-2000s. Particular progress has been made for restricted metabolite cl...
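The final step in the abstract above, searching a structure database with the predicted fingerprint, can be illustrated with a minimal sketch. It assumes the predicted fingerprint is reduced to a set of "on" bit indices and that candidates are ranked by plain Tanimoto similarity; the abstract does not specify the scoring function, and CSI:FingerID's own scoring is not reproduced here. The functions `tanimoto` and `rank_candidates`, the candidate names and the bit sets are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def rank_candidates(
    predicted: Set[int], database: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the predicted fingerprint, best match first."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in database.items()]
    # Sort by descending similarity; break ties by name so the output is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    # Hypothetical predicted fingerprint and candidate structures.
    predicted = {1, 4, 7, 9}
    candidates = {
        "cand_A": {1, 4, 7},
        "cand_B": {1, 2, 3},
        "cand_C": {1, 4, 7, 9, 12},
    }
    for name, score in rank_candidates(predicted, candidates):
        print(f"{name}\t{score:.3f}")
```

With these toy inputs the ranking is cand_C (0.800), cand_A (0.750), cand_B (0.167). A real search would score millions of database entries and would typically weight bits by how reliably each one can be predicted, which plain Tanimoto does not do.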


“…This algorithm can be coupled to quantum chemistry codes to obtain the optimized geometries and energies of the fragments but this aspect is not developed here. Analog algorithms, using more accurate bond energy estimates, have been devoted to mass spectrometry analysis like the FiD code [17].…”

Section: Structural Extension Of the Statistical Model

mentioning

confidence: 99%

“…In the case of the dissociation of biomolecules studied by mass spectrometric methods, rather complex mass spectra are observed and their interpretation can be challenging. Heinonen et al. [17] proposed a method based on a combinatorial approach in order to help the interpretation of molecular fragmentation.…”

Section: Introduction

mentioning

confidence: 99%
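The combinatorial idea attributed to FiD in the two statements above, enumerating candidate fragments of the precursor structure and preferring those that require cheap bond cleavages, can be sketched for a toy molecule. This is not FiD's implementation. The sketch assumes ethanol with explicit hydrogens, textbook average bond energies (kJ/mol) as cleavage costs, brute-force enumeration of connected atom subsets, and a comparison of the neutral fragment mass with the peak (electron mass, charge location and hydrogen rearrangements are ignored). All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Monoisotopic atomic masses (Da).
MASS = {"C": 12.0, "H": 1.007825, "O": 15.994915}
# Textbook average bond energies (kJ/mol), used here only as illustrative cleavage costs.
BOND_ENERGY = {("C", "C"): 348, ("C", "H"): 413, ("C", "O"): 358, ("H", "O"): 463}


def is_connected(mask: int, adjacency: List[List[int]]) -> bool:
    """True if the atoms selected by the bitmask form one connected fragment."""
    atoms = [i for i in range(len(adjacency)) if mask >> i & 1]
    if not atoms:
        return False
    seen, stack = {atoms[0]}, [atoms[0]]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if mask >> v & 1 and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(atoms)


def best_fragment(
    elements: Sequence[str],
    bonds: Sequence[Tuple[int, int]],
    target_mass: float,
    tol: float = 0.005,
) -> Optional[Tuple[int, List[int]]]:
    """Cheapest connected fragment whose mass matches target_mass within tol.

    Returns (total energy of cleaved bonds, atom indices) or None.
    """
    n = len(elements)
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    best: Optional[Tuple[int, List[int]]] = None
    for mask in range(1, (1 << n) - 1):  # every proper, non-empty atom subset
        mass = sum(MASS[elements[i]] for i in range(n) if mask >> i & 1)
        if abs(mass - target_mass) > tol or not is_connected(mask, adjacency):
            continue
        # Cost = energy of every bond with exactly one end inside the fragment.
        cost = sum(
            BOND_ENERGY[tuple(sorted((elements[a], elements[b])))]
            for a, b in bonds
            if (mask >> a & 1) != (mask >> b & 1)
        )
        if best is None or cost < best[0]:
            best = (cost, [i for i in range(n) if mask >> i & 1])
    return best


# Ethanol with explicit hydrogens: C1 (0), C2 (1), O (2), H on C1 (3-5), H on C2 (6-7), H on O (8).
ETHANOL_ATOMS = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ETHANOL_BONDS = [(0, 1), (1, 2), (2, 8), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7)]

if __name__ == "__main__":
    print(best_fragment(ETHANOL_ATOMS, ETHANOL_BONDS, 31.0184))  # CH2OH fragment
    print(best_fragment(ETHANOL_ATOMS, ETHANOL_BONDS, 29.0391))  # C2H5 fragment
```

For these two masses the sketch prints (348, [1, 2, 6, 7, 8]) and (358, [0, 1, 3, 4, 5, 6, 7]): a C–C cleavage explains CH2OH and a C–O cleavage explains C2H5. Brute force is exponential in the atom count, so a practical tool would need a smarter search over fragments.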

A simple ‘statistical’ approach for fragmentation studies of doubly ionized cytosine, thymine and uracil bases

Champeaux, Çarçabal, Sence et al. 2011

J. Phys. B: At. Mol. Opt. Phys.


A simple statistical model describing the dissociation of molecular dications into correlated fragment pairs has been developed. This model is based on a combinatorial approach in which all possible fragments are enumerated, and it is refined by taking into account the initial structure of the parent molecule, considering the number of chemical bonds that must be broken to give rise to the fragments. We show how this model can be used as a tool to help interpret the results of coincidence experiments. It shows that the dissociation of doubly ionized molecules upon 100 keV proton irradiation is dominated by statistical processes, but it also enables an easy identification of the dissociation products originating from non-statistical processes that require further investigation, possibly conveying information on the radiation-molecule interaction itself.
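The first, purely combinatorial step of this model, enumerating all possible fragments, can be sketched for uracil (C4H4N2O2). The sketch assumes a two-body charge separation of the dication into two singly charged fragments whose elemental compositions add up to the parent formula; the refinement by the number of bonds broken is omitted. The function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

ELEMENTS = ("C", "H", "N", "O")
# Monoisotopic atomic masses (Da).
MONO = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

Formula = Tuple[int, ...]  # atom counts in ELEMENTS order


def mono_mass(counts: Formula) -> float:
    return sum(MONO[e] * k for e, k in zip(ELEMENTS, counts))


def formula_str(counts: Formula) -> str:
    return "".join(f"{e}{k if k > 1 else ''}" for e, k in zip(ELEMENTS, counts) if k)


def fragment_pairs(parent: Formula) -> List[Tuple[Formula, Formula]]:
    """All unordered pairs of non-empty compositions that add up to the parent."""
    pairs = set()
    for sub in product(*(range(k + 1) for k in parent)):
        rest = tuple(p - s for p, s in zip(parent, sub))
        if sum(sub) == 0 or sum(rest) == 0:
            continue  # both fragments must contain at least one atom
        pairs.add(tuple(sorted((sub, rest))))  # (A, B) and (B, A) are the same pair
    return sorted(pairs)


if __name__ == "__main__":
    uracil = (4, 4, 2, 2)  # C4H4N2O2
    pairs = fragment_pairs(uracil)
    print(len(pairs), "candidate fragment pairs")
    for a, b in pairs[:5]:
        # Singly charged fragments: m/z equals fragment mass (electron mass ignored).
        print(f"{formula_str(a)} ({mono_mass(a):.3f}) + {formula_str(b)} ({mono_mass(b):.3f})")
```

For uracil this yields 112 unordered composition pairs: 223 proper sub-compositions, 222 of which pair off with their complements, plus the self-complementary C2H2NO + C2H2NO split. Many of these are chemically unlikely, which is where structural information such as the number of bonds broken comes in.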


“…A computational approach can overcome these problems by virtue of its ability to provide an automatic interpretation.[6–9] We previously reported on an automated annotation system for interpreting the nontargeted analysis of a multistage CID (MSn) spectrum, followed by the retrieval of candidates from a compound database.10) We were able to successfully annotate 20 components that showed contribution to tea quality, such as caffeine, catechins, and a series of organic acid esters.…”

Section: Introduction

mentioning

confidence: 99%

Method for the Compound Annotation of Conjugates in Nontargeted Metabolomics Using Accurate Mass Spectrometry, Multistage Product Ion Spectra and Compound Database Searching

et al. 2015


Owing to biotransformation, xenobiotics are often found in conjugated form in biological samples such as urine and plasma. Liquid chromatography coupled with accurate mass spectrometry with multistage collision-induced dissociation provides spectral information concerning these metabolites in complex materials. Unfortunately, compound databases typically do not contain a sufficient number of records for such conjugates. We report here on the development of a novel protocol, referred to as ChemProphet, to annotate compounds, including conjugates, using compound databases such as PubChem and ChemSpider. The annotation of conjugates involves three steps: 1. Recognition of the type and number of conjugates in the sample; 2. Compound search and annotation of the deconjugated form; and 3. In silico evaluation of the candidate conjugate. ChemProphet assigns a spectrum to each candidate by automatically exploring the substructures corresponding to the observed product ion spectrum. When finished, it annotates the candidates, assigning each a rank based on a calculated score that reflects its relative likelihood. We assessed our protocol by annotating a benchmark dataset including the product ion spectra for 102 compounds, annotating the commercially available standard for quercetin 3-glucuronide, and conducting a model experiment using urine from mice that had been administered a green tea extract. The results show that by using the ChemProphet approach, it is possible to annotate not only the deconjugated molecules but also the conjugated molecules using an automatic interpretation method based on deconjugation that involves multistage collision-induced dissociation and in silico calculated conjugation.

Please cite this article as: Mass Spectrom (Tokyo) 2015; 4(1): A0036
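Step 1 of the protocol, recognizing the type and number of conjugates, could be approached, for illustration only, with a simple neutral-loss heuristic: count how many successive losses of a conjugate moiety connect the precursor ion to observed product ions. The abstract does not describe how ChemProphet performs this step. The loss masses (glucuronic acid residue C6H8O6, 176.0321 Da; SO3, 79.9568 Da), the 10 ppm tolerance and the function names are assumptions of this sketch.

```python
from typing import Dict, Sequence

# Assumed monoisotopic neutral-loss masses (Da) for two common conjugate types.
NEUTRAL_LOSSES = {"glucuronide": 176.0321, "sulfate": 79.9568}


def within_ppm(observed: float, expected: float, ppm: float) -> bool:
    return abs(observed - expected) <= abs(expected) * ppm * 1e-6


def count_conjugates(
    precursor_mz: float, product_mzs: Sequence[float], ppm: float = 10.0
) -> Dict[str, int]:
    """For each conjugate type, count successive neutral losses seen among product ions."""
    counts: Dict[str, int] = {}
    for name, loss in NEUTRAL_LOSSES.items():
        k = 0
        while True:
            expected = precursor_mz - (k + 1) * loss
            # Stop at the first missing loss (or when the expected m/z is no longer physical).
            if expected <= 0 or not any(within_ppm(mz, expected, ppm) for mz in product_mzs):
                break
            k += 1
        counts[name] = k
    return counts


if __name__ == "__main__":
    # Calculated [M+H]+ of quercetin 3-glucuronide and of its aglycone, quercetin.
    print(count_conjugates(479.0820, [303.0499]))
```

With the calculated values above the sketch reports one glucuronide and no sulfate. The heuristic assumes singly charged ions and that every intermediate loss product is observed, so missing or in-source fragments would cause it to undercount.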
