Exploration of multivariate analysis in microbial coding sequence modeling

Mehmood, Tahir; Bohlin, Jon; Kristoffersen, Anja Bråthen; Sæbø, Solve; Warringer, Jonas; Snipen, Lars

doi:10.1186/1471-2105-13-97

Cited by 8 publications

(7 citation statements)

References 42 publications


“…Also, in models with more than one component, VIP might not give the correct relationship between the pattern of variables and the response (Y). In the case of OPLS-DA loadings, the weights of the predictive component will always give a correct relationship; however, there are difficulties in defining a threshold based on loading weights [for review, Mehmood et al. (2012)]. Thus, in the present study, a combination of VIP and the weights of the predictive component was used.…”

Section: Discussion (mentioning)

confidence: 99%
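The statement above contrasts VIP (Variable Importance in Projection) scores with the weights of the predictive component. As an illustration only, the sketch below fits a single-response PLS model with the NIPALS algorithm and derives VIP scores from its weights. It is plain Python so it runs without dependencies, it centers but does not scale X, and it is an ordinary PLS1 rather than the OPLS-DA model used in the citing study. The function names `pls1_nipals` and `vip_scores` are hypothetical.

```python
import math


def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm.

    X is a list of n rows with p columns and y a list of n responses.
    Columns are centered but not scaled (an assumption of this sketch).
    Returns (W, T, q): unit-length weight vectors, score vectors and
    y-loadings, one entry per component.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # residual X
    f = [v - y_mean for v in y]                                 # residual y
    W, T, q = [], [], []
    for _ in range(n_components):
        # Weights: direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores: projection of every sample onto w.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        # X-loadings and the y-loading of this component.
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        T.append(t)
        q.append(q_a)
    return W, T, q


def vip_scores(W, T, q):
    """Variable Importance in Projection for each of the p variables."""
    p = len(W[0])
    # Sum of squares of y explained by component a: q_a^2 * t_a't_a.
    ss = [q_a * q_a * sum(v * v for v in t) for t, q_a in zip(T, q)]
    total = sum(ss)
    # W vectors are unit length, so W[a][j]**2 is already the normalized weight.
    return [math.sqrt(p * sum(ss[a] * W[a][j] ** 2 for a in range(len(W))) / total)
            for j in range(p)]
```

Usage on made-up data: `vip = vip_scores(*pls1_nipals(X, y, 2))`. The squared VIP scores sum to p, so their mean is 1, which is where the common "VIP > 1" rule of thumb comes from. VIP is also non-negative and carries no sign, whereas the weights in `W` do, which is one reason to inspect both.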

Comparative metabolomics of muscle interstitium fluid in human trapezius myalgia: an in vivo microdialysis study

et al. 2013


Purpose: The mechanisms behind trapezius myalgia are unclear. Many hypotheses have been presented suggesting an altered metabolism in the muscle. Here, muscle microdialysate from healthy and myalgic muscle is analysed using metabolomics. Metabolomics analyses a vast number of metabolites, enabling a comprehensive exploratory screening of the cellular processes in the muscle.

Methods: Microdialysate samples were obtained from the shoulder muscle of healthy and myalgic subjects who performed a work and stress test. Samples from the baseline period and from the recovery period were analysed using gas chromatography–mass spectrometry (GC–MS) together with multivariate analysis to detect differences in the extracellular content of metabolites between groups. Systematic differences in metabolites between groups were identified using multivariate analysis and orthogonal partial least squares discriminant analysis (OPLS-DA). A complementary Mann–Whitney U test of group differences in individual metabolites was also performed.

Results: A large number of metabolites were detected and identified in this screening study. At baseline, no systematic differences between groups were observed according to the OPLS-DA. However, two metabolites, l-leucine and pyroglutamic acid, were significantly more abundant in the myalgic muscle than in the healthy muscle. In the recovery period, systematic differences in metabolites between the groups were observed according to the OPLS-DA. The groups differed in amino acids, fatty acids and carbohydrates. Myristic acid and putrescine were significantly more abundant, and beta-d-glucopyranose was significantly less abundant, in the myalgic muscle.

Conclusion: This study provides important information regarding the metabolite content, thereby presenting new clues regarding the pathophysiology of the myalgic muscle.
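The abstract pairs the multivariate OPLS-DA with a univariate Mann–Whitney U test per metabolite. A minimal sketch of that test follows, assuming two independent groups of intensity values for a single metabolite. Ties receive average ranks, and the p-value uses the large-sample normal approximation with no tie or continuity correction, so it is only indicative for groups as small as typical microdialysis cohorts. The function names are hypothetical; a statistics library would normally be used in practice.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) for independent samples x and y.

    The p-value uses the normal approximation without tie or continuity
    correction (an assumption of this sketch), so it is rough for small n.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, u2, p
```

For completely separated groups such as x = [1, 2, 3] and y = [4, 5, 6], U_x is 0 and U_y is 9; in general U_x + U_y equals the product of the two group sizes.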



“…We have used the Partial Least Squares (PLS) method [24], which is one in a long list of supervised learning methods. PLS is well established and has been used in many bioinformatics applications, also for the analysis of sequence data [25, 26]. PLS is especially applicable when there are many correlated explanatory variables.…”

Section: Methods (mentioning)

confidence: 99%

A systematic search for discriminating sites in the 16S ribosomal RNA gene

Vinje, Almøy, Liland et al. 2014

Microb Informatics Exp

Self Cite


Background: The 16S rRNA gene is by far the most common genomic marker used for prokaryotic classification, and it has been used extensively in metagenomic studies in recent years. Along the 16S gene there are regions with more or less variation across the kingdom of bacteria. Nine variable regions have been identified, flanked by more conserved parts of the sequence. It has been stated that the discriminatory power of the 16S marker lies in these variable regions. In the present study we wanted to examine this more closely, and used a supervised learning method to search systematically for sites that contribute to correct classification at either the phylum or genus level.

Results: When classifying phyla, the site selection algorithm located 50 discriminative sites. These were scattered over most of the alignments, and only around half of them were located in the variable regions. The selected sites did, however, have an entropy significantly larger than expected, meaning they are sites of large variation. We found that the discriminative sites typically have a large entropy compared to their closest neighbours along the alignments. When classifying genera, the site selection algorithm needed around 80% of the sites in the 16S gene before the classification error reached a minimum. This means that all variation, in both variable and conserved regions, is needed in order to separate genera.

Conclusions: Our findings do not support the statement that the discriminative power of the 16S gene is located only in the variable regions. Variable regions are important, but just as many discriminative sites are found in the more conserved parts. The discriminative power is typically found in sites of large variation located inside shorter regions of higher conservation.
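The abstract characterises discriminative sites by their entropy, both in absolute terms and relative to neighbouring sites. A minimal sketch of those two quantities follows: Shannon entropy (in bits) per alignment column, and each column's entropy minus the mean entropy of its neighbours. Ignoring gap characters, using log base 2, and the window size `half_width` are assumptions made for illustration; the cited study's exact definitions are not given here, and both function names are hypothetical.

```python
import math
from collections import Counter


def column_entropies(alignment, ignore=("-",)):
    """Shannon entropy (bits) of each column of equal-length aligned sequences.

    Symbols in `ignore` (gaps by default) are skipped; a column with no
    remaining symbols gets entropy 0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for j in range(length):
        counts = Counter(seq[j] for seq in alignment if seq[j] not in ignore)
        total = sum(counts.values())
        h = 0.0
        for c in counts.values():
            prob = c / total
            h -= prob * math.log2(prob)
        entropies.append(h)
    return entropies


def neighbour_contrast(entropies, half_width=2):
    """Entropy of each site minus the mean entropy of up to `half_width`
    neighbouring sites on each side (hypothetical illustration)."""
    out = []
    for j, h in enumerate(entropies):
        lo, hi = max(0, j - half_width), min(len(entropies), j + half_width + 1)
        neighbours = [entropies[k] for k in range(lo, hi) if k != j]
        out.append(h - sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return out
```

A fully conserved column has entropy 0, and a column where all four nucleotides occur equally often has the maximum of 2 bits; a large positive contrast marks a variable site inside a more conserved stretch.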


“…This is a supervised learning method that has been used in many bioinformatics applications (e.g. [28–32]). A reason for the widespread use of PLS is that it is especially applicable when we have many correlated explanatory variables, which is typical for the present K-mer data, especially as K increases.…”

Section: Methods (mentioning)

confidence: 99%
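The statement notes that K-mer features become numerous and correlated as K grows. A minimal sketch of turning a sequence into a K-mer count vector with a sliding window follows, assuming a DNA alphabet and skipping any window that contains another symbol (such as N). The name `kmer_vector` is hypothetical.

```python
from itertools import product


def kmer_vector(seq, k, alphabet="ACGT"):
    """Count overlapping K-mers in `seq` with a sliding window of width k.

    Returns a list of len(alphabet)**k counts in lexicographic K-mer
    order. Windows containing symbols outside `alphabet` are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        i = index.get(word)
        if i is not None:
            counts[i] += 1
    return counts
```

The vector has 4^K entries (16 for K = 2, 65,536 for K = 8), and consecutive windows share K − 1 symbols, so counts of overlapping K-mers are not independent. This is one source of the correlated explanatory variables that the quoted passage gives as a reason for using PLS.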

Comparing K-mer based methods for improved classification of 16S sequences

et al. 2015

Self Cite


Background: The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and in modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as on fragments of typical read length.

Results: The differences in classification error obtained by the methods seemed to be small, but they were stable for both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau.

Conclusions: We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data is needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and robust training data set is crucial.
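The abstract compares K-mer classifiers, including naïve Bayes variants. For illustration only, the sketch below is a generic multinomial naïve Bayes classifier on overlapping K-mer counts with add-one (Laplace) smoothing. It is neither the RDP classifier nor the authors' naïve Bayes Multinomial implementation. The default k = 4, the class and function names, and the toy sequences in the usage note are hypothetical choices.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Overlapping K-mer counts from a sliding window of width k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerMultinomialNB:
    """Multinomial naive Bayes over K-mer counts with add-one smoothing."""

    def __init__(self, k=4):
        self.k = k

    def fit(self, sequences, labels):
        self.class_counts = Counter(labels)  # training sequences per class
        self.word_counts = {c: Counter() for c in self.class_counts}
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            counts = kmer_counts(seq, self.k)
            self.word_counts[label].update(counts)
            self.vocab.update(counts)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, seq):
        """Unnormalised log posterior of each class for `seq`."""
        counts = kmer_counts(seq, self.k)
        n_train = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_train)  # log prior
            for word, cnt in counts.items():
                # Smoothed probability of this K-mer under class c; K-mers
                # never seen in training still get a small non-zero value.
                prob = (self.word_counts[c][word] + 1) / (self.totals[c] + v)
                score += cnt * math.log(prob)
            scores[c] = score
        return scores

    def predict(self, seq):
        scores = self.log_posterior(seq)
        return max(scores, key=scores.get)
```

Usage on toy data: `KmerMultinomialNB(k=3).fit(["AAAAAAAAAA", "CGCGCGCGCG"], ["A", "B"]).predict("AAAAAAAA")` picks the class whose K-mer profile best explains the query. Working with log probabilities avoids numerical underflow when a sequence contributes hundreds of K-mers.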

