T-Recs: Stable Selection of Dynamically Formed Groups of Features With Application to Prediction of Clinical Outcomes

Huang, Grace; Tsamardinos, Ioannis; Raghu, Vineet K.; Kaminski, Naftali; Benos, Panayiotis

doi:10.1142/9789814644730_0041

Cited by 9 publications

(9 citation statements)

References 37 publications


“…These findings extended our univariate taxonomic analyses in that they identified the taxa that are directly linked to positive cultures (not simply correlated) and highlighted sequenced bacterial taxa as the strongest explanatory variables of culture positivity. To examine the ability of 16S taxonomic data alone to predict culture positivity, we used the Markov blanket around the culture-positivity variable as a feature selection method (Huang et al, 2015). The taxonomy-based classifier yielded mean accuracy of 82.3% (SD = 7%) (Table 3), indicating proof-of-concept utility for use of sequencing in clinical practice for predicting culture results, if sequencing results were available real-time.…”

Section: Results (mentioning)

confidence: 99%

Respiratory Microbiome Profiling for Etiologic Diagnosis of Pneumonia in Mechanically Ventilated Patients

et al. 2018

Self Cite


Etiologic diagnosis of bacterial pneumonia relies on identification of causative pathogens by cultures, which require extended incubation periods and have limited sensitivity. Next-generation sequencing of microbial DNA directly from patient samples may improve diagnostic accuracy for guiding antibiotic prescriptions. In this study, we hypothesized that enhanced pathogen detection using sequencing can improve upon culture-based diagnosis and that certain sequencing profiles correlate with host response. We prospectively collected endotracheal aspirates and plasma within 72 h of intubation from patients with acute respiratory failure. We performed 16S rRNA gene sequencing to determine pathogen abundance in lung samples and measured plasma biomarkers to assess host responses to detected pathogens. Among 56 patients, 12 patients (21%) had positive respiratory cultures. Sequencing revealed lung communities with low diversity (p < 0.02) dominated by taxa (>50% relative abundance) corresponding to clinically isolated pathogens (concordance p = 0.009). Importantly, sequencing detected dominant pathogens in 20% of the culture-negative patients exposed to broad-spectrum empiric antibiotics. Regardless of culture results, pathogen dominance correlated with increased plasma markers of host injury (receptor of advanced glycation end-products-RAGE) and inflammation (interleukin-6, tumor necrosis factor receptor 1-TNFR1) (p < 0.05), compared to subjects without dominant pathogens in their lung communities. Machine-learning algorithms identified pathogen abundance by sequencing as the most informative predictor of culture positivity. Thus, enhanced detection of pathogenic bacteria by sequencing improves etiologic diagnosis of pneumonia, correlates with host responses, and offers substantial opportunity for individualized therapeutic targeting and antimicrobial stewardship. 
Clinical translation will require validation with rapid whole meta-genome sequencing approaches to guide real-time antibiotic prescriptions.



“…Traditional univariate approaches for feature selection exist as well, but they also often operate on a single data type. In addition, due to the high dimensionality and co-linearity of biological data, markers selected by these standard feature selection algorithms can be unstable and lack biological relevance [2], a problem that has recently been addressed directly [3]. Many existing models that do integrate different data types make heavy use of prior knowledge [4, 5] and as such are not easily extendable to clinical and other data that are not well studied.…”

Section: Introduction (mentioning)

confidence: 99%

Learning mixed graphical models with separate sparsity parameters and stability-based model selection

et al. 2016

Self Cite


Background: Mixed graphical models (MGMs) are graphical models learned over a combination of continuous and discrete variables. Mixed variable types are common in biomedical datasets. MGMs consist of a parameterized joint probability density, which implies a network structure over these heterogeneous variables. The network structure reveals direct associations between the variables and the joint probability density allows one to ask arbitrary probabilistic questions on the data. This information can be used for feature selection, classification and other important tasks. Results: We studied the properties of MGM learning and applications of MGMs to high-dimensional data (biological and simulated). Our results show that MGMs reliably uncover the underlying graph structure, and when used for classification, their performance is comparable to popular discriminative methods (lasso regression and support vector machines). We also show that imposing separate sparsity penalties for edges connecting different types of variables significantly improves edge recovery performance. To choose these sparsity parameters, we propose a new efficient model selection method, named Stable Edge-specific Penalty Selection (StEPS). StEPS is an expansion of an earlier method, StARS, to mixed variable types. In terms of edge recovery, StEPS-selected MGMs outperform those models selected using standard techniques, including AIC, BIC and cross-validation. In addition, we use a heuristic search that is linear in size of the sparsity value search space as opposed to the cubic grid search required by other model selection methods. We applied our method to clinical and mRNA expression data from the Lung Genomics Research Consortium (LGRC) and the learned MGM correctly recovered connections between the diagnosis of obstructive or interstitial lung disease, two diagnostic breathing tests, and cigarette smoking history. Our model also suggested biologically relevant mRNA markers that are linked to these three clinical variables. Conclusions: MGMs are able to accurately recover dependencies between sets of continuous and discrete variables in both simulated and biomedical datasets. Separation of sparsity penalties by edge type is essential for accurate network edge recovery. Furthermore, our stability-based method for model selection determines sparsity parameters faster and more accurately (in terms of edge recovery) than other model selection methods. With the ongoing availability of comprehensive clinical and biomedical datasets, MGMs are expected to become a valuable tool for investigating disease mechanisms and answering an array of critical healthcare questions.


“…In general, the goal is to identify a subset of predictor variables that are relevant to a specific task; for example, in regression and classification the goal is to select and retain the subset of predictor variables with the highest predictive power. Algorithms have been developed that generate multiple equivalent sets of variables (Huang et al, 2014). The Statistically Equivalent Signature (SES) algorithm (Tsamardinos et al, 2013) makes it possible to identify multiple subsets of variables with statistically equivalent performance.…”

Section: Naive Bayes (unclassified)

Redes bayesianas con algoritmos basados en restricciones, scores e híbridos aplicados al problema de clasificación

Vásquez

2019

An. cient. U.N.A.

