2013
DOI: 10.1186/1471-2105-14-s17-a3
Feature selection and prediction with a Markov blanket structure learning algorithm

Cited by 6 publications (5 citation statements) · References 3 publications
“…One of the most popular penalty functions is LASSO [12][13][14][15][16][17][18]. It forces most of the unimportant genes' regression coefficients into zero.…”
Section: Penalized Logistic Regression Methods
Mentioning confidence: 99%
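The LASSO mechanism the quote describes — an L1 penalty forcing unimportant coefficients exactly to zero — can be sketched with a minimal proximal-gradient (ISTA) solver. This is an illustration, not code from the cited papers; the data, penalty strength `lam`, and step size `lr` are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Only the first two features carry signal; the other eight are noise.
logits = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

def l1_logistic(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-penalized logistic regression via proximal gradient (ISTA)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (prob - y) / len(y)
        w -= lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty,
        # which sets small coefficients exactly to zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w = l1_logistic(X, y)
selected = np.flatnonzero(w)
print(selected)  # informative features survive; most noise coefficients are exactly zero
```

The soft-thresholding step is what distinguishes LASSO from ridge regression: coefficients below the threshold are zeroed outright, so the fitted model doubles as a feature selector.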
“…Filtering methods, which reduce dimensionality and try to retain the most promising features as possible, have long been under development. A number of filtering methods has been proposed to rank features, such as Information gain [13], Markov blanket [14], Bayesian variable selection [15], Boruta [16], Fisher score [17], Relief [18], maximum relevance and minimum redundancy (MRMR) [19], marginal maximum likelihood score (MMLs) [20], among which MMLS is one of the simplest and computationally efficient methods of feature selection with some criteria.…”
Section: Introduction
Mentioning confidence: 99%
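Of the filter methods the quote lists, the Fisher score is among the simplest: each feature is ranked by its between-class variance divided by its within-class variance, independently of any classifier. A minimal sketch (the synthetic data and the two-class setup are assumptions for illustration):

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score filter: between-class over within-class variance, per feature."""
    scores = np.empty(X.shape[1])
    classes = np.unique(y)
    for j in range(X.shape[1]):
        num = sum(np.sum(y == c) * (X[y == c, j].mean() - X[:, j].mean()) ** 2
                  for c in classes)
        den = sum(np.sum(y == c) * X[y == c, j].var() for c in classes)
        scores[j] = num / den
    return scores

rng = np.random.default_rng(3)
n = 300
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 4))
X[:, 2] += 1.5 * y          # only feature 2 separates the two classes
best = int(np.argmax(fisher_score(X, y)))
print(best)  # the class-separating feature ranks first
```

Like the other filters named in the quote, this scores features one at a time, so it is cheap in high dimensions but blind to redundancy between features — the gap that methods such as MRMR and Markov blankets address.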
“…Therefore, this concept can be used to select a smaller set of relevant genes in high-dimensional problems. The MB is proven to be highly effective for feature reduction in high-dimensional problems, sometimes reducing the number of variables a thousandfold without any loss of accuracy (Aliferis, Tsamardinos, et al 2003; Shen, Li et al 2008; Fu and Desmarais 2010; Tan and Liu 2013). We put forward this idea that the MB establishment is also instrumental in creating gene module networks.…”
Section: Results
Mentioning confidence: 98%
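The feature-reduction property described above rests on the Markov blanket definition: given its blanket (parents, children, and co-parents), the target is conditionally independent of every other variable, so everything outside the blanket can be discarded. The sketch below is not the cited algorithm — it is only the grow phase of a generic grow–shrink search, using a Gaussian partial-correlation independence test on toy data where the true blanket of `T` is `{A, B}`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
A = rng.normal(size=n)
T = A + rng.normal(size=n)      # A is a parent of the target T
B = T + rng.normal(size=n)      # B is a child of T
C = A + rng.normal(size=n)      # C touches T only through A
candidates = {"A": A, "B": B, "C": C}

def partial_corr(x, y, Z):
    """Correlation of x and y after regressing out the variables in Z."""
    if Z:
        Zm = np.column_stack(Z + [np.ones(len(x))])
        x = x - Zm @ np.linalg.lstsq(Zm, x, rcond=None)[0]
        y = y - Zm @ np.linalg.lstsq(Zm, y, rcond=None)[0]
    return np.corrcoef(x, y)[0, 1]

# Grow phase only: add any variable still dependent on T given the current
# blanket estimate. A full grow-shrink algorithm adds a shrink phase to
# remove false positives picked up along the way.
mb = []
changed = True
while changed:
    changed = False
    for name, col in candidates.items():
        cond = [candidates[m] for m in mb]
        if name not in mb and abs(partial_corr(T, col, cond)) > 0.05:
            mb.append(name)
            changed = True
print(sorted(mb))  # A and B form the blanket; C is screened off by A
```

`C` correlates with `T` marginally, so a univariate filter would keep it; the conditional test correctly drops it once `A` is in the blanket — which is exactly why blanket-based selection can shed so many variables without losing accuracy.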
“…Our approach’s central basis was that protein sets which are crucial in distinguishing disease states may be key biological drivers of the disease ( 32 ). We developed a novel ML methodology that employs auxiliary Markov blanket feature selection ( 77, 78 ) combined with multiple recursive feature selection algorithms to mitigate bias towards any specific algorithm ( 79 ) and reduce overfitting, which is the fundamental challenge considering the inherent low sample size and high dimensionality of our, and many others, proteomics datasets. The first step of our method was the creation of Leave-One-Out (LOO) partitions of our data ( 35 ).…”
Section: Methods
Mentioning confidence: 99%
“…The suite of algorithms employed included RFE with Logistic Regression (LR) with L1 and L2 regularization penalties, respectively ( 30, 31 ), RFE with regularized Linear Discriminant Analysis (rLDA) ( 80 ), RFE with Random Forests (RF) ( 29 ), Boruta - Random Forests ( 81 ), and Maximum-Relevance-Minimum-Redundancy (MRMR) with an F-Statistic evaluator ( 82 ). Markov blanket feature selection was employed separately on the original datasets, due to computational expense and subsequently incorporated during the later aggregation steps ( 77, 78 ).…”
Section: Methods
Mentioning confidence: 99%
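The RFE wrappers listed above share one loop: fit a model, rank features by the fitted weights, drop the weakest, and refit. A minimal sketch of that loop with a plain gradient-descent logistic regression as the base model (not the cited pipeline — the data and `n_keep` are assumptions for illustration):

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, iters=500):
    """Plain gradient-descent logistic regression (no penalty)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (prob - y) / len(y)
    return w

def rfe(X, y, n_keep=2):
    """Recursive feature elimination: refit, drop the weakest feature, repeat."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        w = fit_logreg(X[:, keep], y)
        keep.pop(int(np.argmin(np.abs(w))))  # discard the smallest-|weight| feature
    return keep

rng = np.random.default_rng(2)
n, p = 400, 6
X = rng.normal(size=(n, p))
logits = 3.0 * X[:, 0] - 3.0 * X[:, 4]   # only features 0 and 4 are informative
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
kept = sorted(rfe(X, y))
print(kept)  # the two informative features remain
```

Refitting after each elimination is what separates RFE from one-shot weight ranking: a feature that looks weak only because of redundancy can regain weight once its redundant partner is removed. Swapping in L1/L2-regularized models or random forests, as the quoted methods do, changes only the `fit` step.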