Predicting miRNA-disease associations using an ensemble learning framework with resampling method

Dai, Qigen; Wang, Zhaowei; Liu, Ziqiang; Duan, Xiaodong; Song, Jing; Guo, Maozu

doi:10.1093/bib/bbab543

Cited by 32 publications

(20 citation statements)

References 50 publications


“…To prove the ability of the CSMDA to predict potential disease-associated miRNAs, we compared it with six state-of-the-art MDA prediction models, including ABMDA (Zhao et al, 2019), ANMDA (Chen et al, 2021), GAEMDA (Li et al, 2021), GBDT-LR (Zhou et al, 2020), IRFMDA (Yao et al, 2019) and ERMDA (Dai et al, 2022). First, the CSMDA and the other MDA prediction models each constructed a negative sample set by their respective methods.…”

Section: Results (mentioning)

confidence: 99%

“…Since there may be no functional similarity between two miRNAs, we integrated the miRNA functional similarity and the GIPK similarity of the two miRNAs. Inspired by previous works (Dai et al, 2022), the integrated miRNA similarity between the two miRNAs was defined as follows: …”

Section: Methods (mentioning)

confidence: 99%

“…In this work, the 5,430 experimentally confirmed miRNA-disease associations were taken as positive samples and the 184,155 unverified miRNA-disease pairs as unlabeled samples. Most methods (Yao et al, 2019; Zhao et al, 2019; Zhou et al, 2020; Chen et al, 2021; Li et al, 2021; Dai et al, 2022) construct the negative sample set either by randomly selecting some unlabeled samples as negative samples or by applying k-means clustering to the unlabeled samples and sampling negative examples from the resulting clusters. However, these methods may introduce potential positive samples into the negative sample set and degrade the performance of the trained model (Chen et al, 2021).…”

Section: Methods (mentioning)

confidence: 99%
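
The k-means variant of negative sampling mentioned in the statement above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited model: the function names (`kmeans`, `sample_negatives`), the number of clusters `k`, and the equal per-cluster quota are all hypothetical choices, and the feature vectors are plain lists of floats.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_negatives(unlabeled, n_negatives, k=3, seed=0):
    """Pick about n_negatives / k unlabeled samples from each cluster.

    Returns the sorted indices of the samples treated as negatives.
    The equal quota per cluster is an assumption made for illustration.
    """
    rng = random.Random(seed)
    labels = kmeans(unlabeled, k, seed=seed)
    per_cluster = n_negatives // k
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return sorted(chosen)
```

As the statement notes, samples drawn this way are still unlabeled pairs, so some of them may be undiscovered positives; clustering spreads the draw across the feature space but does not remove that risk.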

“…In the CSMDA, each miRNA-disease feature vector has 878 dimensions, which may contain a large amount of noise and redundant information. Inspired by previous research (Yao et al, 2019; Dai et al, 2022), we performed feature selection based on random forest variable importance score on each training subset. First, we trained a random forest model on each training subset and sorted all features by the variable importance scores which were generated by the random forest.…”

Section: Methods (mentioning)

confidence: 99%
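
The ranking step described in the statement above might look like the sketch below. It assumes the importance scores have already been produced by a random forest fitted on the training subset (for instance, scikit-learn exposes them as `RandomForestClassifier.feature_importances_`). The helper names and the cutoff `k` are hypothetical; the quoted text is truncated before it says how many features are kept.

```python
def top_k_features(importances, k):
    """Return the indices of the k highest-scoring features, best first."""
    order = sorted(
        range(len(importances)),
        key=lambda j: importances[j],
        reverse=True,
    )
    return order[:k]


def select_columns(rows, keep):
    """Project every feature vector onto the kept column indices."""
    return [[row[j] for j in keep] for row in rows]
```

For example, with importance scores `[0.1, 0.5, 0.05, 0.35]` and `k = 2`, the kept columns are 1 and 3, and each feature vector shrinks from four values to two. Because the ranking is done per training subset, different subsets may keep different columns.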

“…Chen et al proposed an anti-noise miRNA-disease association prediction algorithm (ANMDA) which applied the k-means algorithm to cluster the unlabeled samples and selected negative samples equally from each cluster to reduce the noise (Chen et al, 2021). Dai et al presented a resampling-based ensemble framework (ERMDA) which constructed multiple balanced training subsets by resampling and obtained the final prediction result by soft voting strategy (Dai et al, 2022). Liu et al proposed a novel method via deep forest ensemble learning based on autoencoder (DFELMDA) to predict miRNA-disease associations (Liu et al, 2022).…”

Section: Introduction (mentioning)

confidence: 99%
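
The two ideas attributed to ERMDA in the statement above, balanced training subsets built by resampling and a soft-voting combination, can be illustrated with the sketch below. It is not the ERMDA implementation: the function names, the choice to reuse all positives in every subset, and the undersampling of unlabeled pairs without replacement are assumptions made for illustration only.

```python
import random


def balanced_subsets(positives, unlabeled, n_subsets, seed=0):
    """Build n_subsets training sets with equal positive and negative counts.

    Every subset reuses all positives and pairs them with a fresh random
    draw of the same number of unlabeled samples, treated as negatives.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        negatives = rng.sample(unlabeled, len(positives))
        subsets.append((list(positives), negatives))
    return subsets


def soft_vote(probabilities):
    """Average the per-model predicted probabilities for each sample.

    probabilities[m][i] is model m's predicted probability that sample i
    is a true association; the result has one averaged score per sample.
    """
    n_models = len(probabilities)
    return [sum(column) / n_models for column in zip(*probabilities)]
```

In use, one base learner would be trained per subset and `soft_vote` would combine their outputs: two models scoring a pair at 0.2 and 0.4 give an ensemble score of about 0.3. Averaging probabilities, rather than counting hard votes, lets confident models weigh more heavily in the final score.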


A clustering-based sampling method for miRNA-disease association prediction

et al. 2022


A growing number of studies have shown that microRNAs (miRNAs) play a critical role in gene expression regulation, and that the irregular expression of miRNAs tends to be associated with a variety of complex human diseases. Because identifying disease-associated miRNAs through biological experiments is costly and inefficient, researchers have focused on predicting potential disease-associated miRNAs by computational methods. Considering that the existing methods are flawed in how they construct the negative sample set, we proposed a clustering-based sampling method for miRNA-disease association prediction (CSMDA). Firstly, we integrated multiple sources of similarity information for miRNAs and diseases to represent miRNA-disease pairs. Secondly, we performed a clustering-based sampling method to avoid introducing potential positive samples when constructing the negative sample set. Thirdly, we employed a random forest-based feature selection method to reduce noise and redundant information in the high-dimensional feature space. Finally, we implemented an ensemble learning framework for predicting miRNA-disease associations by soft voting. The Precision, Recall, F1-score, AUROC and AUPR of the CSMDA reached 0.9676, 0.9545, 0.9610, 0.9928, and 0.9940, respectively, under five-fold cross-validation. In addition, case studies on three cancers showed that the top 20 potentially associated miRNAs predicted by the CSMDA were confirmed by the dbDEMC database or the literature. These results demonstrate that the CSMDA can predict potential disease-associated miRNAs more accurately.
