2017
DOI: 10.1101/207506
Preprint

BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues

Abstract: Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to variability in coverage depth. Here we describe BoostMe, a method for imputing low-quality DNA methylation estimates within whole-genome bisulfite sequencing (WGBS) data. BoostMe uses a gradient boosting algorithm, XGBoost, and leverages information from multiple samples for prediction. We find that BoostMe outperforms existing algorithms in speed and accuracy when applied to WGBS o…
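As the abstract describes it, BoostMe trains a gradient-boosted model on reliable methylation estimates and uses it to predict values at poorly covered CpG sites, with features drawn from multiple samples. The sketch below illustrates that idea in plain Python; it is not the BoostMe implementation. Assumptions: the two per-site features (mean beta of the same CpG in other samples, and beta of a neighboring CpG in the same sample), the cutoff `min_coverage=10`, and every function name are hypothetical. The booster is a minimal least-squares gradient boosting with depth-1 trees, swapped in for XGBoost, which additionally uses second-order gradients, regularized deeper trees and sparsity handling.

```python
# Hypothetical sketch of BoostMe-style imputation: fit a gradient-boosted
# regressor on well-covered CpG sites and predict methylation (beta values
# in [0, 1]) at poorly covered ones. Depth-1 trees (stumps) and squared
# loss are simplifications; this is not XGBoost or the BoostMe code.
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Stump:
    feature: int      # index of the feature tested by this stump
    threshold: float  # rows with x[feature] <= threshold go left
    left: float       # output on the left branch
    right: float      # output on the right branch

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X, residuals):
    """Exhaustively pick the stump with the lowest squared error on residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = fmean(left), fmean(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best, best_err = Stump(j, t, lm, rm), err
    return best  # None if no feature can split the rows


def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        pred = [p + learning_rate * stump.predict(x) for p, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(s.predict(x) for s in stumps)


def impute_low_coverage(features, betas, coverage, min_coverage=10):
    """Replace betas at sites with coverage < min_coverage by model predictions."""
    train = [i for i, c in enumerate(coverage) if c >= min_coverage]
    model = fit_boosting([features[i] for i in train], [betas[i] for i in train])
    imputed = list(betas)
    for i, c in enumerate(coverage):
        if c < min_coverage:
            # Clip to the valid range of a beta value.
            imputed[i] = min(1.0, max(0.0, predict(model, features[i])))
    return imputed


# Toy data. Hypothetical features per site: [mean beta of the same CpG in
# other samples, beta of the nearest neighboring CpG in this sample].
features = [[0.9, 0.85], [0.1, 0.05], [0.8, 0.9], [0.2, 0.1], [0.95, 0.9], [0.15, 0.2]]
betas = [0.9, 0.1, 0.85, 0.15, 0.5, 0.5]  # last two sites are unreliable
coverage = [30, 25, 40, 20, 2, 3]
imputed = impute_low_coverage(features, betas, coverage)
print([round(b, 2) for b in imputed])
```

Training only on well-covered sites keeps noisy low-coverage calls out of the fit, while well-covered sites keep their measured values unchanged.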

Cited by 6 publications (4 citation statements)
References 124 publications (232 reference statements)
“…Imputation can be used to estimate the methylation value for each missing CpG site [15, 16] and thereby, can be used to help increase platform intra- and interoperability. We used BoostMe [16] to impute the vast majority (i.e. 99.95%) of missing CpG sites, leveraging information learnt from just the neighboring CpG within the same dataset.…”
Section: Results (mentioning, confidence: 99%)
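The statement above describes imputing missing CpG sites from neighboring CpGs within the same dataset. A model of that kind needs, for each site, the nearest observed neighbors and their genomic distances. A minimal sketch, assuming sorted positions on one chromosome and `None` for missing values; `neighbor_features` is a hypothetical helper, not part of BoostMe:

```python
# Hypothetical helper: nearest observed upstream/downstream CpG for every
# site in one sample on one chromosome. Missing methylation values are None.
from bisect import bisect_left


def neighbor_features(positions, betas):
    """positions: sorted CpG coordinates; betas: value per site or None.

    Returns one (up_beta, up_dist, down_beta, down_dist) tuple per site,
    using None on a side that has no observed neighbor.
    """
    observed = [(p, b) for p, b in zip(positions, betas) if b is not None]
    obs_pos = [p for p, _ in observed]
    features = []
    for pos in positions:
        # After bisect_left: obs_pos[i - 1] < pos <= obs_pos[i]
        i = bisect_left(obs_pos, pos)
        up = observed[i - 1] if i > 0 else None
        # Skip the site itself when it is observed, so the downstream
        # neighbor is strictly after pos.
        j = i + 1 if i < len(obs_pos) and obs_pos[i] == pos else i
        down = observed[j] if j < len(observed) else None
        features.append((
            up[1] if up else None,
            pos - up[0] if up else None,
            down[1] if down else None,
            down[0] - pos if down else None,
        ))
    return features


positions = [100, 105, 130, 131, 200]
betas = [0.8, None, 0.6, 0.9, None]
feats = neighbor_features(positions, betas)
print(feats[1])  # (0.8, 5, 0.6, 25)
print(feats[4])  # (0.9, 69, None, None)
```

A site's own value is never returned as its neighbor, so observed sites can also serve as training rows for a model like the one the statement describes.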
“…We used BoostMe [16] to impute the vast majority (i.e. 99.95%) of missing CpG sites, leveraging information learnt from just the neighboring CpG within the same dataset.…”
Section: Methods (mentioning, confidence: 99%)
“…For example, variational autoencoders have been successfully applied for dimensionality reduction of methylation data [8]. Other methods focus on imputation of single CpG sites in tissue samples using, among others, linear regression [9], Random Forests [10], autoencoders [11], gradient boosting [12] or mixture models [13]. In addition to using intrasample dependencies between neighboring CpG sites, some of these methods adopt the idea of leveraging information from multiple (tissue) samples for prediction [12, 13].…”
Section: Introduction (mentioning, confidence: 99%)
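Some of the methods cited in the statement above [12, 13] also draw on information from other samples at the same CpG. One simple feature of that kind is the mean methylation of a site across all other samples, excluding the target sample so that its own value does not leak into the feature. A minimal sketch, assuming a samples-by-sites matrix with `None` for missing calls; `cross_sample_mean` is hypothetical, and the unweighted mean is an illustrative choice rather than what any cited method is known to use:

```python
# Hypothetical multi-sample feature: for a target sample, the mean beta at
# each CpG across all *other* samples, ignoring missing values (None).
# Rows are samples, columns are CpG sites.


def cross_sample_mean(matrix, target):
    means = []
    for site in range(len(matrix[0])):
        values = [
            row[site]
            for k, row in enumerate(matrix)
            if k != target and row[site] is not None
        ]
        # None when no other sample has a value at this site.
        means.append(sum(values) / len(values) if values else None)
    return means


samples = [
    [0.9, None, 0.1, 0.3],
    [0.8, 0.5, 0.2, None],
    [1.0, 0.7, None, None],
]
m = cross_sample_mean(samples, target=0)
print(m)
```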
“…The technique was found to outperform other machine learning and deep learning techniques in many competitions such as Kaggle and KDDCup (Chen and Guestrin, 2016; Dhaliwal et al., 2018), especially for datasets with sparse matrix. It has been successfully applied in many bioinformatic studies, such as miRNA-disease association (Chen et al., 2018), protein translocation (Mendik et al., 2019), protein-protein interactions (Basit et al., 2018), and DNA methylation (Zou et al., 2018).…”
Section: Introduction (mentioning, confidence: 99%)
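The statement above credits XGBoost with strong results on sparse data. In Chen and Guestrin (2016) this is handled by sparsity-aware split finding: each tree node learns a default direction for missing values by scoring every candidate split twice, once with all missing rows sent left and once with them sent right. A minimal single-feature sketch of that idea follows; `best_split_with_default` is a hypothetical name, not XGBoost's API, and the gain drops the paper's constant factor 1/2 and complexity penalty γ, which do not change the ranking of candidate splits.

```python
# Sketch of sparsity-aware split finding for one feature: missing values
# (None) are sent either all left or all right, and the direction with the
# larger gain is learned as the node's default. g and h are per-row first
# and second derivatives of the loss; lam is an L2 penalty on leaf weights.


def best_split_with_default(x, g, h, lam=1.0):
    """Return (gain, threshold, default_left) for the best split x <= threshold."""
    G, H = sum(g), sum(h)
    miss_g = sum(gi for xi, gi in zip(x, g) if xi is None)
    miss_h = sum(hi for xi, hi in zip(x, h) if xi is None)
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, g, h) if xi is not None)

    def score(gs, hs):
        return gs * gs / (hs + lam)

    best = (0.0, None, None)
    gl = hl = 0.0  # gradient/hessian sums of non-missing rows with x <= threshold
    for k, (xi, gi, hi) in enumerate(present[:-1]):
        gl += gi
        hl += hi
        if xi == present[k + 1][0]:
            continue  # no threshold separates equal values
        for default_left in (True, False):
            GL = gl + (miss_g if default_left else 0.0)
            HL = hl + (miss_h if default_left else 0.0)
            gain = score(GL, HL) + score(G - GL, H - HL) - score(G, H)
            if gain > best[0]:
                best = (gain, xi, default_left)
    return best


# Squared loss 0.5 * (y - pred)^2 with every prediction at 0.5:
# g = pred - y, h = 1.
x = [1.0, 2.0, None, 8.0, 9.0, None]
y = [0, 0, 1, 1, 1, 1]
g = [0.5 - yi for yi in y]
h = [1.0] * len(y)
gain, threshold, default_left = best_split_with_default(x, g, h)
print(threshold, default_left)  # 2.0 False
```

Here the rows with a missing value share the target of the large-x rows, so the learned default is the right branch.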