2020
DOI: 10.21203/rs.3.rs-32456/v1
Preprint

The K nearest neighbor algorithm for imputation of missing longitudinal prenatal alcohol data

Abstract: Background: Missing data are a source of bias in many epidemiologic studies. This is problematic in alcohol research, where missingness may not be random because it depends on patterns of drinking behavior. Methods: The Safe Passage Study was a prospective investigation of prenatal alcohol consumption and fetal/infant outcomes (n=11,083). Daily alcohol consumption for the last reported drinking day and 30 days prior was recorded using the Timeline Followback method. Of 3.2 million person-days, data were miss…

Cited by 3 publications
(3 citation statements)
References 20 publications
“…As a result of the methodology for the collection of self-report exposure information, a given participant could potentially have single or multiple segments of missing information on alcohol or smoking consumption. For this reason, we imputed missing daily exposure data using a K-Nearest Neighbor approach (Sania et al., 2020). Further information can be found in the Supplementary Material 1.…”
Section: Methods (mentioning)
confidence: 99%
“…The study design did not allow consumption data to be collected on every single day of pregnancy, so missing values were imputed using the k-nearest neighbor (kNN) method. Methods for alcohol imputation are cited elsewhere (57). Since frequency of cigarette use was collected more sparsely, average cigarettes smoked per week during pregnancy was used.…”
Section: Methods (mentioning)
confidence: 99%
“…For the target variable, class imbalance was corrected by performing a simple bootstrapping technique which consisted of oversampling the minority category ('VL Suppressed' = No; number of samples = 8392; random state = 5). Missing values imputation was performed with K-NN (K = 05) which was chosen for its capability to produce estimations close to the reality and preserve the associations in the dataset [20]. The training and the test sets were pre-processed separately to prevent information leakage from the training to the test set, and bootstrapping was performed on the training set only [21].…”
Section: Study Dataset (mentioning)
confidence: 99%
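
The citation statements above all rely on K-nearest-neighbor imputation of missing values. As a rough illustration of the general idea only, the sketch below fills each missing entry with the mean of that column across the K most similar rows. It is not the procedure of Sania et al. or of any citing paper: the function name `knn_impute`, the root-mean-square distance over jointly observed columns, and the toy data are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=5):
    """Return a copy of rows with None entries filled by a simple kNN rule.

    Hypothetical helper for illustration. Distance between two rows is the
    root-mean-square difference over columns observed in both rows.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbor must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = [v for _, v in candidates[:k]]
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy data (invented): drinks per day for four participants over three days.
rows = [
    [0, 0, 0],
    [0, 0, None],
    [0, 1, 2],
    [5, 5, 5],
]
print(knn_impute(rows, k=2)[1])  # [0, 0, 1.0]
```

With k=2, the missing third day of the second participant is filled with the average of the two closest participants' values (0 and 2). In a train/test setting such as the one described in the last statement, neighbors would typically be drawn from the training rows only.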