2017
DOI: 10.4097/kjae.2017.70.4.407

Statistical data preparation: management of missing values and outliers

Abstract: Missing values and outliers are frequently encountered while collecting data. The presence of missing values reduces the data available to be analyzed, compromising the statistical power of the study, and eventually the reliability of its results. In addition, it causes a significant bias in the results and degrades the efficiency of the data. Outliers significantly affect the process of estimating statistics (e.g., the average and standard deviation of a sample), resulting in overestimated or underestimated v…
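
To illustrate the abstract's point that outliers distort estimates such as the average and standard deviation, here is a minimal sketch in Python. The data values and the `summarize` helper are invented for illustration and do not come from the paper; the median is included only as a point of comparison.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation, median) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.stdev(values),
        statistics.median(values),
    )


# Hypothetical measurements (assumption, not data from the paper).
clean = [10, 11, 9, 10, 12, 10, 11]
# The same sample with one invented extreme value appended.
with_outlier = clean + [60]

clean_stats = summarize(clean)
outlier_stats = summarize(with_outlier)

print("clean        (mean, sd, median):", clean_stats)
print("with outlier (mean, sd, median):", outlier_stats)
```

In this toy sample a single extreme value pulls the mean upward and inflates the standard deviation many times over, while the median barely moves.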

Citations: cited by 445 publications (259 citation statements)
References: 7 publications
“…The normality in frequency of the differences was also evaluated using the Shapiro-Wilk test to confirm appropriateness of the paired t-test. Outliers in the airborne spore catches and daily infection rate data were excluded for the Shapiro-Wilk test because outliers could bias the results of paired t-test (Kwak and Kim, 2017). Excluding two outliers on both extremes of data sets, 16 and 11 pairs of data were used in the evaluation of ASM and IRM, respectively.…”
Section: Methods (mentioning)
confidence: 99%
“…The dataset had negligible missing data [27] and was therefore handled by available case analysis [28] using the IBM Statistical Package for the Social Sciences (SPSS) version 25.0 (IBM Inc, Armonk, NY, USA). All data was sex-disaggregated.…”
Section: Results (mentioning)
confidence: 99%
“…Quality control (QC) of phenotypic data is described in detail elsewhere (29). Generally, QC included winsorizing 25(OH)D to minimize the influence of outliers and using a log transformation to improve normality of 25(OH)D distribution in each cohort (32).…”
Section: Methods (mentioning)
confidence: 99%