2016
DOI: 10.1016/j.compbiomed.2016.06.004

Handling missing data in large healthcare dataset: A case study of unknown trauma outcomes

Abstract: Handling of missed data is one of the main tasks in data preprocessing especially in large public service datasets. We have analysed data from the Trauma Audit and Research Network (TARN) database, the largest trauma database in Europe. For the analysis we used 165,559 trauma cases. Among them, there are 19,289 cases (11.65%) with unknown outcome. We have demonstrated that these outcomes are not missed 'completely at random' and, hence, it is impossible just to exclude these cases from analysis despite the lar…
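
The abstract's argument is that records with an unknown outcome cannot simply be dropped, because the outcome is not missing completely at random. A minimal sketch of how such a check could look is given below: a Pearson chi-square test of whether "outcome unknown" is independent of a patient grouping. The counts, the two groups, and the function name `chi_square_2x2` are hypothetical and chosen only for illustration; they are not TARN data and this is not necessarily the test the authors used.

```python
# Hypothetical illustration: synthetic counts, not TARN data.

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: two hypothetical patient groups (group A, group B).
# Columns: outcome known, outcome unknown.
counts = [[900, 100], [700, 300]]

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_5PCT_DF1 = 3.841

statistic = chi_square_2x2(counts)
depends_on_group = statistic > CRITICAL_5PCT_DF1
print(f"chi-square = {statistic:.2f}; missingness depends on group: {depends_on_group}")
```

If the statistic exceeds the critical value, the share of unknown outcomes differs between the groups, so the data are not missing completely at random and a complete-case analysis could be biased.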

Cited by 47 publications (25 citation statements)
References 52 publications
“…The purpose of this paper is to use this surgical cohort as an example to describe the methodology employed to create meaningful data from large healthcare databases, and discuss relevant data considerations. In particular, we will discuss the methods utilized to address many of the ongoing challenges associated with use of big data, including: 1) sourcing data; 2) organizing data for clinical relevance [9]; 3) coding in a meaningful and descriptive way [10]; 4) handling missing values [11–14]; 5) reporting outcomes; 6) assuring the clinical veracity of the data [10, 15]; and 7) reducing risks of analytic errors [16].…”
Section: Introduction
Type: mentioning
confidence: 99%
“…An additional concern, while dealing with big data from the database, is to handle the missing attributes. According to academics, the Markov model could be a solution to handle missing data [3].…”
Section: The State-of-art
Type: mentioning
confidence: 99%
“…Some advanced technologies, such as tensors, cloud computing, and some intellectual frameworks, are used to analyze big data [3][4][5][6][7][8]. Big data processing, however, leads to problems, as much semi-structured and unstructured information exists.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Data pre-processing is an important process to classify dataset accurately. In this context, Mirkes et al [21] proposed a modern approach to handle missing values and developed a system of Markov models for the handling of missing data and lost patients of Trauma Audit and Research Network (TARN). They have also imputed missing values with adjustment of weights on a dataset of patients of TARN.…”
Section: Literature Review
Type: mentioning
confidence: 99%