2005
DOI: 10.1371/journal.pmed.0020267

Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities

Abstract: In this policy forum the authors argue that data cleaning is an essential part of the research process, and should be incorporated into study design.

Cited by 314 publications (225 citation statements)
References 12 publications (7 reference statements)
“…The extent of the data-cleaning helps communicate the quality of the data and the analytic rigour of the investigators. 25 Finally, changing eligibility over time is a component of the recommended limitations, which should be addressed in the study discussion. This can occur when there is a shift in the coding structure or coding practice within the RCD source over time.…”
Section: Current Status of Reporting of RCD Studies in Urology (mentioning)
confidence: 99%
“…6 -8 Types of errors included letter and number reversal (e.g., "sh" entered as "hs;" unintentional repeats or deletions of numbers, letters, or decimal points; extraneous characters; simple transcription and reading errors; data entered into the incorrect field; and skipping fields when data were available). 9,10 In this study, the name fields were most prone to noncorrectable errors. Long alphabetic fields such as names and addresses have previously been noted to have error rates 10 to 15 times higher than numeric fields.…”
Section: Discussion (mentioning)
confidence: 82%
“…It removes errors, reduces the chance of inaccuracies, and repairs problems that may occur within databases. 15 The data cleaning intervention was performed by a computer systems analyst and programmer from the Perth and Hills Division of General Practice. GPs and practice nurses were surveyed before the register cleansing took place, mid-point during the intervention and post intervention.…”
Section: Methodology Sample (mentioning)
confidence: 99%