"Big data" refers to a growing field of large database research. Administrative data, a subset of big data, include information from insurance claims, electronic medical records, and registries that can be useful for investigating novel research questions. While their use provides salient advantages, researchers relying on big data would benefit from knowing how these databases are coded, what common errors they may encounter, and how best to use large data sets to address various research questions. In the first section of this paper, Dr. Nicholas A. Bedard addresses the four major pitfalls to avoid with diagnosis and procedure codes in administrative data. In the next section, Dr. Jeffrey N. Katz et al. focus on the strengths and limitations of administrative data, suggesting methods to mitigate these limitations. Lastly, Dr. Elena Losina et al. review the uses and misuses of large databases for cost-effectiveness research, detailing methods for careful economic evaluations.
Pitfalls of Coding in Databases
Utilization of administrative claims data for the purposes of orthopaedic research and quality assessment has increased exponentially over the past decade [1-5]. Administrative claims data are most often associated with large insurance-based data sets, with data derived from billing records following the delivery of health-care services. However, administrative claims data are also utilized by clinical registries, such as the American Joint Replacement Registry (AJRR), to help provide important datapoints regarding surgical procedures, patient comorbidities, and postoperative complications.
Administrative claims data can be an important resource, and many studies utilizing these data have led to improvements in patient care, health-care policy, and payment reform [6,7]. However, the primary purpose of Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) codes is to allow providers to notify payers of health-care services provided so that they may be reimbursed for their services. CPT and ICD codes were not initially designed for research purposes and, as such, researchers must be aware of the unique characteristics of CPT and ICD codes to avoid the pitfalls that will be encountered when utilizing claims data for orthopaedic research. These pitfalls apply not only to researchers but also to clinical registries that utilize administrative claims to populate registry datapoints. Failure to recognize these pitfalls can impact the utility of administrative claims databases or clinical registries that rely on CPT, ICD-9 (Ninth Revision), and ICD-10 (Tenth Revision) codes.
Disclosure: Research reported in this publication was supported in part by grants from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS): R01 AR074290, P30 AR072577. The funding source did not play any role in designing, conducting, or reporting this analysis.
The Disclosure of Potential Conflicts of Interest forms are provided with the online version of the article (ht...