2015
DOI: 10.1016/j.aap.2015.06.014

A practical tool for public health surveillance: Semi-automated coding of short injury narratives from large administrative databases using Naïve Bayes algorithms

Abstract: Public health surveillance programs in the U.S. are undergoing landmark changes with the availability of electronic health records and advancements in information technology. Injury narratives gathered from hospital records, workers compensation claims or national surveys can be very useful for identifying antecedents to injury or emerging risks. However, classifying narratives manually can become prohibitive for large datasets. The purpose of this study was to develop a human-machine system that could be rela…

Cited by 43 publications (19 citation statements); References 20 publications.
“…The authors 26 found that while the overall sensitivity of the two independent models was fairly good (0.67 Naïve_sw, 0.65 Naïve_seq), both algorithms independently predicted some categories much better than others, skewing the final distribution of the coded data (χ² p<0.0001), and most of the cases in the smaller categories were not found. The sequence-word model showed improved performance where word order was important for differentiating causality.…”
Section: Bauer and Sector
Confidence: 98%
“…A sample of WC claims incident narratives with BLS OIICS code assignments is shown below. The occurrence probability of each word in each category, P(n_j | C_i), was calculated, as well as the marginal probability of each event category in the training data set, P(C_i); these are the two parameters necessary for the reduced Naïve Bayes algorithm. 26 These statistics, calculated from the training narratives, were stored in a probability table and used to train the coder.

Free-text search of a motor vehicle insurance claims database over 4 years to identify claims where road work was occurring, with keyword categorisation of pre-crash activities and crash types through word frequency counts and manual grouping of similar words to prepare the keyword search strategy. Expanded to test a Bayesian modelling approach in a second paper; the first paper identified the number of incidents and categorised pre-crash activities and crash types to examine patterns of incidents.…”
Section: Case Study
Confidence: 99%
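The two parameters named in the citation statement above, the per-category word probabilities P(n_j | C_i) and the category priors P(C_i), can be sketched in a short Python example. This is a hypothetical illustration, not the authors' implementation: the training narratives and category labels are invented, and add-one smoothing is assumed so unseen words do not zero out the product.

```python
from collections import Counter, defaultdict
import math

# Hypothetical training narratives with invented event-category labels
# (stand-ins for BLS OIICS codes).
train = [
    ("slipped on wet floor and fell", "FALL"),
    ("fell from ladder while painting", "FALL"),
    ("cut finger on box cutter", "CUT"),
    ("laceration from knife while cutting", "CUT"),
]

# P(C_i): marginal probability of each event category in the training set.
cat_counts = Counter(cat for _, cat in train)
total = sum(cat_counts.values())
prior = {c: n / total for c, n in cat_counts.items()}

# P(n_j | C_i): probability of each word within each category,
# stored as raw counts here (the "probability table").
word_counts = defaultdict(Counter)
for text, cat in train:
    word_counts[cat].update(text.split())
vocab = {w for c in word_counts for w in word_counts[c]}

def predict(narrative):
    # Score each category by log P(C_i) + sum of log P(n_j | C_i),
    # with add-one (Laplace) smoothing for unseen words.
    scores = {}
    for c in cat_counts:
        logp = math.log(prior[c])
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in narrative.split():
            logp += math.log((word_counts[c][w] + 1) / denom)
        scores[c] = logp
    return max(scores, key=scores.get)

print(predict("worker fell off ladder"))  # → FALL
```

The words "fell" and "ladder" occur only in FALL-labelled narratives, so the FALL likelihood dominates despite the equal priors.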
“…In fact, within the field of coding injury narratives, Lehto et al (2009) and Marucci-Wellman et al (2011) have considered two-word (and longer) sequences in a separate model referred to as “Fuzzy Bayes.” Also, Grattan et al (2014) and Marucci-Wellman et al (2015) used two-word sequences within the Naïve Bayes framework; however, single-word and two-word sequences were used in separate models, not in a single model. Measure (2014) provides a more exhaustive investigation into which features optimize various auto-coder models and found that both the Naïve Bayes and logistic event auto-coders benefit from including single-word and two-word features, along with the North American Industry Classification System (NAICS) code of the employing establishment, in a single model.…”
Section: Introduction
Confidence: 99%
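The single-model approach discussed in the citation statement above, combining single-word and two-word features rather than keeping them in separate models, amounts to extracting both unigrams and bigrams into one feature set. A minimal sketch, with a hypothetical helper name and the bigram join convention assumed:

```python
def features(narrative):
    # Combine single words (unigrams) and two-word sequences (bigrams)
    # into one feature list for a single classification model.
    words = narrative.lower().split()
    unigrams = words
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return unigrams + bigrams

print(features("fell from ladder"))
# → ['fell', 'from', 'ladder', 'fell_from', 'from_ladder']
```

Feeding both feature types to one model lets word-order cues (e.g. "struck_by" vs "struck_against") inform the same probability estimates as the individual words.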