Objectives. To measure inter-rater agreement of overall clinical appearance of febrile children aged less than 24 months and to compare methods for doing so.

Study Design and Setting. We performed an observational study of inter-rater reliability of the assessment of febrile children in a county hospital emergency department serving a mixed urban and rural population. Two emergency medicine healthcare providers independently evaluated the overall clinical appearance of children less than 24 months of age who had presented for fever. They recorded their initial ‘gestalt’ assessment of whether the child was ill appearing or not, or whether they were unsure. They then repeated this assessment after examining the child. Each rater was blinded to the other’s assessment. Our primary analysis was graphical. We also calculated Cohen’s κ, Gwet’s agreement coefficient and other measures of agreement, together with weighted variants of these. We examined the effect of time between exams and of patient and provider characteristics on inter-rater agreement.

Results. We analyzed 159 of the 173 patients enrolled. Median age was 9.5 months (lower and upper quartiles 4.9–14.6), 99/159 (62%) were boys and 22/159 (14%) were admitted. Overall, 118/159 (74%) and 119/159 (75%) were classified as well appearing on initial ‘gestalt’ impression by the two examiners, respectively. Summary statistics varied from 0.223 for weighted κ to 0.635 for Gwet’s AC2. Inter-rater agreement was affected by the time interval between the evaluations and by the age of the child, but not by the experience levels of the rater pairs. Classifications of ‘not ill appearing’ were more reliable than others.

Conclusion. The inter-rater reliability of emergency providers’ assessment of overall clinical appearance was adequate when described graphically and by Gwet’s AC. Different summary statistics yield different results for the same dataset.
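To illustrate how two summary statistics can diverge on the same data, the following is a minimal sketch that computes unweighted Cohen's κ and Gwet's AC1 from a two-rater contingency table. The table is hypothetical and is not the study's data; the function names (`observed_agreement`, `cohen_kappa`, `gwet_ac1`) are illustrative, and the weighted κ and AC2 variants reported in the abstract are not implemented here.

```python
from typing import List

Table = List[List[int]]  # square table: rows = rater A, columns = rater B


def observed_agreement(table: Table) -> float:
    """Proportion of subjects on which both raters chose the same category."""
    n = sum(sum(row) for row in table)
    return sum(table[k][k] for k in range(len(table))) / n


def cohen_kappa(table: Table) -> float:
    """Unweighted Cohen's kappa: chance agreement from the product of marginals."""
    q = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(table[k]) / n for k in range(q)]
    col_p = [sum(table[i][k] for i in range(q)) / n for k in range(q)]
    p_o = observed_agreement(table)
    p_e = sum(row_p[k] * col_p[k] for k in range(q))
    return (p_o - p_e) / (1 - p_e)


def gwet_ac1(table: Table) -> float:
    """Unweighted Gwet's AC1: chance agreement from averaged marginals."""
    q = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(table[k]) / n for k in range(q)]
    col_p = [sum(table[i][k] for i in range(q)) / n for k in range(q)]
    pi = [(row_p[k] + col_p[k]) / 2 for k in range(q)]
    p_o = observed_agreement(table)
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts (NOT from the study); categories in order:
# not ill appearing, unsure, ill appearing.
example = [
    [70, 5, 5],
    [5, 2, 1],
    [6, 1, 5],
]

if __name__ == "__main__":
    print(f"observed agreement: {observed_agreement(example):.3f}")
    print(f"Cohen's kappa:      {cohen_kappa(example):.3f}")
    print(f"Gwet's AC1:         {gwet_ac1(example):.3f}")
```

In this hypothetical table most children fall in one category, so the chance-agreement term used by κ is large and κ is pulled well below the raw agreement, while AC1 stays closer to it. This is one way the same dataset can yield quite different coefficients.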
PeerJ PrePrints | http://dx.doi.org/10.7287/peerj.preprints.

Title: Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the Emergency Department.

INTRODUCTION

Deciding whether a febrile child is 'ill appearing' is a key decision point in emergency department (ED) management algorithms for febrile infants and toddlers (Baker et al. 1993, Baraff et al. 1993, Jaskiewicz et al. 1994, Baskin et al. 1992). Initial physician judgments of this overall appearance are generally made rapidly and prior to completing a full physical examination. Such judgments can even affect how providers interpret clinica...