screening is widely recommended, yet studies of the accuracy of commonly used questionnaires reveal mixed results, and previous comparisons of these questionnaires are hampered by important methodological differences across studies.
OBJECTIVE To compare the accuracy of 3 developmental screening instruments as standardized tests of developmental status.
DESIGN, SETTING, AND PARTICIPANTS This cross-sectional diagnostic accuracy study recruited consecutive parents in waiting rooms at 10 pediatric primary care offices in eastern Massachusetts between October 1, 2013, and January 31, 2017. Parents were included if they were sufficiently literate in the English or Spanish language to complete a packet of screening questionnaires and if their child was of eligible age. Parents completed all questionnaires in counterbalanced order. Participants who screened positive on any questionnaire plus 10% of those who screened negative on all questionnaires (chosen at random) were invited to complete developmental testing. Analyses were weighted for sampling and nonresponse and were conducted from October 1, 2013, to January 31, 2017.
EXPOSURES The 3 screening instruments used were the Ages & Stages Questionnaire, Third Edition (ASQ-3); Parents' Evaluation of Developmental Status (PEDS); and Survey of Well-being of Young Children (SWYC): Milestones.
MAIN OUTCOMES AND MEASURES Reference tests administered were Bayley Scales of Infant and Toddler Development, Third Edition, for children aged 0 to 42 months, and Differential Ability Scales, Second Edition, for older children. Age-standardized scores were used as indicators of mild (80-89), moderate (70-79), or severe (<70) delays.
RESULTS A total of 1495 families of children aged 9 months to 5.5 years participated. The mean (SD) age of the children at enrollment was 2.6 (1.3) years, and 779 (52.1%) were male. Parent respondents were primarily female (1325 [88.7%]), with a mean (SD) age of 33.4 (6.3) years. Of the 20.5% to 29.0% of children with a positive score on each questionnaire, 35% to 60% also received a positive score on a second questionnaire, demonstrating moderate co-occurrence.
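The design described above (reference testing offered to all screen-positive children but only a 10% random sample of screen-negative children, with analyses weighted for sampling) can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The `Child` record, the `weight` field, and both function names are hypothetical, and the weights shown reflect only the sampling fraction, whereas the study also weighted for nonresponse. The score bands follow the mild (80-89), moderate (70-79), and severe (<70) definitions stated in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


def delay_category(standard_score: float) -> str:
    """Map an age-standardized score to the delay bands named in the abstract."""
    if standard_score < 70:
        return "severe"
    if standard_score < 80:
        return "moderate"
    if standard_score < 90:
        return "mild"
    return "none"


@dataclass
class Child:
    screen_positive: bool  # result on one screening questionnaire
    has_delay: bool        # reference-test result at the chosen severity threshold
    weight: float          # hypothetical weight: 1 / probability of being tested


def weighted_accuracy(
    children: Iterable[Child],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (sensitivity, specificity) using weighted cell totals.

    Each tested child stands in for `weight` children in the screened
    population, which corrects for testing only a sample of screen negatives.
    """
    tp = fn = tn = fp = 0.0
    for c in children:
        if c.has_delay:
            if c.screen_positive:
                tp += c.weight
            else:
                fn += c.weight
        else:
            if c.screen_positive:
                fp += c.weight
            else:
                tn += c.weight
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else None
    specificity = tn / (tn + fp) if (tn + fp) > 0 else None
    return sensitivity, specificity


# Illustrative, made-up records: screen positives carry weight 1 (all invited),
# screen negatives carry weight 10 (10% sampled).
example = (
    [Child(True, True, 1.0)] * 3
    + [Child(False, True, 10.0)] * 1
    + [Child(True, False, 1.0)] * 2
    + [Child(False, False, 10.0)] * 4
)
sens, spec = weighted_accuracy(example)
```

Without the weights, the single tested screen-negative child with a delay would count once rather than ten times, so sensitivity would be overstated; this is the verification bias that the sampling weights are meant to correct.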
Among younger children (<42 months), the specificity of the ASQ-3 (89.4%; 95% CI, 85.9%-92.1%) and SWYC Milestones (89.0%; 95% CI, 86.1%-91.4%) was higher than that of the PEDS (79.6%; 95% CI, 75.7%-83.1%; P < .001 and P = .002, respectively), but differences in sensitivity were not statistically significant. Among older children (43-66 months), specificity of the ASQ-3 (92.1%; 95% CI, 85.1%-95.9%) was higher than that of the SWYC Milestones (70.7%; 95% CI, 60.9%-78.8%) and the PEDS (73.7%; 95% CI, 64.3%-81.3%; P < .001), but sensitivity to mild delays of the SWYC Milestones (54.8%; 95% CI, 38.1%-70.4%) and of the PEDS (61.8%; 95% CI, 43.1%-77.5%) was higher than that of the ASQ-3 (23.5%; 95% CI, 9.0%-48.8%; P = .012 and P = .002, respectively). Sensitivity exceeded 70% only with respect to severe delays, with 73.7% (95% CI, 50.1%-88.6%) for the SWYC Milestones among younger children, 78.9% (95% CI, 55...