This study validates the Behavior Development Screening for Toddlers (BeDevel), which combines a short caregiver interview (BeDevel-I) with a semistructured play observation (BeDevel-P). Data from 431 toddlers aged 18 to 42 months (66.2% male; mean age (SD) = 29.11 (8.59) months; ASD, n = 201; developmental delay, n = 46; typically developing, n = 184) were included in the validation. The best clinical estimate diagnosis, screening rate, validity, sensitivity, and reliability of BeDevel were determined from cross-sectional data collected using BeDevel and existing diagnostic and screening instruments: the Autism Diagnostic Observation Schedule (ADOS), Autism Diagnostic Interview-Revised (ADI-R), Vineland Adaptive Behavior Scales-II (VABS-II), Social Responsiveness Scale (SRS), Sequenced Language Scale for Infants (SELSI), Korean Childhood Autism Rating Scale (K-CARS), and Korean Social Communication Questionnaire (K-SCQ). The κ values of BeDevel-I and BeDevel-P items were 0.055–0.732 and 0.291–0.752, respectively. Items related to social referencing in BeDevel-P had particularly high diagnostic validity (κ = 0.483–0.684). The reliabilities of BeDevel-I and BeDevel-P were sufficient (Cronbach's alpha = 0.86–0.88 and 0.92–0.