Introduction: The reliability of pubertal development self-assessment tools remains in question, and very few longitudinal studies have compared these tools with one another. This study therefore aimed to examine the reliability of pubertal development self-assessment using realistic color images (RCIs) and the Pubertal Development Scale (PDS) in a longitudinal cohort study.

Methods: Our longitudinal study recruited 1,429 participants (695 boys and 734 girls), aged 5.8–12.2 years, in Chongqing, China. We conducted two surveys, 6 months apart. At each visit, Tanner stages were assessed by trained medical students through physical examination, and participants self-assessed their pubertal development using RCIs and the PDS. Agreement between physical examination and self-assessment was determined using weighted kappa (wk), accuracy, and Kendall rank correlation.

Results: The concordance of puberty self-assessment using RCIs at baseline and the first follow-up was almost perfect in girls and boys, wk > 0.800 (p < 0.001). At baseline, the concordance of genital development self-assessment using RCIs was fair in boys, wk = 0.285 (p < 0.001), and that of boys' pubic hair development self-assessment using RCIs was poor, wk = 0.311 [95% confidence interval (CI) −0.157 to 0.818]. The wk of the PDS was less than 0.300, except for breast development. The reliability, validity, and consistency of the PDS in this study population were all low.

Conclusions: The concordance of RCIs is better than that of the PDS. Pubertal development self-assessment using RCIs is reliable, whereas the reliability and validity of the PDS are unacceptable. RCIs are therefore recommended as a reliable self-assessment tool for measuring pubertal development in large-scale epidemiological investigations and long-term longitudinal studies in China.
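To make the agreement statistics named in the Methods concrete, the following minimal sketch computes a weighted kappa and a simple accuracy for two sets of ordinal ratings. It is an illustration under stated assumptions, not the study's analysis code: the abstract does not say which weighting scheme was used, so linear weights are assumed by default; the function names `weighted_kappa` and `accuracy` are hypothetical; and the example ratings are invented for demonstration, not study data.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two ordinal ratings of the same subjects.

    `categories` lists the ordered stages (e.g. Tanner stages 1 to 5).
    `weights` is "linear" or "quadratic"; the scheme is an assumption here.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: rows = rater A, columns = rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Scaled distance between stages: 0 on the diagonal, 1 at the extremes.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - observed_dis / expected_dis


def accuracy(reference, self_report):
    """Proportion of subjects whose self-assessed stage equals the reference."""
    matches = sum(1 for r, s in zip(reference, self_report) if r == s)
    return matches / len(reference)


# Invented illustrative ratings (NOT study data): examiner stage vs. self-assessed stage.
exam_stage = [1, 1, 2, 2, 3, 3, 4, 5]
self_stage = [1, 2, 2, 2, 3, 4, 4, 5]
stages = [1, 2, 3, 4, 5]

print("weighted kappa:", round(weighted_kappa(exam_stage, self_stage, stages), 3))
print("accuracy:", accuracy(exam_stage, self_stage))
```

Weighted kappa is used for ordinal scales such as Tanner staging because it penalizes a two-stage disagreement more heavily than a one-stage disagreement, whereas accuracy counts only exact matches.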