Traditionally, teachers have assessed children's emotions through observation. Algorithmic techniques should make it possible to detect children's emotions automatically. To achieve this goal, an emotional lexicon must be developed and evaluated on the basis of the standardized test Emotional Competencies Scale for Young Children (ECSYC). The purpose of this study was to establish the criterion-related validity of such a lexicon against the ECSYC. The study proceeded in five steps. First, we developed 40 scenarios based on the ECSYC. Second, we developed five-level criteria. Third, we trained observers and calculated inter-rater reliability. Fourth, observers categorized the replies of 200 children into the five levels. Fifth, we ranked the replies at each level by frequency and compiled the emotional lexicon. The Spearman's rho coefficient between the Young Children Emotional Lexicon (YCEL) and the ECSYC was .406 (p = .026), indicating a significant correlation. An emotion recognizer using a bimodal approach achieved accuracies of 46.7%, 60.85%, and 78.73% for facial expression recognition, speech recognition, and bimodal recognition, respectively. These findings confirm that the YCEL can feasibly be applied to speech recognition. Bimodal recognition accuracy was 32.03 and 17.88 percentage points higher than that of facial expression recognition alone and speech recognition alone, respectively. These results suggest that children's emotional development can be detected automatically and that the norms can be kept up to date.
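The abstract reports two kinds of figures: a Spearman rank correlation between YCEL and ECSYC scores, and accuracy gains of the bimodal recognizer over each single modality. The minimal Python sketch below assumes plain lists of per-child scores; the function names (`average_ranks`, `spearman_rho`, `point_gain`) are hypothetical and do not come from the study. It shows how Spearman's rho can be computed with tie-aware ranks, which matters for five-level ordinal data where many children share a level. It also checks that the reported gains are differences in percentage points. It does not compute a p-value; a library routine such as `scipy.stats.spearmanr` returns both the coefficient and its p-value.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def point_gain(bimodal_acc, unimodal_acc):
    """Accuracy gain in percentage points (an absolute difference, not relative percent)."""
    return round(bimodal_acc - unimodal_acc, 2)


# Accuracies as reported in the abstract (in percent).
print(point_gain(78.73, 46.7))   # gain over facial expression recognition
print(point_gain(78.73, 60.85))  # gain over speech recognition
```

Treating the gains as absolute differences reproduces the reported 32.03 and 17.88 exactly. A relative improvement over the facial-expression baseline would instead be roughly 68.6%, so the distinction is worth stating explicitly.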