Introduction
To explore a quick and non-invasive way to measure individual psychological states, this study developed interview-based scales and collected multimodal information from 172 participants.

Methods
We developed the Interview Psychological Symptom Inventory (IPSI), which in its final form retained 53 items across nine main factors. All factors performed well in terms of reliability and validity. We used optimized convolutional neural networks and original detection algorithms to recognize individual facial expressions and physical activity, based on Russell's circumplex model and the five factor model.

Results
Scores on the developed scale correlated significantly with participants' scores on each factor of the Symptom Checklist-90 (SCL-90) and the Big Five Inventory (BFI-2) [r = (−0.257, 0.632), p < 0.01]. Among the multimodal data, the arousal of facial expressions was significantly correlated with the interval of validity (p < 0.01), valence was significantly correlated with IPSI and SCL-90 scores, and physical activity was significantly correlated with gender, age, and factors of the scales.

Discussion
Our research demonstrates that mental health can be monitored and assessed remotely by collecting and analyzing multimodal data from individuals captured by digital tools.