In social interactions, people who are perceived as competent are granted more opportunities and perform better in both the personal and professional spheres of their lives. However, the process by which competence is evaluated remains poorly understood. To fill this gap, we conducted a two-part empirical study that proposes a competence evaluation framework and builds a predictor of individual competence from multimodal data using machine learning and computer vision methods. In Study 1, from a knowledge-driven perspective, we first proposed a competence evaluation framework composed of 4 inner traits (skill, expression efficiency, intelligence, and capability) and 6 outer traits (age, eye gaze variation, glasses, length-to-width ratio, vocal energy, and vocal variation). We then used eXtreme Gradient Boosting (XGBoost) to predict individual competence and Shapley Additive exPlanations (SHAP) to interpret the predictions. The results indicate that 8 traits (4 inner and 4 outer) contribute positively to competence evaluation (in descending order of importance: vocal energy, age, length-to-width ratio, glasses, expression efficiency, capability, intelligence, and skill), while 2 outer traits (vocal variation and eye gaze variation) contribute negatively. In Study 2, from a data-driven perspective, we accurately predicted competence with a state-of-the-art multimodal machine learning algorithm, low-rank multimodal fusion (LMF), which exploits the intra- and intermodal interactions among the visual, vocal, and textual features of an individual's competence behavior. The results indicate that vocal and visual features contribute most to competence evaluation. In addition, we provide the Chinese Competence Evaluation Multimodal Dataset (CH-CMD) for individual competence analysis. This paper offers a systematic, empirically grounded competence framework and an effective multimodal machine learning method for competence evaluation, yielding novel insights into the study of individual affective traits, personal qualities, personality, and related constructs.
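To make the Study 1 pipeline concrete, the sketch below shows how an XGBoost regressor can be paired with SHAP to rank trait contributions. This is a minimal illustration, not the paper's code: the synthetic data, hyperparameters, and placeholder competence scores are all assumptions, and only the ten trait names come from the framework above.

```python
# Minimal sketch of the Study 1 approach: fit XGBoost on trait features,
# then use SHAP to quantify each trait's contribution to the prediction.
# Data, labels, and hyperparameters here are illustrative, not the paper's.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
traits = ["skill", "expression_efficiency", "intelligence", "capability",
          "age", "eye_gaze_variation", "glasses", "length_to_width_ratio",
          "vocal_energy", "vocal_variation"]
X = pd.DataFrame(rng.normal(size=(200, len(traits))), columns=traits)
y = rng.normal(size=200)  # synthetic placeholder for competence ratings

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# TreeExplainer yields per-sample, per-feature SHAP values; their signs
# indicate whether a trait pushes predicted competence up or down, and
# the mean absolute value gives a global importance ranking.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(traits, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```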
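For Study 2, the core of LMF is replacing the full outer-product fusion tensor with modality-specific low-rank factors, so trimodal interactions are computed as an elementwise product of per-modality projections. The sketch below follows the published LMF formulation rather than the authors' implementation; the embedding dimensions, rank, and output size are illustrative assumptions.

```python
# Minimal sketch of low-rank multimodal fusion (LMF) in PyTorch.
# Dimensions and rank are illustrative, not the paper's settings.
import torch
import torch.nn as nn

class LMF(nn.Module):
    """Low-rank fusion of visual, acoustic, and textual embeddings."""
    def __init__(self, dim_v, dim_a, dim_t, rank, out_dim):
        super().__init__()
        # One stack of rank factors per modality; the "+ 1" input slot holds
        # the constant 1 appended to each embedding, which preserves uni-
        # and bimodal interactions inside the trimodal product.
        self.f_v = nn.Parameter(torch.randn(rank, dim_v + 1, out_dim) * 0.1)
        self.f_a = nn.Parameter(torch.randn(rank, dim_a + 1, out_dim) * 0.1)
        self.f_t = nn.Parameter(torch.randn(rank, dim_t + 1, out_dim) * 0.1)
        self.rank_weights = nn.Parameter(torch.randn(rank))
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, z_v, z_a, z_t):
        ones = z_v.new_ones(z_v.size(0), 1)
        z_v = torch.cat([z_v, ones], dim=1)
        z_a = torch.cat([z_a, ones], dim=1)
        z_t = torch.cat([z_t, ones], dim=1)
        # Per-modality projections, shape (rank, batch, out_dim).
        p_v = torch.einsum("bi,rio->rbo", z_v, self.f_v)
        p_a = torch.einsum("bi,rio->rbo", z_a, self.f_a)
        p_t = torch.einsum("bi,rio->rbo", z_t, self.f_t)
        # Elementwise product realizes the three-way tensor interaction
        # without ever materializing the full outer-product tensor.
        fused = p_v * p_a * p_t
        # Weighted sum over the rank-1 terms completes the decomposition.
        return torch.einsum("r,rbo->bo", self.rank_weights, fused) + self.bias

# Usage with assumed feature sizes (visual=35, acoustic=74, textual=300):
lmf = LMF(dim_v=35, dim_a=74, dim_t=300, rank=4, out_dim=1)
score = lmf(torch.randn(8, 35), torch.randn(8, 74), torch.randn(8, 300))
print(score.shape)  # torch.Size([8, 1])
```

The low-rank decomposition is what keeps the parameter count linear in the number of modalities, which is the efficiency argument for LMF over full tensor fusion.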