Objective: This study aimed to propose a data-driven framework for classifying people at risk of cardiovascular outcomes with respect to obesity and metabolic syndrome.
Design: A population-based prospective cohort study with long-term follow-up.
Setting: Data from the Tehran Lipid and Glucose Study (TLGS) were interrogated.
Participants: 12 808 participants of the TLGS cohort, aged ≥20 years, who had been followed for over 15 years were assessed.
Main outcome measures: Data collected through the TLGS, a prospective, population-based cohort study, were analysed. Feature engineering followed by hierarchical clustering was used to determine meaningful clusters and novel endophenotypes. Cox regression was used to demonstrate the clinical validity of phenomapping. The performance of the endophenotypes relative to traditional classifications was evaluated using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). R software V.4.2 was employed.
Results: The mean age was 42.1±14.9 years, 56.2% were female, and 13.1%, 2.8% and 6.2% had experienced cardiovascular disease (CVD), CVD mortality and hard CVD, respectively. The low-risk cluster differed significantly from the high-risk cluster in age, body mass index, waist-to-hip ratio, 2-hour post-load plasma glucose, triglycerides, triglyceride to high-density lipoprotein ratio, education, marital status, smoking and the presence of metabolic syndrome. Eight distinct endophenotypes were detected, with significantly different clinical characteristics and outcomes.
Conclusion: Phenomapping resulted in a novel classification of the population with cardiovascular outcomes, which can better stratify individuals into homogeneous subclasses for prevention and intervention, as an alternative to traditional methods based solely on either obesity or metabolic status.
These findings have important clinical implications for a particular part of the Middle Eastern population, for which it is common practice to use tools and evidence derived from Western populations with substantially different backgrounds and risk profiles.
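The comparison of classifications by Akaike and Bayesian information criteria mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only (the study itself used R), and the log-likelihoods, parameter counts and model names below are hypothetical placeholders, not results from the study. It uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L; for Cox models, some analysts take n to be the number of events rather than the number of participants, so the choice of n here is an assumption.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fitted models: the log-likelihoods and parameter counts
# are invented for illustration and are NOT taken from the study.
candidates = {
    "candidate_a": {"loglik": -5210.0, "k": 3},
    "candidate_b": {"loglik": -5150.0, "k": 7},
}

# Cohort size reported in the abstract; using it as n is an assumption.
n_obs = 12808

scores = {
    name: {
        "AIC": aic(m["loglik"], m["k"]),
        "BIC": bic(m["loglik"], m["k"], n_obs),
    }
    for name, m in candidates.items()
}

# The preferred model under each criterion is the one with the lowest value.
best_by_aic = min(scores, key=lambda name: scores[name]["AIC"])
best_by_bic = min(scores, key=lambda name: scores[name]["BIC"])

for name, s in scores.items():
    print(f"{name}: AIC={s['AIC']:.1f}, BIC={s['BIC']:.1f}")
print("Preferred by AIC:", best_by_aic)
print("Preferred by BIC:", best_by_bic)
```

Because BIC penalises each extra parameter by ln(n) rather than 2, it favours simpler classifications more strongly than AIC in a cohort of this size, which is one reason to report both criteria.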