Purpose
This study examines variation in socioeconomic and demographic (SED) factors and in the prevalence of non-communicable diseases (NCDs) across clusters of states in India. It also assesses the extent to which SED determinants can predict NCD prevalence.

Design/methodology/approach
The study uses unit-level, self-reported morbidity data from three rounds of the National Sample Survey. A machine learning model was constructed to predict NCD prevalence from SED characteristics. In addition, probit regression was used to identify the SED variables that significantly affect disease prevalence in each cluster of states.

Findings
The study finds that disease prevalence can be reasonably predicted from a given set of SED characteristics. Across the clusters of states, age is the most important factor in explaining the distribution of disease prevalence, followed by income, education, and marital status. Understanding these variations is essential for policymakers and public health officials seeking to develop targeted strategies that address each state’s unique challenges and opportunities.

Originality/value
The study complements the existing literature on the interplay between SED factors and NCD prevalence across diverse state-level contexts. Its predictive analysis of the distribution of NCDs based on SED factors adds depth to that understanding.