Objective
HIV incidence varies widely between sub-Saharan African (SSA) countries. This variation coincides with substantial sociobehavioural heterogeneity, which complicates the design of effective interventions. In this study, we investigated how sociobehavioural heterogeneity in SSA could account for the variance of HIV incidence between countries.

Methods
We used unsupervised machine learning to analyse data from the Demographic and Health Surveys of 29 SSA countries completed after 2010. We preselected 48 demographic, socio-economic, behavioural and HIV-related attributes to describe each country. We used Principal Component Analysis to visualize sociobehavioural similarity between countries, and to identify the variables that accounted for most of the sociobehavioural variance in SSA. We used hierarchical clustering to identify groups of countries with similar sociobehavioural profiles, and we compared the distribution of HIV incidence and sociobehavioural variables within each cluster.

Findings
The most important characteristics, which explained 69% of the sociobehavioural variance across SSA among the variables we assessed, were: religion; male circumcision; number of sexual partners; literacy; uptake of HIV testing; women’s empowerment; accepting attitude toward people living with HIV/AIDS; rurality; antiretroviral therapy (ART) coverage; and knowledge about AIDS. Our model revealed three groups of countries, each with a characteristic sociobehavioural profile.
HIV incidence was mostly similar within each cluster and different between clusters (median (IQR): 0.5/1000 (0.6/1000), 1.8/1000 (1.3/1000) and 5.0/1000 (4.2/1000)).

Conclusion
Our findings suggest that sociobehavioural factors play a key role in determining the course of the HIV epidemic, and that similar techniques can help to design targeted country-specific interventions to impede HIV transmission and to predict their effects.

Research in context

Knowledge before this study
We searched PubMed with the terms “HIV”, “inequality”, “factors” and “sub-Saharan Africa” for articles published in English before February 28th, 2019. The reviewed literature was usually limited to a certain sub-population, sub-national region, or country, but some recent studies covered up to 31 sub-Saharan African countries. Based on a relatively small number of variables (5 to 13), and using descriptive statistics, regressions and concentration indices, previous works analysed the association of socio-economic inequalities, male circumcision, high-risk sexual behaviour, or HIV-related stigma with HIV testing, uptake of treatment, ART adherence, or HIV prevalence.

Contribution of this study
To our knowledge, this is the first study in which unsupervised machine learning techniques (Principal Component Analysis and hierarchical clustering) were used to analyse the sociobehavioural heterogeneity in sub-Saharan Africa (SSA) and how it is associated with the variability of HIV incidence in the region. We identified three distinct sociobehavioural profiles, which were associated with different geographical regions and different levels of HIV incidence in SSA. Because the association between the variability of HIV incidence across SSA and its underlying sociobehavioural factors is still not well understood, we believe that our analysis, which compares 29 SSA countries based on 48 sociobehavioural characteristics, brings significant value to the field. Identifying and comparing the sociobehavioural profiles of countries helps to design tailored country-specific interventions to impede HIV transmission and to predict their effect.
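
The workflow summarised under Methods (Principal Component Analysis, then hierarchical clustering, then a per-cluster comparison of HIV incidence) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the standardisation step, the choice of two components, Ward linkage, and every function name are assumptions made for illustration, and the data are synthetic stand-ins with the same shape as the study (29 countries, 48 attributes), not Demographic and Health Survey values.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pca(X, n_components):
    """PCA via SVD on column-standardised data (standardisation is an assumption).

    Returns component scores, loadings, and the explained-variance ratios.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], explained[:n_components]


def cluster_countries(scores, n_clusters):
    """Agglomerative clustering cut into n_clusters groups.

    Ward linkage is an assumption; the text does not name a linkage method.
    """
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def median_iqr(values, labels):
    """Median and interquartile range of `values` within each cluster label."""
    summary = {}
    for k in np.unique(labels):
        q1, med, q3 = np.percentile(values[labels == k], [25, 50, 75])
        summary[int(k)] = (float(med), float(q3 - q1))
    return summary


# Synthetic stand-in data: 29 "countries", 48 "attributes", three latent groups.
rng = np.random.default_rng(0)
group = np.repeat([0, 1, 2], [10, 10, 9])
centers = rng.normal(scale=5.0, size=(3, 48))
X = centers[group] + rng.normal(size=(29, 48))
# Hypothetical incidence per 1000, loosely echoing the three reported medians.
incidence = np.array([0.5, 1.8, 5.0])[group] + rng.normal(scale=0.1, size=29)

scores, loadings, explained = pca(X, n_components=2)
labels = cluster_countries(scores, n_clusters=3)
summary = median_iqr(incidence, labels)

for k, (med, iqr) in sorted(summary.items()):
    print(f"cluster {k}: median={med:.2f}/1000, IQR={iqr:.2f}/1000")
```

In this sketch the rows of `loadings` indicate which attributes contribute most to each component, which is analogous to how the text identifies the variables that account for most of the sociobehavioural variance.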