The sociocultural, economic, geographical, and other heterogeneity of states across their territories is often politicized, giving rise to self-determination movements and political-territorial conflicts. To determine which ways of managing political-territorial diversity are more effective, many scholars conduct quantitative large-N comparative studies. These studies require systematizing and integrating large amounts of information on various aspects of the problem: the characteristics of conflicts, self-determination movements, and state policies on conflict management. This ambitious task has not yet been fully accomplished. The paper presents an overview of several databases that could serve as the basis for such integration, showing their strengths and limitations as well as the opportunities for and difficulties of integrating them. The main obstacle is likely to be that different datasets are based on different units of observation: conflicts, ethnic groups, countries, individual regions, and so on. Moreover, even when datasets share the same unit of observation, they use dissimilar criteria for including cases. Some datasets also suffer from selection bias, which further impedes data integration.