Background: This paper shows that Big Data and the so-called tools of digital demography, such as Google Trends (GT) and insights from social networks such as Instagram, Twitter and Facebook, can be useful for determining, estimating and predicting forced migration flows to the EU caused by the war in Ukraine.
Objective: The objective of this study was to test whether Google Trends indexes can predict further forced migration from Ukraine to the EU (mainly to Germany) and to gain insights from social networks into the age and gender structure of the refugees.
Methods: The central methodological concept of our approach is to monitor the digital trace of Internet searches in Ukrainian, Russian and English with the Google Trends analytical tool (trends.google.com). First, we selected the keywords that are the most predictive and specific, yet common enough, to predict forced migration from Ukraine. We retrieved the data for the periods before and during the outbreak of the war and divided the keyword frequency of each migration-related query to standardise the data. We compared this search frequency index with official UNHCR statistics to test the significance of the results and correlations and to assess the models' predictive potential. Because UNHCR does not yet have complete data on the demographic structure of refugees, we filled this gap with three alternative Big Data sources: Facebook, Twitter and Instagram.
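The standardisation step can be illustrated with a short Python sketch. The abstract does not state what each query's frequency is divided by, so this sketch assumes, purely for illustration, a pre-war baseline: the mean of the first `baseline_len` observations of the series. The function name `standardise_by_baseline` and the weekly values are hypothetical and are not the study's data.

```python
from statistics import mean


def standardise_by_baseline(series, baseline_len):
    """Divide every value of a query's series by its pre-war mean.

    Assumption (not from the study): the first `baseline_len` points
    are pre-war observations and serve as the reference level.
    """
    if baseline_len <= 0 or baseline_len > len(series):
        raise ValueError("baseline_len must be between 1 and len(series)")
    baseline = mean(series[:baseline_len])
    if baseline == 0:
        raise ValueError("pre-war baseline is zero; ratio is undefined")
    # A value of 1.0 means "same search interest as before the war".
    return [value / baseline for value in series]


# Hypothetical weekly Google Trends values for one query (illustrative only).
weekly_index = [4, 5, 6, 40, 80, 60]
print(standardise_by_baseline(weekly_index, baseline_len=3))
```

Dividing by a common reference level puts queries with very different raw popularity on the same relative scale, so a jump in interest in, say, a Ukrainian-language term can be compared with one in a Russian-language term.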
Results: All tested migration-related search queries about planning emigration from Ukraine show a positive linear association between the Google index and official UNHCR statistics: R² = 0.1211 for searches in Russian and R² = 0.1831 for searches in Ukrainian. We observed that Ukrainians search more often in Russian than in Ukrainian. Increases in migration-related search activity in Ukraine, for terms such as граница (Rus. border), кордону (Ukr. border), Польща (Ukr. Poland), Германия (Rus. Germany), Німеччина (Ukr. Germany), and Угорщина and Венгрия (Ukr. and Rus. Hungary), correlate strongly with official UNHCR data on externally displaced persons from Ukraine. Searches in all three languages show the highest interest in Poland. Once refugees arrive in neighbouring countries, searches for Germany-related terms, such as crossing the border + Germany, increase rapidly. This result confirms our hypothesis that one-third of all refugees will cross into Germany. According to Big Data insights, a total of 5.4 million refugees can be expected. The most represented age group is 24 to 45 years (data for children are unavailable), and over 65% are women.
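The R² values above express how much of the variation in the UNHCR counts a straight-line fit on the Google index explains. A minimal Python sketch of that computation, assuming two paired series of equal length (for example, weekly index values and weekly refugee counts), is shown below; `r_squared` is a hypothetical helper for illustration, not the authors' code, and the sample series are made up.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant series")
    # Ordinary least-squares slope and intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variance in y left unexplained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / syy


# Hypothetical paired observations (illustrative only, not study data).
search_index = [1, 2, 3, 4]
refugee_counts = [2, 1, 4, 3]
print(r_squared(search_index, refugee_counts))
```

With a single predictor, this R² equals the square of the Pearson correlation coefficient, so R² = 0.1831 corresponds to a correlation of roughly 0.43.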
Conclusion: The increase in migration-related search queries is correlated with the rise in the number of refugees from Ukraine in the EU, so this method allows reliable forecasts. Understanding the consequences of forced migration from Ukraine is crucial for enabling UNHCR and governments to develop optimal humanitarian strategies and to prepare for the reception and possible integration of refugees. The reliable estimates and forecasts this method provides can help governments and UNHCR prepare for, and respond better to, the recent humanitarian crisis.