Poverty is one of the most important determinants of adverse health outcomes globally, a major cause of societal instability and one of the largest causes of lost human potential. Traditional approaches to measuring and targeting poverty rely heavily on census data, which in most low- and middle-income countries (LMICs) are unavailable or out-of-date. Alternative measures are needed to complement and update estimates between censuses. This study demonstrates how public and private data sources that are commonly available for LMICs can be used to provide novel insight into the spatial distribution of poverty. We evaluate the relative value of modelling three traditional poverty measures using aggregate data from mobile operators and widely available geospatial data. Taken together, models combining these data sources provide the best predictive power (highest r² = 0.78) and lowest error, but models employing mobile data only generally yield comparable results, offering the potential to measure poverty more frequently and at finer granularity. Stratifying models into urban and rural areas highlights the advantage of using mobile data in urban areas and different data in different contexts. The findings indicate that it is possible to estimate and continually monitor poverty rates at high spatial resolution in countries with limited capacity to support traditional methods of data collection.
Mobile phones are one of the fastest growing technologies in the developing world, with global penetration rates reaching 90%. Mobile phone data, also called call detail records (CDR), are generated every time phones are used and are recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, the fact that most phones in developing countries are prepaid means that the data lack key information about the user, including gender and other demographic variables. This precludes numerous uses of these data in social science and development economics research. It furthermore severely hampers the development of humanitarian applications, such as the use of mobile phone data to target aid towards the most vulnerable groups during crises. We developed a framework to extract more than 1400 features from standard mobile phone data and used them to predict useful individual characteristics and group estimates. Here we present a systematic cross-country study of the applicability of machine learning for dataset augmentation at low cost. We validate our framework by showing how it can be used to reliably predict gender and other information for more than half a million people in two countries. We show that standard machine learning algorithms trained on only 10,000 users are sufficient to predict an individual's gender with an accuracy ranging from 74.3 to 88.4% in a developed country and from 74.5 to 79.7% in a developing country using only metadata. This is significantly higher than previous approaches and, once calibrated, gives highly accurate estimates of gender balance in groups. Performance suffers only marginally if we reduce the training size to 5,000, but decreases significantly with smaller training sets. We finally show, using factor analysis, that our indicators capture a large range of behavioral traits and that the framework can be used to predict other indicators of vulnerability such as age or socio-economic status.
Mobile phone data has great potential for good, and our framework allows this data to be augmented with vulnerability and other information at a fraction of the cost.
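The abstract above says that individual predictions, "once calibrated", give accurate estimates of gender balance in groups, but it does not say which calibration is used. As a minimal sketch of one common approach (an assumption here, not necessarily the authors' method), the raw predicted share can be corrected using the classifier's true positive rate and false positive rate as measured on a labeled validation set. The function name `calibrated_share` and its parameters are hypothetical.

```python
def calibrated_share(predicted_positive_share, tpr, fpr):
    """Correct a group-level predicted share for known classifier error.

    Assumption (not from the source): the observed share q relates to the
    true share p by q = p * tpr + (1 - p) * fpr, so p = (q - fpr) / (tpr - fpr).
    tpr and fpr are assumed to be measured on a labeled validation set.
    """
    if tpr <= fpr:
        raise ValueError("classifier must satisfy tpr > fpr")
    p = (predicted_positive_share - fpr) / (tpr - fpr)
    # Clip to a valid proportion, since sampling noise can push p outside [0, 1].
    return min(1.0, max(0.0, p))


# Hypothetical example: true share 0.30, tpr 0.80, fpr 0.25.
# The classifier would then label 0.30 * 0.80 + 0.70 * 0.25 = 0.415 as positive.
estimate = calibrated_share(0.415, tpr=0.80, fpr=0.25)
```

The point of the sketch is that a classifier with imperfect individual accuracy can still support group-level estimates, provided its error rates are known and stable across groups.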
Deep learning has in recent years brought breakthroughs in several domains, most notably voice and image recognition. In this work we extend deep learning into a new application domain, namely classification on mobile phone datasets. Classic machine learning methods have produced good results in telecom prediction tasks, but are underutilized due to resource-intensive and domain-specific feature engineering. Moreover, traditional machine learning algorithms require separate feature engineering in different countries. In this work, we show how socioeconomic status in large de-identified mobile phone datasets can be accurately classified using deep learning, thus avoiding the cumbersome and manual feature engineering process. We implement a simple deep learning architecture and compare it with traditional data mining models as our benchmarks. On average our model achieves 77% AUC on test data using location traces as the sole input. In contrast, the benchmarked state-of-the-art data mining models include various feature categories such as basic phone usage, top-up pattern, handset type, social network structure and individual mobility. The traditional machine learning models achieve 72% AUC in the best-case scenario. We believe these results are encouraging since average regional household income is an important input to a wide range of economic policies. In underdeveloped countries, reliable income statistics are often lacking, infrequently updated, and rarely fine-grained to sub-regions of the country. Making income prediction simpler and more efficient can be of great help to policy makers and charity organizations, which will ultimately benefit the poor.
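For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how the metric can be computed from binary labels and model scores. It uses the pairwise-ranking definition: AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `auc` and the toy data are illustrative assumptions and are not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This quadratic-time version is meant for clarity only. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 77% and 72% figures should be read.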
Diffusion processes are central to human interactions. One common prediction of the current modeling frameworks is that initial spreading dynamics follow exponential growth. Here, we find that, ranging from mobile handsets to automobiles, from smart-phone apps to scientific fields, early growth patterns follow a power law with non-integer exponents. We test the hypothesis that mechanisms specific to substitution dynamics may play a role, by analyzing a unique dataset tracing 3.6M individuals substituting for different mobile handsets. We uncover three generic ingredients governing substitutions, allowing us to develop a minimal substitution model, which not only explains the power-law growth, but also collapses diverse growth trajectories of individual constituents into a single curve. These results offer a mechanistic understanding of power-law early growth patterns emerging from various domains and demonstrate that substitution dynamics are governed by robust self-organizing principles that go beyond the particulars of individual systems.
Diffusion processes impact broad aspects of human society [1-5], ranging from the spread of biological viruses [3, 6-8] to the adoption of innovations [4, 9-14] and knowledge [15, 16] and to the spread of information [17-19], cultural norms and social behavior [20-23]. Despite numerous studies that span multiple disciplines, our knowledge is mainly limited to spreading processes in non-substitutive systems. Yet, a considerable number of ideas, products and behaviors spread by substitution: to adopt a new one, agents often need to give up an existing one. For example, the development of science hinges on scientists' relentlessness in abandoning a scientific framework once one that offers a better description of reality emerges [24].
The same is true for adopting a new healthy habit or other durable items, like mobile phones, cars or homes. While substitutions play a key role from science to economy, our limited understanding of such processes stems from the lack of empirical data tracing their characteristics. To study the dynamics of substitutions, we explore growth patterns in four different substitutive systems where detailed dynamical patterns are captured with fine temporal resolution (see Supplementary Note 1 for detailed data descriptions). Our first dataset captures, with daily resolution, 3.6 million individuals choosing among different types of mobile handsets, recorded by a Northern European telecommunication company from January 2006 to November 2014. Since an individual is unlikely to keep more than one mobile phone at a time, his or her adoption of a new handset is typically associated with discontinuance of the old one. Here, we focus on handsets that have been released for at least 6 months and used by at least 50 users in total (885 different handset models). Our second dataset captures monthly transaction records of 126 automobiles sold in North America between 2010 and 2016. These automobiles have been released for at least four mont...