2017
DOI: 10.1140/epjds/s13688-017-0099-3

Improving official statistics in emerging markets using machine learning and mobile phone data

Abstract: Mobile phones are one of the fastest growing technologies in the developing world, with global penetration rates reaching 90%. Mobile phone data, also called call detail records (CDR), are generated every time phones are used and are recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, because most phones in developing countries are prepaid, the data lack key information about the user, including gender and other demographic variables. Th…
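The abstract notes that CDR log each use of a phone but carry no demographic attributes, so demographics have to be inferred from behavior. As an illustration only, the sketch below turns raw records into a few per-user behavioral indicators of the kind later fed to a classifier. The `CDRRecord` layout, the feature names, and the 22:00 to 06:00 night window are hypothetical assumptions, not the paper's data schema and not the bandicoot toolbox's API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


# Hypothetical CDR row layout (an assumption; real carrier schemas differ).
@dataclass
class CDRRecord:
    user: str
    contact: str
    interaction: str  # "call" or "text"
    direction: str    # "out" (initiated by user) or "in"
    timestamp: datetime


def behavioral_features(records, user):
    """Compute a few simple per-user indicators from raw CDR rows.

    Returns an empty dict if the user has no records.
    """
    mine = [r for r in records if r.user == user]
    if not mine:
        return {}
    n = len(mine)
    contacts = Counter(r.contact for r in mine)
    return {
        "n_interactions": n,
        "n_contacts": len(contacts),
        # Share of interactions the user initiated.
        "pct_initiated": sum(r.direction == "out" for r in mine) / n,
        # Share of interactions that were text messages.
        "pct_text": sum(r.interaction == "text" for r in mine) / n,
        # Share of interactions in a hypothetical 22:00-06:00 night window.
        "pct_night": sum(r.timestamp.hour >= 22 or r.timestamp.hour < 6
                         for r in mine) / n,
    }


if __name__ == "__main__":
    rows = [
        CDRRecord("u1", "a", "call", "out", datetime(2017, 1, 1, 9)),
        CDRRecord("u1", "b", "text", "in", datetime(2017, 1, 1, 23)),
        CDRRecord("u1", "a", "text", "out", datetime(2017, 1, 2, 12)),
        CDRRecord("u2", "c", "call", "in", datetime(2017, 1, 2, 8)),
    ]
    print(behavioral_features(rows, "u1"))
```

Each user then becomes one fixed-length feature vector, which is the form the classifiers mentioned in the citation statements below operate on.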

Cited by 39 publications (34 citation statements)
References: 34 publications
“…This accuracy is close to the estimate of the upper bound of that of the Bayes optimal classifier, indicating that the performance is close to optimal. In contrast, the standard regression techniques performed poorly in predicting the numerical age, similar to what was observed in egocentric prediction studies [19]. The classifier performance is generally good among peers, with those in the Y, L and O age groups at 77% accuracy and above.…”
Section: Discussion (supporting)
confidence: 71%
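The excerpt reports accuracy per age group rather than a single regression error on numerical age. A minimal sketch of one way to compute such per-group figures, assuming "accuracy" for a group means the share of users truly in that group who were assigned to it (per-class recall); the labels in the test reuse the excerpt's Y, L and O only as placeholders, and the data are made up.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that was predicted correctly.

    This is per-class recall; reading the excerpt's per-group
    "accuracy" this way is an assumption.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / totals[c] for c in totals}
```

Reporting per-group numbers this way makes it visible when a good overall accuracy hides one poorly predicted group.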
“…Martinez et al. used a small, reliable dataset of 10,000 users and, by using multiple algorithms (SVM, Random Forest and K-means), obtained an accuracy of 80% when the percentage of predicted instances was reduced [23]. In [1], the bandicoot¹ tool is used to extract more than 1400 behavioral features in different categories, and those features were tested with different algorithms such as random forest, SVM, and KNN; the best accuracy of the model was 79.7% for predicting gender in developing countries such as those in South Asia.…”
Section: Related Work (mentioning)
confidence: 99%
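The excerpt says accuracy reached 80% "when the percentage of predicted instances was reduced" but does not say how instances were dropped. One common mechanism, assumed here rather than taken from [23], is to abstain whenever the classifier's top-class probability falls below a threshold and to report accuracy only on the instances it keeps, together with the coverage. The function name, the probability-dictionary input, and the "F"/"M" labels are illustrative.

```python
def accuracy_at_coverage(probs, labels, threshold):
    """Accuracy over instances whose top-class probability >= threshold.

    probs: list of dicts mapping class label -> predicted probability.
    labels: list of true class labels, same length as probs.
    Returns (accuracy, coverage); accuracy is None if nothing is kept.
    """
    kept = 0
    correct = 0
    for p, y in zip(probs, labels):
        # Predicted class is the one with the highest probability.
        pred, conf = max(p.items(), key=lambda kv: kv[1])
        if conf >= threshold:
            kept += 1
            correct += int(pred == y)
    coverage = kept / len(labels) if labels else 0.0
    accuracy = correct / kept if kept else None
    return accuracy, coverage
```

Raising `threshold` usually trades coverage for accuracy, which is why an accuracy obtained this way is only meaningful when quoted alongside the share of users actually predicted.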
“…Nowadays, the mobile phone is one of the fastest growing technologies in the developing world with global penetration rates reaching 90% [1]. This makes it a huge warehouse for customers' data.…”
Section: Introduction (mentioning)
confidence: 99%
“…The current state of the art in predicting demographics from mobile phone data is a recent paper by Jahani et al [10] which relies on a large number of hand-engineered features (1440) provided by the open-source bandicoot toolbox [16] and a carefully tuned SVM with a radial basis function kernel. The features used are divided into two categories (individual, spatial) and based on carefully engineered definitions such as how to group together calls and text messages into conversations or compute the churn rate of common locations.…”
Section: Related Work (mentioning)
confidence: 99%
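The cited approach pairs the bandicoot features with an SVM using a radial basis function kernel, K(x, z) = exp(-γ‖x − z‖²). The pure-Python sketch below shows only that kernel, the Gram matrix an SVM would be trained on, and a z-score standardization step. Standardizing is an assumption added here, because a distance-based kernel over features on very different scales is otherwise dominated by the largest-scale columns. The function names and the `gamma` argument are illustrative; the excerpt does not give the tuned γ, and in practice one would use a library implementation such as scikit-learn's `SVC(kernel="rbf")`.

```python
import math


def standardize(X):
    """Z-score each column; constant columns are centered but not scaled."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(X, gamma):
    """Pairwise kernel values between all users' feature vectors."""
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]
```

Because K(x, x) = 1 and the value decays with squared distance, γ controls how local the decision boundary is, which is why it is among the hyperparameters that such an SVM has to tune.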