Over 50% of the world population is at risk of mosquito-borne diseases. Female Ae. aegypti mosquitoes transmit Zika, dengue, and chikungunya. The spread of these diseases correlates positively with the vector population, and this population depends on biotic and abiotic environmental factors including temperature, vegetation condition, humidity and precipitation. To combat virus outbreaks, information about the vector population is required. To this aim, Earth observation (EO) data provide a fast, efficient and economically viable means to estimate environmental features of interest. In this work, we present a temporal distribution model for adult female Ae. aegypti mosquitoes based on the joint use of the Normalized Difference Vegetation Index, the Normalized Difference Water Index, the Land Surface Temperature (at both day and night time), and precipitation information, all extracted from EO data. The model was applied separately to data obtained under three different regimes of vector control and field data collection, and used to explain the differences in environmental variable contributions across these regimes. To this aim, a random forest (RF) regression technique and its nonlinear feature importance ranking based on mean decrease in impurity (MDI) were implemented. To demonstrate the robustness of the proposed model, other machine learning techniques, including support vector regression, decision trees, k-nearest neighbor regression and artificial neural networks, as well as statistical models such as the linear regression model and the generalized linear model, were also considered. Our results show that machine learning techniques perform better than linear statistical models for the task at hand, and that RF performs best.
By ranking the importance of all features based on MDI in RF and selecting the subset comprising the most informative ones, a more parsimonious but equally effective and explainable model can be obtained. Moreover, the results can be empirically interpreted for use in vector control activities.

Keywords Ae. aegypti · Machine learning · Random forest · Remote sensing

arXiv:1911.08979v2 [q-bio.PE] 26 Nov 2019 · A PREPRINT - NOVEMBER 28, 2019
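The ranking-and-selection idea summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute reports impurity-based (MDI) importances. The feature names, the synthetic data, the toy relationship between features and target, and the choice to keep two features are all placeholders for illustration; they are not the data, settings, or results of this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical column names mirroring the EO variables named in the abstract.
FEATURES = ["ndvi", "ndwi", "lst_day", "lst_night", "precipitation"]


def make_synthetic_data(n_samples=600, seed=0):
    """Generate placeholder data; this is NOT the field data used in the paper."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_samples, len(FEATURES)))
    # Assumed toy relationship: the target depends nonlinearly on night-time
    # LST and precipitation only; the other columns carry no signal.
    y = (10.0 * np.sin(np.pi * X[:, 3])
         + 8.0 * X[:, 4] ** 2
         + rng.normal(0.0, 0.5, n_samples))
    return X, y


def rank_features_by_mdi(X, y, seed=0):
    """Fit a random forest and return (name, importance) pairs, highest first."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    importances = rf.feature_importances_  # impurity-based (MDI), sums to 1
    order = np.argsort(importances)[::-1]
    return [(FEATURES[i], float(importances[i])) for i in order]


X, y = make_synthetic_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Rank on the training split only, then keep the two highest-ranked features.
ranking = rank_features_by_mdi(X_train, y_train)
top_names = [name for name, _ in ranking[:2]]
top_idx = [FEATURES.index(name) for name in top_names]

# Compare the full model with the parsimonious one on held-out data (R^2).
full_model = RandomForestRegressor(n_estimators=200, random_state=0)
full_model.fit(X_train, y_train)
small_model = RandomForestRegressor(n_estimators=200, random_state=0)
small_model.fit(X_train[:, top_idx], y_train)

full_r2 = full_model.score(X_test, y_test)
small_r2 = small_model.score(X_test[:, top_idx], y_test)

for name, importance in ranking:
    print(f"{name:>14s}  {importance:.3f}")
print(f"R^2 with all features: {full_r2:.3f}")
print(f"R^2 with top features: {small_r2:.3f}")
```

In this toy setting the reduced model can match the full one because the discarded columns are pure noise by construction. With real data, MDI is known to favor high-cardinality or correlated features, so a ranking of this kind is best checked against held-out performance as done here.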
Introduction
Studying urban ecosystems is an active research topic across several scientific communities [1][2][3][4][5]. The compounding effects of human activities through urbanization, carbon emissions and other biodiversity changes are modifying urban ecosystems, creating a need for continuous assessment of the environment to ensure that it remains suitable for the health and quality of life of its inhabitants [6]. With an archive covering more than 40 years of collected data, Earth observation (EO) from orbiting satellites is a rich resource for environmental change monitoring [7]. Owing to recent advances that have brought sensors with higher spectral, spatial and temporal resolution, freely accessible data, and efficient processing algorithms, it is possible to extract static and dynamic phenomena happening on the Earth's surface and use this information for various environmental characteriza...