Over the last decade, India has witnessed an explosion in the e-commerce industry. Adoption of e-commerce is increasing in smaller towns and cities, beyond the densely populated urban centers. In this paper, we discuss the practical challenges of developing high-precision geocoding engines for these geographical regions in India. These challenges motivate the next iteration of our geocoding framework. In particular, we focus on three core areas of improvement: 1) leveraging customer delivery data for geocoding, 2) understanding and handling the diversity and variation of addresses in these new regions, and 3) overcoming the limited coverage of our reference corpus. To this end, we present GeoCloud. The key contributions of GeoCloud are 1) a training algorithm for learning reference-representations from delivery coordinates and 2) a retrieval algorithm for geocoding new addresses. We test GeoCloud extensively across India to capture the regional, socio-economic and linguistic diversity of our country. Our evaluation data is sampled from the delivery addresses of a large e-commerce platform in India, covering 72 cities and 21 states. The results show a significant improvement in precision and recall over the state-of-the-art geocoding system for India, and demonstrate the effectiveness of our intuitive, robust and generic approach. Although we have shown the effectiveness of the framework for Indian addresses, we believe it can be applied to other countries as well, particularly where addresses are unstructured. To the best of our knowledge, this is the first instance of geocoding by learning reference-representations from large-scale delivery data.