Recognizing spatial language in text documents, termed
geoparsing
, is useful for many applications, because together with mapping such language to lat/long values, also known as
geocoding
, it enables the connection of the unstructured textual realm with the structured realm of
Geographic Information Systems (GIS)
[11]. For example, news stories about events happening in a particular location can be explored on a map for a spatial understanding of these events, as implemented by applications like the
European Media Monitor (EMM)
[18] and
NewsStand
[13, 20]. Web pages, blogs, encyclopedia articles, news stories, tweets and travel reports can all benefit from such interlinking with maps, which requires the recognition of spatial language. Note that geoparsing can be considered as a more specific application of the task of
Named Entity Recognition and Classification (NERC):
NERC is concerned with automatically recognizing proper nouns of any kind, often meant to include monetary amounts, dates, and other types, while geoparsing is the NERC task applied to locations specifically. Geoparsing is also known by many names in the literature, including
geotagging, georecognition
, and
toponym recognition
, but for consistency, here we will refer only to geoparsing. In this paper, we provide an overview of the challenges related to geoparsing, several families of geoparsing methods, existing systems and data collections available for performing geoparsing, and open research questions related to geoparsing.