Abstract. Many web documents refer to specific geographic localities, and many people include geographic context in queries to web search engines. Standard web search engines, however, treat geographical terms in the same way as any other terms. As a result, they can fail to find relevant documents that refer to the place of interest by alternative related names, such as those of included or nearby places. This problem can be overcome by associating text indexing with spatial indexing methods that exploit geo-tagging procedures to categorise documents with respect to geographic space. We describe three methods for spatio-textual indexing, based respectively on multiple spatially indexed text indexes, on attaching spatial indexes to the document occurrences of a text index, and on merging the results of text index access with the results of access to a spatial index of documents. These schemes are compared experimentally with a conventional text-index search engine on a collection of geo-tagged web documents, and are shown to compete with pure text indexing in both speed and storage performance.
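The third scheme above (merging text index access results with spatial index access results) can be illustrated with a minimal sketch. It assumes a hypothetical uniform grid (`CELL`, in degrees) as the spatial index and point footprints per document; the class and method names are illustrative only, not the authors' implementation.

```python
from collections import defaultdict

CELL = 1.0  # hypothetical grid cell size in degrees (assumption)


def cell_of(x, y):
    # Floor division maps a coordinate to its grid cell, also for negatives.
    return (int(x // CELL), int(y // CELL))


class TextThenSpatialMerge:
    """Illustrative sketch: separate text and spatial indexes, merged at query time."""

    def __init__(self):
        self.text_index = defaultdict(set)     # term -> doc ids
        self.spatial_index = defaultdict(set)  # grid cell -> doc ids
        self.footprints = {}                   # doc id -> list of (x, y)

    def add(self, doc_id, terms, points):
        for t in terms:
            self.text_index[t.lower()].add(doc_id)
        for x, y in points:
            self.spatial_index[cell_of(x, y)].add(doc_id)
        self.footprints[doc_id] = list(points)

    def query(self, terms, bbox):
        xmin, ymin, xmax, ymax = bbox
        # Text index access: documents containing every query term.
        sets = [self.text_index.get(t.lower(), set()) for t in terms]
        text_hits = set.intersection(*sets) if sets else set()
        # Spatial index access: documents in grid cells overlapping the box.
        cx0, cy0 = cell_of(xmin, ymin)
        cx1, cy1 = cell_of(xmax, ymax)
        spatial_hits = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                spatial_hits |= self.spatial_index.get((cx, cy), set())
        # Merge both result sets, then refine with an exact point-in-box test.
        return sorted(
            d for d in text_hits & spatial_hits
            if any(xmin <= x <= xmax and ymin <= y <= ymax
                   for x, y in self.footprints[d])
        )
```

A uniform grid keeps the example short; a production system would more likely use a tree-based spatial index, and the relative cost of the two index accesses is exactly what such experiments compare.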
Much of the information stored on the web has a geographical context, but current search engines treat such context in the same way as all other content. This paper describes the design, implementation and evaluation of a spatially-aware search engine capable of handling queries in the form of the triplet <theme, spatial relationship, location>. We describe the process of identifying geographic references in documents and assigning appropriate footprints to them, which are then stored together with the document terms in an indexing structure that supports real-time search. Methods that allow users to query and explore results ranked by both thematic and spatial relevance have been implemented, and a usability study indicates that users are satisfied with the range of spatial relationships available and intuitively understand how to use such a search engine. Normalised precision over 38 queries containing four types of spatial relationship is significantly higher (p < 0.001) for search that exploits spatial information than for pure text search.
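Ranking by both thematic and spatial relevance can be sketched as a weighted combination of the two scores. The abstract does not say how the scores are combined; the linear weighting `alpha`, the distance-decay function and the planar kilometre coordinates below are all assumptions for illustration.

```python
import math


def spatial_score(doc_xy, query_xy, scale_km=10.0):
    """Hypothetical distance-decay score in (0, 1]; 1.0 at the query location."""
    d = math.hypot(doc_xy[0] - query_xy[0], doc_xy[1] - query_xy[1])
    return 1.0 / (1.0 + d / scale_km)


def rank(docs, query_xy, alpha=0.5, scale_km=10.0):
    """docs: list of (doc_id, text_score, (x_km, y_km)); returns ids, best first."""
    scored = []
    for doc_id, text_score, xy in docs:
        # Assumed linear blend of thematic and spatial relevance.
        combined = alpha * text_score + (1 - alpha) * spatial_score(xy, query_xy, scale_km)
        scored.append((combined, doc_id))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

With `alpha=1.0` this reduces to pure text ranking, which makes the comparison against text-only search easy to reproduce in a toy setting.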
Abstract. The SPIRIT search engine provides a test bed for the development of web search technology that is specialised for access to geographical information. Major components include the user interface, geographical ontology, maintenance and retrieval functions for a test collection of web documents, textual and spatial indexes, relevance ranking and metadata extraction. Here we summarise the functionality and interaction between these components before focusing on the design of the geo-ontology and the development of spatio-textual indexing methods. The geo-ontology supports functionality for disambiguation, query expansion, relevance ranking and metadata extraction. Geographical place names are accompanied by multiple geometric footprints and qualitative spatial relationships. Spatial indexing of documents has been integrated with text indexing through the use of spatio-textual keys in which terms are concatenated with spatial cells to which they relate. Preliminary experiments demonstrate considerable performance benefits when compared with pure text indexing and with text indexing followed by a spatial filtering stage.
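The spatio-textual keys described above (terms concatenated with the spatial cells to which they relate) can be sketched as follows. The uniform grid, the `term@cell` key format and the names `cell_id`, `st_key` and `SpatioTextualIndex` are hypothetical choices, not SPIRIT's actual key encoding.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # hypothetical uniform grid cell size in degrees (assumption)


def cell_id(x, y):
    return f"{int(x // CELL_SIZE)}_{int(y // CELL_SIZE)}"


def st_key(term, cell):
    # Spatio-textual key: the term concatenated with a spatial cell id.
    return f"{term.lower()}@{cell}"


class SpatioTextualIndex:
    """Illustrative single inverted index keyed on (term, cell) combinations."""

    def __init__(self):
        self.postings = defaultdict(set)  # spatio-textual key -> doc ids

    def add(self, doc_id, terms, points):
        cells = {cell_id(x, y) for x, y in points}
        for t in terms:
            for c in cells:
                self.postings[st_key(t, c)].add(doc_id)

    def query(self, term, bbox):
        xmin, ymin, xmax, ymax = bbox
        hits = set()
        # One key lookup per grid cell overlapping the query box.
        for cx in range(int(xmin // CELL_SIZE), int(xmax // CELL_SIZE) + 1):
            for cy in range(int(ymin // CELL_SIZE), int(ymax // CELL_SIZE) + 1):
                hits |= self.postings.get(st_key(term, f"{cx}_{cy}"), set())
        return sorted(hits)
```

Because the spatial restriction is folded into the key, a query never reads postings for the term in irrelevant cells, which is the intuition behind the reported advantage over text indexing followed by a spatial filter. This sketch returns cell-level candidates only; an exact footprint test would still be needed for precise boundaries.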
This paper describes several steps in the derivation of boundaries of imprecise regions using the Web as the information source. We discuss how to obtain locations that are part of and locations that are not part of the region to be delineated, and then we propose methods to compute the region algorithmically. The methods introduced are evaluated to judge the potential of the approach.
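The abstract does not specify how the region is computed from locations known to be inside and outside it. As one deliberately simple baseline (not necessarily the authors' method), the convex hull of the inside points can serve as a first boundary estimate, with outside points used only to flag conflicts; `delineate` and its return shape are hypothetical.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); positive for a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    # For a CCW convex polygon, p is inside (or on the boundary) if it lies
    # on the left of, or on, every edge.
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def delineate(inside_pts, outside_pts):
    """Hypothetical helper: hull of inside points plus outside points it wrongly covers."""
    hull = convex_hull(inside_pts)
    conflicts = [p for p in outside_pts if inside_hull(hull, p)]
    return hull, conflicts
```

A convex hull cannot represent concave regions, so conflicting outside points are a signal that a more flexible boundary method is needed.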