In this paper, we introduce an approach for constructing a geographical taxonomy of adjacency for a country, to be used in reformulating spatial queries. The approach takes the top-ranked documents retrieved by a search engine when a spatial entity, composed of a spatial relationship and the name of a city A, is submitted. Latent Semantic Indexing is then applied to these documents to find the cities B_i nearest to A, and each link is validated by checking whether A also appears in the results for B_i. In our experiments, we constructed a geographical taxonomy of adjacency for Morocco. We varied the spatial relationship used in the document retrieval step to compare the results obtained with different spatial relationships, and we used Google web services as the search engine to compare the results returned in each case. We then used the constructed taxonomy in geographical query reformulation, and used the Un-interpolated Average Precision (UAP) to compare the documents returned before and after reformulation. Our results indicate that reformulating geographical queries with the constructed taxonomy considerably improves query precision.
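The link validation step and the UAP measure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the placeholder city labels, and the candidate sets are hypothetical, and the candidate lists are assumed to have already been produced by the Latent Semantic Indexing step. The UAP function assumes that precision is averaged over the relevant documents present in the ranked list, which may differ from the normalisation used in the paper.

```python
from typing import Dict, List, Set, Tuple


def validate_links(candidates: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Keep a link A-B only if B is a candidate for A and A is a candidate for B."""
    links = set()
    for city, neighbours in candidates.items():
        for other in neighbours:
            if city in candidates.get(other, set()):
                # Store each undirected link once, in alphabetical order.
                links.add(tuple(sorted((city, other))))
    return sorted(links)


def uninterpolated_average_precision(relevance: List[bool]) -> float:
    """Average the precision measured at the rank of each relevant document.

    Assumption: the average is taken over relevant documents in the list.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical candidate sets, for illustration only (not results from the paper).
candidates = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"D"},
    "D": {"C"},
}
validated = validate_links(candidates)

# Hypothetical relevance judgements for a ranked list of three documents.
uap = uninterpolated_average_precision([True, False, True])
```

In this sketch the link A-C is discarded because A does not appear among the candidates of C, while A-B and C-D are kept because each city appears in the other's candidate set.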
Geographical queries need a dedicated reformulation process in information retrieval systems (IRS) because of their specificities and hierarchical structure, a fact that most web search engines ignore. In this paper, we propose an automatic approach for building a spatial taxonomy that models the notion of adjacency and is used to reformulate the spatial part of a geographical query. The approach exploits the documents at the top of the list retrieved when a spatial entity, composed of a spatial relation and the name of a city, is submitted. A transactional database is then constructed in which each extracted document is a transaction containing the names of the cities that belong to the same country as the city in the submitted query. The frequent pattern growth (FP-growth) algorithm is applied to this database in its parallel version (parallel FP-growth, PFP) to generate association rules, which form the country's taxonomy in a Big Data context. Experiments were conducted on Spark, and their results show that query reformulation using the taxonomy built with the proposed approach improves the precision and effectiveness of the IRS.
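The construction of the transactional database and the derivation of rules between cities can be sketched as follows. This is an illustration only: it replaces parallel FP-growth on Spark with plain pair counting in Python, which yields the same support and confidence values for two-city rules but does not scale in the same way. The function names, thresholds, placeholder city names, and sample documents are hypothetical, and matching city names by case-insensitive substring search is a simplifying assumption.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple


def build_transactions(documents: List[str], cities: Set[str]) -> List[Set[str]]:
    """Turn each document into the set of known city names it mentions."""
    transactions = []
    for text in documents:
        lowered = text.lower()
        found = {city for city in cities if city.lower() in lowered}
        if found:  # documents mentioning no known city are dropped
            transactions.append(found)
    return transactions


def pair_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> Dict[Tuple[str, str], float]:
    """Return rules (a, b), read as a -> b, mapped to their confidence."""
    n = len(transactions)
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for transaction in transactions:
        item_count.update(transaction)
        # Count every ordered pair of distinct cities once per transaction.
        pair_count.update(permutations(sorted(transaction), 2))
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


# Hypothetical city names and documents, for illustration only.
cities = {"Alpha", "Beta", "Gamma"}
documents = [
    "Hotels near Alpha and Beta",
    "Road from Alpha to Beta",
    "Alpha travel guide, day trips to Gamma",
    "Weather report",
]
transactions = build_transactions(documents, cities)
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.6)
```

With these thresholds only the rules between Alpha and Beta survive, since the pair Alpha and Gamma appears in a single transaction out of three. On a real corpus, the pair counting would be replaced by a frequent pattern mining implementation such as the one the paper runs on Spark.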