ArcGIS Online is a unified Web portal designed by the Environmental Systems Research Institute (Esri). It contains a rich collection of Web maps, layers, and services contributed by GIS users throughout the world. The metadata about these GIS resources reside in data silos that can be accessed via a Web API. While this is sufficient for simple syntax-based searches, it does not support more advanced queries, e.g., finding maps based on the semantics of the search terms, or performing customized queries that are not pre-designed in the API. In the metadata, titles and descriptions are commonly available attributes that provide important information about the content of the GIS resources. However, such data cannot be easily used because they are in the form of unstructured natural language. To address these difficulties, we combine data-driven techniques with theory-driven approaches to enable semantic search and knowledge discovery for ArcGIS Online. We develop an ontology for ArcGIS Online data, convert the metadata into Linked Data, and enrich the metadata by extracting thematic concepts and geographic entities from titles and descriptions. Based on a human participant experiment, we calibrate a linear regression model for semantic search, and demonstrate flexible queries for knowledge discovery that are not possible with the existing Web API. While this research is based on ArcGIS Online data, the presented methods can also be applied to other GIS cloud services and data infrastructures.
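To make the step of converting metadata into Linked Data more concrete, the following is a minimal sketch that serializes one metadata record as N-Triples. It is not the ontology developed in the paper: the `http://example.org/agol/` namespace, the record fields, the `WebMap` class name, and the sample record are all hypothetical. Only the RDF `type` property and the Dublin Core terms (`title`, `description`, `subject`, `spatial`) are existing vocabulary.

```python
from typing import Any, Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"   # Dublin Core terms (existing vocabulary)
EX = "http://example.org/agol/"     # hypothetical namespace, for illustration only


def literal(value: str) -> str:
    """Render a plain string as an N-Triples literal with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(record: Dict[str, Any]) -> List[str]:
    """Turn one metadata record (hypothetical schema) into N-Triples lines."""
    subject = f"<{EX}item/{record['id']}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{EX}ontology#{record['type']}> .",
        f"{subject} <{DCT}title> {literal(record['title'])} .",
        f"{subject} <{DCT}description> {literal(record['description'])} .",
    ]
    # Enrichment: thematic concepts and place names extracted from the text.
    for concept in record.get("concepts", []):
        triples.append(f"{subject} <{DCT}subject> {literal(concept)} .")
    for place in record.get("places", []):
        triples.append(f"{subject} <{DCT}spatial> {literal(place)} .")
    return triples


# Made-up sample record; not taken from ArcGIS Online.
SAMPLE = {
    "id": "abc123",
    "type": "WebMap",
    "title": 'Flood risk "hot spots"',
    "description": "Flood risk zones along the coast.",
    "concepts": ["flood"],
    "places": ["California"],
}

if __name__ == "__main__":
    print("\n".join(record_to_ntriples(SAMPLE)))
```

Once records are expressed as triples, they can be loaded into a triple store and queried with SPARQL, which is what allows customized queries beyond those pre-designed in a Web API.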
Geoportals provide integrated access to geospatial resources, and enable both authorities and the general public to contribute and share data and services. An essential goal of geoportals is to facilitate the discovery of the available resources. This process relies heavily on the quality of metadata. While multiple metadata standards have been established, data contributors may adopt different standards when sharing their data via the same geoportal. This is especially the case for user-generated content, where various terms and topics can be introduced to describe similar datasets. While this heterogeneity provides a wealth of perspectives, it also complicates resource discovery. With the fast development of Semantic Web technologies, there is a rise of Linked-Data-driven portals. Although these novel portals open up new ways of organizing metadata and retrieving resources, they lack effective semantic search methods. This paper addresses the two challenges discussed above, namely the topic heterogeneity brought by multiple metadata standards and the lack of established semantic search in Linked-Data-driven geoportals. To harmonize the metadata topics, we employ a natural language processing method, namely Labeled Latent Dirichlet Allocation (LLDA), and train it using standardized metadata from Data.gov. With respect to semantic search, we construct thematic and geographic matching features from the textual metadata descriptions, and train a regression model via a human participant experiment. We evaluate our methods by examining their performance in addressing the two issues. Finally, we implement a semantics-enabled and Linked-Data-driven prototypical geoportal using a sample dataset from Esri's ArcGIS Online.
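The idea of training a regression model on thematic and geographic matching features can be illustrated with a small sketch. It assumes that each query-resource pair has already been reduced to two scores in [0, 1] and that human participants supplied a relevance rating for each pair. The feature values, the ratings, and all function names below are made up for illustration; the fit is ordinary least squares, which may differ from the model actually trained in the paper.

```python
from typing import Dict, List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is non-singular; no error handling for the sketch.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_regression(features: Sequence[Sequence[float]],
                          targets: Sequence[float]) -> List[float]:
    """Least-squares fit via the normal equations; returns [intercept, w1, ...]."""
    rows = [[1.0, *f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def score(weights: Sequence[float], feature_vector: Sequence[float]) -> float:
    """Predicted relevance for one (thematic, geographic) feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_vector))


def rank(weights: Sequence[float],
         candidates: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order candidate resources by predicted relevance, best first."""
    return sorted(candidates, key=lambda name: score(weights, candidates[name]),
                  reverse=True)


# Made-up (thematic match, geographic match) pairs and relevance ratings.
TRAIN_FEATURES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
TRAIN_RATINGS = [0.1, 0.7, 0.4, 1.0, 0.55]
WEIGHTS = fit_linear_regression(TRAIN_FEATURES, TRAIN_RATINGS)

if __name__ == "__main__":
    print(WEIGHTS)
    print(rank(WEIGHTS, {"map_a": (0.9, 0.1), "map_b": (0.2, 0.3)}))
```

The fitted weights express how much each kind of match contributes to perceived relevance, so the same weights can then be used to rank previously unseen resources for a query.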
Place name disambiguation is an important task for improving the accuracy of geographic information retrieval. This task becomes more challenging when the input texts are short. Wikipedia provides information about places and has often been employed for named entity recognition. However, the natural language representation of Wikipedia articles limits more effective use of this rich knowledge base. DBpedia is the Semantic Web version of Wikipedia, which provides structured and machine-understandable knowledge mined from Wikipedia articles. This paper presents an approach for combining Wikipedia and DBpedia to disambiguate place names in short texts. We discuss the pros and cons of the two knowledge bases, and argue that a combination of both performs better than either of them alone. We evaluate our proposed method by conducting experiments against three established baseline methods. The results indicate that our method has generally higher precision and recall. While our study employs DBpedia, the proposed method is generic and can be extended to other structured Linked Datasets such as Freebase or Wikidata.
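The general idea of combining unstructured article text with structured knowledge can be sketched as follows. This is not the method evaluated in the paper: the overlap measure, the weighting parameter `alpha`, the candidate record layout, and the tiny hand-written candidate set are all assumptions made for illustration. In practice the text would come from Wikipedia articles and the related entities from DBpedia.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic tokens of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(context: Set[str], evidence: Set[str]) -> float:
    """Share of context tokens that also appear in the evidence."""
    if not context:
        return 0.0
    return len(context & evidence) / len(context)


def disambiguate(short_text: str, candidates: List[Dict[str, object]],
                 alpha: float = 0.5) -> str:
    """Pick the candidate place whose combined evidence best matches the text.

    alpha weights the article-text evidence; (1 - alpha) weights the
    structured evidence. Both the scoring and alpha are illustrative choices.
    """
    context = tokens(short_text)
    best_name, best_score = "", -1.0
    for candidate in candidates:
        text_score = overlap(context, tokens(str(candidate["article_text"])))
        structured: Set[str] = set()
        for label in candidate["related_entities"]:
            structured |= tokens(label)
        struct_score = overlap(context, structured)
        combined = alpha * text_score + (1.0 - alpha) * struct_score
        if combined > best_score:
            best_name, best_score = str(candidate["name"]), combined
    return best_name


# Hand-written toy candidates for the ambiguous name "Washington".
CANDIDATES = [
    {
        "name": "Washington, D.C.",
        "article_text": "Washington, D.C. is the capital city of the United "
                        "States, located on the Potomac River.",
        "related_entities": ["United States", "Potomac River", "Maryland", "Virginia"],
    },
    {
        "name": "Washington (state)",
        "article_text": "Washington is a state in the Pacific Northwest "
                        "region of the United States.",
        "related_entities": ["Seattle", "Olympia", "Pacific Northwest", "Oregon"],
    },
]

if __name__ == "__main__":
    print(disambiguate("Ferry from Seattle across Washington", CANDIDATES))
```

In this toy case the article text alone cannot separate the two candidates for a query mentioning Seattle, because both descriptions contain only the word "Washington"; the structured list of related entities supplies the deciding evidence. A real implementation would also remove stop words and weight terms, which this sketch omits.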