L. J. McGibbney scite author profile

Discovering and accessing geospatial data presents a significant challenge for the Earth sciences community, as massive amounts of data are produced every day. In this article, we report a smart web-based geospatial data discovery system that mines data relevancy from user interactions with metadata and puts it to use. Specifically, (1) the system enables semantic query expansion and suggestion to help users find more relevant data; (2) machine-learned ranking is used to produce an optimal search ranking from a set of identified ranking features that reflect users' search preferences; (3) a hybrid recommendation module lets users discover related data based on both metadata attributes and user behavior; (4) an integrated graphical user interface is developed to guide data consumers quickly and intuitively to the appropriate data resources. As a proof of concept, we focus on a well-defined domain, oceanography, and use oceanographic data discovery as an example. Experiments and a search example show that the proposed system can improve the scientific community's data search experience by providing query expansion, suggestion, better search ranking, and data recommendation through a user-friendly interface.
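To make the idea of a "hybrid" recommender concrete, the sketch below blends a content signal (overlap of metadata attribute values) with a behavior signal (how often two datasets are accessed in the same user session). This is a minimal illustration under stated assumptions, not the module described in the article: the Jaccard similarity, the co-access counting, the weight `alpha`, and the dataset names (`sst_l4`, `sst_l2`, `ssh`, `wind`) are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Content signal (assumed): overlap between two datasets' metadata attribute values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def co_access_counts(sessions):
    """Behavior signal (assumed): how often each pair of datasets appears in one session."""
    counts = Counter()
    for session in sessions:
        # Sorting makes each pair canonical, so (x, y) and (y, x) share one count.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts

def recommend(target, metadata, sessions, alpha=0.5, k=3):
    """Rank other datasets by alpha * metadata similarity + (1 - alpha) * normalized co-access."""
    counts = co_access_counts(sessions)
    max_count = max(counts.values(), default=0) or 1  # avoid division by zero
    scores = {}
    for other, attrs in metadata.items():
        if other == target:
            continue
        content = jaccard(metadata[target], attrs)
        behavior = counts[tuple(sorted((target, other)))] / max_count
        scores[other] = alpha * content + (1 - alpha) * behavior
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]

# Hypothetical metadata attributes and user sessions.
metadata = {
    "sst_l4": {"sea surface temperature", "gridded", "modis"},
    "sst_l2": {"sea surface temperature", "swath", "modis"},
    "ssh": {"sea surface height", "gridded", "jason"},
    "wind": {"ocean winds", "gridded", "quikscat"},
}
sessions = [["sst_l4", "ssh"], ["sst_l4", "ssh", "wind"], ["sst_l4", "sst_l2"]]
print(recommend("sst_l4", metadata, sessions))
```

With `alpha=1.0` the ranking is driven purely by metadata (so `sst_l2` comes first); with the default blend, the frequently co-accessed `ssh` rises to the top. Tuning this weight is one simple way to trade off attribute similarity against observed user behavior.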


Regional Climate Model Evaluation System powered by Apache Open Climate Workbench v1.3.0: an enabling tool for facilitating regional climate studies

Lee, Goodman, McGibbney, et al. 2018, Geosci. Model Dev.


Abstract. The Regional Climate Model Evaluation System (RCMES) is an enabling tool of the National Aeronautics and Space Administration to support the United States National Climate Assessment. As a comprehensive system for evaluating climate models on regional and continental scales using observational datasets from a variety of sources, RCMES is designed to yield information on the performance of climate models and guide their improvement. Here, we present a user-oriented document describing the latest version of RCMES, its development process, and future plans for improvements. The main objective of RCMES is to facilitate the climate model evaluation process at regional scales. RCMES provides a framework for performing systematic evaluations of climate simulations, such as those from the Coordinated Regional Climate Downscaling Experiment (CORDEX), using in situ observations as well as satellite and reanalysis data products. The main components of RCMES are (1) a database of observations widely used for climate model evaluation, (2) various data loaders to import climate model output and observations from local file systems and Earth System Grid Federation (ESGF) nodes, (3) a versatile processor to subset and regrid the loaded datasets, (4) performance metrics designed to assess and quantify model skill, (5) plotting routines to visualize the performance metrics, (6) a toolkit for statistically downscaling climate model simulations, and (7) two installation packages for the convenience of users without Python skills. The RCMES website is kept up to date with a brief explanation of each of these components. Although other open-source software (OSS) toolkits exist that facilitate the analysis and evaluation of climate models, climate scientists need to participate in the development and customization of OSS to study regional climate change.
To establish infrastructure and to ensure software sustainability, development of RCMES is an open, publicly accessible process enabled by leveraging the Apache Software Foundation's OSS library, Apache Open Climate Workbench (OCW). The OCW software that powers RCMES includes a Python OSS library for common climate model evaluation tasks as well as a set of user-friendly interfaces for quickly configuring a model evaluation task. OCW also allows users to build their own climate data analysis tools, such as the statistical downscaling toolkit provided as a part of RCMES.
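The core evaluation workflow named above (regrid datasets to a common grid, subset a region, then compute skill metrics) can be sketched in a few lines of NumPy. This is a hedged illustration only: OCW ships its own dataset processing and metrics code, and the functions `subset`, `regrid_bilinear`, `bias`, and `rmse` below are hypothetical stand-ins that do not reproduce the OCW API. The toy grids, the linear field, and the +0.5 model offset are invented for the example, and the regridder assumes ascending 1-D latitude and longitude arrays.

```python
import numpy as np

def subset(field, lats, lons, lat_bounds, lon_bounds):
    """Keep only grid cells inside a lat/lon bounding box (inclusive)."""
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    return field[np.ix_(lat_mask, lon_mask)], lats[lat_mask], lons[lon_mask]

def regrid_bilinear(field, lats, lons, new_lats, new_lons):
    """Bilinear regridding on a rectilinear grid with ascending coordinates,
    done as two passes of 1-D linear interpolation (longitude, then latitude)."""
    along_lon = np.array([np.interp(new_lons, lons, row) for row in field])
    return np.array([np.interp(new_lats, lats, col) for col in along_lon.T]).T

def bias(model, obs):
    """Mean model-minus-observation difference."""
    return float(np.mean(model - obs))

def rmse(model, obs):
    """Root-mean-square error between model and observations."""
    return float(np.sqrt(np.mean((model - obs) ** 2)))

def pattern(lats, lons):
    """Toy linear field used for both the observations and the model."""
    lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
    return 2.0 * lat2d + lon2d

# Fine-resolution "observations" (0.5 degree) and a coarser "model" (1 degree)
# whose field is the same pattern shifted by +0.5.
obs_lats, obs_lons = np.linspace(-10, 10, 41), np.linspace(100, 120, 41)
mod_lats, mod_lons = np.arange(-10, 10.5, 1.0), np.arange(100, 120.5, 1.0)
obs = pattern(obs_lats, obs_lons)
model = pattern(mod_lats, mod_lons) + 0.5

# Put the observations on the model grid, then evaluate over a regional box.
obs_on_model = regrid_bilinear(obs, obs_lats, obs_lons, mod_lats, mod_lons)
box = dict(lat_bounds=(-5, 5), lon_bounds=(105, 115))
model_box, _, _ = subset(model, mod_lats, mod_lons, **box)
obs_box, _, _ = subset(obs_on_model, mod_lats, mod_lons, **box)
print(f"bias={bias(model_box, obs_box):.3f}, rmse={rmse(model_box, obs_box):.3f}")
```

Because the toy field is linear, bilinear regridding reproduces it exactly (up to floating-point rounding), so both metrics recover the imposed 0.5 offset. Real model and observation grids generally need more careful handling of masks, time axes, and conservative regridding.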


A Cloud-Based Framework for Large-Scale Log Mining through Apache Spark and Elasticsearch

Jiang et al. 2019, Applied Sciences


The volume, variety, and velocity of data (e.g., simulation data, observation data, and social media data) are growing ever faster, posing grand challenges for data discovery. An increasing trend in data discovery is to mine hidden relationships among users and metadata from web usage logs to support the data discovery process. Web usage log mining is the process of reconstructing sessions from raw logs and finding interesting patterns or implicit linkages. The mining results play an important role in improving the quality of search-related components, e.g., ranking, query suggestion, and recommendation. Although research has been done in the data discovery domain, collecting and analyzing logs efficiently remains a challenge because (1) the volume of web usage logs keeps growing as long as users access the data; (2) the dynamic volume of logs requires on-demand computing resources for mining tasks; and (3) the mining process is compute-intensive and time-intensive. To speed up the mining process, we propose a cloud-based log-mining framework using Apache Spark and Elasticsearch. In addition, a data partition paradigm, logPartitioner, is designed to solve the data imbalance problem in data parallelism. As a proof of concept, oceanographic data search and access logs are used to validate the performance of the proposed parallel log-mining framework.
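The two steps mentioned above, reconstructing sessions from raw logs and splitting the work evenly across parallel workers, can be illustrated in plain Python. This is a hedged sketch of the underlying logic, not Spark code and not the framework's implementation: the `LogEntry` fields, the 10-minute idle gap, and the sample IPs and URLs are hypothetical, and the greedy largest-first balancing is a standard heuristic swapped in for illustration. It does not reproduce logPartitioner's actual algorithm.

```python
import heapq
from dataclasses import dataclass

@dataclass
class LogEntry:
    ip: str           # client address used to group requests (assumption)
    timestamp: float  # seconds since the epoch
    url: str          # requested resource

def reconstruct_sessions(entries, gap_seconds=600.0):
    """Split each client's time-ordered requests into sessions at idle gaps > gap_seconds."""
    by_ip = {}
    for e in sorted(entries, key=lambda e: (e.ip, e.timestamp)):
        by_ip.setdefault(e.ip, []).append(e)
    sessions = []
    for requests in by_ip.values():
        current = [requests[0]]
        for prev, cur in zip(requests, requests[1:]):
            if cur.timestamp - prev.timestamp > gap_seconds:
                sessions.append(current)  # idle gap too long: close the session
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions

def balance_partitions(sessions, n_partitions):
    """Greedy largest-first assignment: each session goes to the currently lightest
    partition, where a partition's load is its total number of log entries."""
    heap = [(0, i) for i in range(n_partitions)]  # (load, partition index)
    parts = [[] for _ in range(n_partitions)]
    for s in sorted(sessions, key=len, reverse=True):
        load, i = heapq.heappop(heap)
        parts[i].append(s)
        heapq.heappush(heap, (load + len(s), i))
    return parts

# Hypothetical raw log entries from two clients.
logs = [
    LogEntry("10.0.0.1", 0, "/search?q=sst"),
    LogEntry("10.0.0.1", 60, "/dataset/sst_l4"),
    LogEntry("10.0.0.1", 2000, "/search?q=wind"),
    LogEntry("10.0.0.2", 100, "/search?q=ssh"),
    LogEntry("10.0.0.2", 200, "/dataset/ssh"),
]
sessions = reconstruct_sessions(logs)
parts = balance_partitions(sessions, n_partitions=2)
print([len(s) for s in sessions], [sum(len(s) for s in p) for p in parts])
```

Assigning the largest sessions first keeps partition loads close together, which is the kind of imbalance a naive hash partition on client IP can create when a few clients generate most of the traffic.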


L. J. McGibbney

Towards intelligent geospatial data discovery: a machine learning framework for search ranking

SciSpark: Applying in-memory distributed computing to weather event detection and tracking

A Smart Web-Based Geospatial Data Discovery System with Oceanographic Data as an Example

Regional Climate Model Evaluation System powered by Apache Open Climate Workbench v1.3.0: an enabling tool for facilitating regional climate studies

A Cloud-Based Framework for Large-Scale Log Mining through Apache Spark and Elasticsearch
