Big Earth data are produced from satellite observations, the Internet of Things, model simulations, and other sources. The data embed unprecedented insights and spatiotemporal stamps of relevant Earth phenomena that can improve how we understand, respond to, and address challenges in Earth sciences and applications. In recent years, new technologies (such as cloud computing, big data, and artificial intelligence) have gained momentum in addressing the challenges of using big Earth data for scientific studies and geospatial applications that were historically intractable. This paper reviews big Earth data analytics from several aspects to capture the latest advancements in this fast-growing domain. We first introduce the concepts of big Earth data. The architecture, various functionalities, and supporting modules are then reviewed from a generic methodology aspect. Analytical methods supporting the functionalities are surveyed and analyzed in the context of different tools. The driving questions are exemplified through cutting-edge Earth science research and applications. A list of challenges and opportunities is proposed for different stakeholders to collaboratively advance big Earth data analytics in the near future.
This paper presents two complementary methods: an approach to compute a network dataset for the indoor space of a building from its two-dimensional (2D) floor plans and limited semantic information, combined with an optimal crowd evacuation method. The approach includes three steps: (1) generate critical points in the space, (2) connect neighbouring points to build the network, and (3) run the optimization algorithm for crowd evacuation from a room to the exit gates of the building. A Triangulated Irregular Network (TIN) is used in the first two steps. The optimal evacuation does not simply send each person to the nearest gate; it relies on optimal sorting of the waiting lists at each gate of the room to be evacuated. As an example case, a rectangular room with two gates and 52 persons is evacuated in 102 elementary time intervals (one interval corresponds to the time for one step at normal walking speed), whereas it would otherwise have taken no fewer than 167 elementary steps. The procedure for generating the customized network uses the 2D floor plans of a building and some common Geographic Information System (GIS) functions. This method, combined with the optimally sorted waiting lists, will be helpful for guiding crowd evacuation during an emergency.
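The contrast between nearest-gate evacuation and balancing the waiting lists at each gate can be illustrated with a minimal sketch. It is not the paper's algorithm: the greedy rule, the assumption that a gate admits one person per elementary interval, and all function names (`gate_clear_time`, `nearest_gate`, `greedy_balanced`, `evacuation_time`) are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def gate_clear_time(arrivals: Sequence[int]) -> int:
    """Interval at which the last person passes a gate.

    Assumption: a gate admits at most one person per elementary interval,
    and people are served in order of arrival.
    """
    t = 0
    for arrival in sorted(arrivals):
        t = max(t + 1, arrival)
    return t


def nearest_gate(dist: List[List[int]]) -> List[int]:
    """Baseline: each person walks to the gate with the fewest steps."""
    return [min(range(len(d)), key=d.__getitem__) for d in dist]


def greedy_balanced(dist: List[List[int]]) -> List[int]:
    """Illustrative heuristic: assign people one by one to the gate whose
    waiting list would clear earliest after adding them."""
    n_gates = len(dist[0])
    queues: List[List[int]] = [[] for _ in range(n_gates)]
    assign = [0] * len(dist)
    # Handle people closest to any gate first.
    for i in sorted(range(len(dist)), key=lambda k: min(dist[k])):
        best = min(
            range(n_gates),
            key=lambda g: gate_clear_time(queues[g] + [dist[i][g]]),
        )
        queues[best].append(dist[i][best])
        assign[i] = best
    return assign


def evacuation_time(dist: List[List[int]], assign: List[int]) -> int:
    """Total evacuation time: the slowest gate determines the result."""
    queues: List[List[int]] = [[] for _ in range(len(dist[0]))]
    for d, g in zip(dist, assign):
        queues[g].append(d[g])
    return max(gate_clear_time(q) for q in queues)


# Six people, each 1 step from gate 0 and 3 steps from gate 1.
dist = [[1, 3] for _ in range(6)]
print(evacuation_time(dist, nearest_gate(dist)))
print(evacuation_time(dist, greedy_balanced(dist)))
```

In this toy case everyone is closest to the same gate, so the nearest-gate rule builds one long queue, while the balanced assignment sends some people on a longer walk to the idle gate and finishes sooner.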
Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address this challenge, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark/SpatialHadoop were enhanced from Spark/Hadoop to handle vector data. However, few studies systematically compare and evaluate the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data container except ClimateSpark has good support for the HDF data format used in this paper, so the others require time- and resource-consuming data preprocessing to load data; (2) SciDB, Rasdaman, and MongoDB handle queries on small and medium data volumes well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, while the others lack these functions; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend system capability.
Big data emerged as a new paradigm to provide unprecedented content and value for Digital Earth. Big Earth data are increasing tremendously with growing heterogeneity, posing grand challenges for the data management lifecycle of storage, processing, analytics, visualization, sharing, and applications. During the same time frame, cloud computing emerged to provide crucial computing support to address these challenges. This chapter introduces Digital Earth data sources, analytical methods, and architecture for data analysis and describes how cloud computing supports big data processing in the context of Digital Earth.
Keywords: Geoscience • Spatial data infrastructure • Digital transformation • Big data architecture
9.1 Introduction
Digital Earth refers to the virtual representation of the Earth we live in. It represents the Earth in the digital world from data to model. Data are collected and models are abstracted to build the digital reality. Massive amounts of data are generated from various sensors deployed to observe our home planet while building Digital Earth. The term "big data" was first presented by NASA researchers to describe the massive amount of information that exceeds the capacities of main memory, local disk, and even remote disk (Friedman 2012). According to the National Institute of Standards and Technology (NIST), "Big Data is a term used to describe the large amount of data in the networked, digitized, sensor-laden, information-driven world" (Chang and Grady 2015). This definition refers to the bounty of digital data from various data sources in the context of Digital Earth, which focus on big data's geographical aspects of social information, Earth observation (EO), sensor observation service (SOS), cyber infrastructure (CI), social media and business information (Guo 2017; Guo et al. 2017; Yang et al. 2017a, b). Digital Earth data are collected from satellites,