Non-relational (NoSQL) databases have gained popularity in recent years, especially in Web applications, where semi-structured data formats (e.g., JSON, XML) that are widely used to store data on the Web are better suited to management by NoSQL database management systems. Web mapping software (OpenLayers, Leaflet, MapServer, GeoServer, etc.) implements geospatial extensions of such formats, for example GeoJSON, a standardized JSON document type that represents simple geographical features together with their non-spatial attributes. In this context, relational database management system (RDBMS) vendors have also implemented JSON support in their software to provide greater flexibility for relational database models. In this paper, the processing performance of semi-structured geospatial data in different database management systems (DBMSs) is analyzed. The analysis uses the GeoJSON data type across NoSQL DBMSs of different categories (MongoDB, Cassandra, CouchDB, and Neo4j), in parallel with PostgreSQL, an RDBMS with JSON processing capabilities. Results are presented for GeoJSON writing latency, location-based geospatial querying with and without spatial indexing, and querying based on attributes combined with querying based on location. The conclusions can support content-based estimation of DBMS requirements and restrictions at the database design stage. The results show that MongoDB and CouchDB perform best in terms of writing latency. Additionally, organizing the GeoJSON data in a materialized view in PostgreSQL yields the fastest results for both location querying and combined location-and-attribute querying, but requires 23% more storage. Both MongoDB and Cassandra returned fast results without requiring additional disk space. Finally, when a geospatial index is used (supported only in MongoDB and PostgreSQL), PostgreSQL reading latency for location queries is reduced by about 10%, while MongoDB shows no significant advantage from spatial index use.
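As a minimal illustration of the kind of location query benchmarked above, the following sketch stores a GeoJSON feature in MongoDB and queries it by location and attribute via pymongo. The database, collection, and field names are hypothetical; only MongoDB's standard GeoJSON support (2dsphere index, $geoWithin) is relied upon.

```python
# Minimal sketch (assumed setup): storing GeoJSON features in MongoDB
# and querying by location, optionally with a 2dsphere spatial index.
from pymongo import MongoClient, GEOSPHERE

client = MongoClient("mongodb://localhost:27017")   # assumed local instance
features = client["geo_bench"]["features"]          # hypothetical database/collection

# A simple GeoJSON Feature: geometry plus non-spatial attributes.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [36.2765, 33.5138]},  # lon, lat
    "properties": {"name": "sample point", "category": "test"},
}
features.insert_one(feature)

# Spatial index on the geometry field (MongoDB 2dsphere).
features.create_index([("geometry", GEOSPHERE)])

# Location-based query: features within a bounding polygon,
# combined with an attribute filter.
bbox = {
    "type": "Polygon",
    "coordinates": [[[36.0, 33.3], [36.5, 33.3], [36.5, 33.7],
                     [36.0, 33.7], [36.0, 33.3]]],
}
cursor = features.find({
    "geometry": {"$geoWithin": {"$geometry": bbox}},
    "properties.category": "test",
})
```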
The paper analyzes the use of geocoded and parsed social media data in geographical information systems to map the areas of Damascus, the capital of Syria, most affected by mortar shells. It describes an algorithm developed to collect and store data from social media sites. For data storage, both a NoSQL database (for JSON documents) and an RDBMS (for other spatial data types) are used. A Python script was written to collect social media data based on keywords related to the search. A geocoding algorithm that normalizes, standardizes, and tokenizes post text in order to locate social media posts was developed. The developed workflow produced year-by-year maps, from 2013 to 2018, of mortar shell impact locations in Damascus. These layers give an overview of changes in the number of mortar shell impacts and support hot spot analysis for the city. Finally, social media data can prove useful for mapping dynamic social phenomena, such as mortar shell impact locations in Damascus, Syria. Moreover, social media provide readily accessible, large-scale, timestamped data, which makes such phenomena easier to study.
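The normalization and tokenization steps mentioned above could look roughly like the following sketch. The paper's exact rules are not given here, so the regular expressions and the choice to keep hashtag words (place names often appear as hashtags) are illustrative assumptions.

```python
# Illustrative text preprocessing for the geocoding step (assumed rules).
import re

def normalize(text: str) -> str:
    """Lowercase, strip URLs and hashtag/mention markers, drop punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[@#]", " ", text)          # strip markers, keep the words
    text = re.sub(r"[^\w\s]", " ", text)       # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text: str) -> list[str]:
    """Whitespace tokenization of the normalized text."""
    return normalize(text).split()

tokens = tokenize("Mortar shell reported near #Damascus http://example.com")
# ['mortar', 'shell', 'reported', 'near', 'damascus']
```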
This paper presents a model to collect, save, geocode, and analyze social media data. The model is used to collect and process social media data concerning the ISIS terrorist group (the Islamic State in Iraq and Syria), and to map the areas in Syria most affected by ISIS according to the social media data. The mapping process consists of automated compilation of a density map of the geocoded tweets. Data mined from social media (e.g., Twitter and Facebook) are recognized as dynamic and easily accessible resources that can serve as a data source for spatial analysis and geographical information systems. Social media data can be represented as topic data and geocoding data based on the mined text, processed using Natural Language Processing (NLP) methods. NLP is a subdomain of artificial intelligence concerned with programming computers to analyze natural human language and text. NLP allows identifying the words used as initial data by the developed geocoding algorithm. In this study, the needed words were identified using two corpora: the first contained the names of populated places in Syria; the second was composed by statistically analyzing tweet counts and selecting words with a location meaning (e.g., schools, temples). After identifying the words, the algorithm used the Google Maps Geocoding API to obtain coordinates for the posts.
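A minimal sketch of the corpus-lookup and geocoding step described above follows. The corpus contents, matching rule, and API key handling are assumptions; only the standard Google Maps Geocoding API endpoint and response layout are relied upon.

```python
# Sketch of corpus lookup followed by geocoding (illustrative, not the paper's code).
import requests

# Hypothetical corpora: populated places in Syria and location-meaning words.
place_names = {"damascus", "aleppo", "raqqa", "homs"}
location_words = {"school", "temple", "mosque", "hospital"}

def extract_location_terms(tokens: list[str]) -> list[str]:
    """Keep tokens that appear in either corpus."""
    return [t for t in tokens if t in place_names or t in location_words]

def geocode(address: str, api_key: str):
    """Query the Google Maps Geocoding API and return (lat, lng) or None."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": address, "key": api_key},
        timeout=10,
    )
    results = resp.json().get("results", [])
    if not results:
        return None
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

terms = extract_location_terms(["mortar", "shell", "near", "school", "damascus"])
coords = geocode(" ".join(terms), api_key="YOUR_API_KEY")  # placeholder key
```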