Information about particular points, places, or institutions, so-called Points of Interest (POIs), can be complemented by matching data from the growing number of geospatial databases, including Foursquare, OpenStreetMap, Yelp, and Facebook Places. Doing so potentially yields more accurate and more complete information about POIs than could be obtained from any one of these systems alone.
Problem: Matching Points of Interest, and developing an algorithm that does so automatically, is challenging because of differing data structures, data incompleteness, conflicting information, naming differences, data inaccuracy, and cultural and language differences. In short, the difficulty of obtaining complementary information about a POI from different sources stems in part from the lack of standardization among POI descriptions, and in part from the vast and rapidly growing amount of data to be assessed on each occasion.
Research design and contributions: To propose an efficient algorithm for automatic Points of Interest matching, we: (1) analyzed available data sources (their structures, models, attributes, number of objects, and data quality measured as the number of missing attributes) and defined a unified POI model; (2) prepared a fairly large experimental dataset consisting of 50,000 matching and 50,000 non-matching points, taken from different geographical, cultural, and language areas; (3) comprehensively reviewed metrics that can be used for assessing the similarity between Points of Interest; (4) proposed and verified different strategies for dealing with missing or incomplete attributes; (5) reviewed and analyzed six different classifiers for Points of Interest matching, conducting experiments and follow-up comparisons to determine the most effective combination of similarity metric, strategy for dealing with missing data, and POI matching classifier; and (6) presented an algorithm for automatic Points of Interest matching, detailing its accuracy and carrying out a complexity analysis.
Results and conclusions: The main results of the research are: (1) comprehensive experimental verification and numerical comparisons of the crucial Points of Interest matching components (similarity metrics, approaches for dealing with missing data, and classifiers), indicating that the best Points of Interest matching classifier is a random forest algorithm combined with marking of missing data and a mix of different similarity metrics for different POI attributes; and (2) an efficient greedy algorithm for automatic POI matching. At a cost of just 3.5% in accuracy, it reduces POI matching time complexity by two orders of magnitude compared with the exact algorithm.
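The idea of mixing similarity metrics per attribute while marking missing data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the attribute names (`name`, `lat`, `lon`, `phone`), the `-1.0` sentinel, the 500 m distance cutoff, and the choice of `difflib` string similarity are all hypothetical. The resulting feature vector is the kind of input a classifier such as a random forest could be trained on.

```python
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt

# Assumed sentinel: marks an attribute missing in at least one of the two POIs.
MISSING = -1.0


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1], or MISSING."""
    if not a or not b:
        return MISSING
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two coordinates."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(h))


def geo_similarity(p, q, max_dist_m=500.0):
    """1.0 at zero distance, falling linearly to 0.0 at max_dist_m (hypothetical cutoff)."""
    coords = (p.get("lat"), p.get("lon"), q.get("lat"), q.get("lon"))
    if any(v is None for v in coords):
        return MISSING
    return max(0.0, 1.0 - haversine_m(*coords) / max_dist_m)


def exact_similarity(a, b):
    """1.0 if equal, 0.0 if different, MISSING if either value is absent."""
    if not a or not b:
        return MISSING
    return 1.0 if a == b else 0.0


def pair_features(p, q):
    """One feature per attribute, each with its own metric."""
    return [
        name_similarity(p.get("name"), q.get("name")),
        geo_similarity(p, q),
        exact_similarity(p.get("phone"), q.get("phone")),
    ]


a = {"name": "Cafe Roma", "lat": 50.0, "lon": 19.9, "phone": None}
b = {"name": "cafe roma", "lat": 50.0, "lon": 19.9, "phone": "+48 12 345"}
features = pair_features(a, b)  # the phone feature carries the MISSING marker
```

Keeping the missing marker outside the normal `[0, 1]` range lets a tree-based classifier split on "attribute absent" separately from "attribute present but dissimilar", which is one plausible reading of why marking helps.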
JavaScript Object Notation (JSON) was originally designed to transfer data; however, it soon found another use as a way of persisting data in NoSQL databases. Recently, the most popular relational databases have introduced JSON as a native column type, which makes it easier to store and query dynamic database schemas. In this paper, we review the currently popular techniques for storing data with a dynamic model and a large number of relationships between entities in relational databases. We focus on creating a simple dynamic schema with JSON in the most popular relational databases, and we compare it with the well-known Entity-Attribute-Value (EAV) data model and with a document database. The results of carefully selected tests in the field of criminal data suggest that using JSON in a dynamic database schema greatly simplifies queries and reduces their execution time compared to the widely used approaches.
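The contrast between the EAV model and a JSON column can be sketched as follows. This is an illustration only: SQLite (through Python's `sqlite3` module) stands in for the relational databases the paper evaluates, the table names `eav` and `doc` and the sample records are hypothetical, and the sketch assumes the SQLite build includes the JSON functions (`json_extract`), which is the case for most current builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- EAV: one row per (entity, attribute, value) triple.
    CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT);
    INSERT INTO eav VALUES
        (1, 'name', 'Smith'), (1, 'city', 'Krakow'),
        (2, 'name', 'Jones'), (2, 'city', 'Warsaw');

    -- JSON: one row per entity, dynamic attributes kept in a JSON document.
    CREATE TABLE doc (entity_id INTEGER PRIMARY KEY, data TEXT);
    INSERT INTO doc VALUES
        (1, '{"name": "Smith", "city": "Krakow"}'),
        (2, '{"name": "Jones", "city": "Warsaw"}');
    """
)

# EAV needs one self-join per attribute involved in the query.
eav_rows = conn.execute(
    """
    SELECT n.value
    FROM eav AS n
    JOIN eav AS c ON c.entity_id = n.entity_id
    WHERE n.attribute = 'name'
      AND c.attribute = 'city'
      AND c.value = 'Krakow'
    """
).fetchall()

# The JSON variant reads both attributes from the same row.
json_rows = conn.execute(
    """
    SELECT json_extract(data, '$.name')
    FROM doc
    WHERE json_extract(data, '$.city') = 'Krakow'
    """
).fetchall()

conn.close()
```

Both queries return the same single name, but the EAV version grows by one join for every additional attribute, which is the kind of query complexity the abstract says JSON columns reduce.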
The aim of this research is to build an open schema model for a digital sources repository in a relational database. This required us to develop a few advanced techniques. One of them was to keep and maintain a hierarchical data structure pushed into the repository. A second was to create constraints at any hierarchical level that allow for the enforcement of data integrity and consistency. The created solution is mainly based on a JSON file as a native column type, which was designed for holding open schema documents. In this paper, we present a model for any repository that uses hierarchical dynamic data. Additionally, we include a structure for normalizing the input and a description of the data in order to keep all of the model assumptions. We compared our solution with a well-known open schema model, Entity-Attribute-Value, in the scope of saving data and querying relationships and contents from the structure. The results show that we achieved improvements in both performance and disk space usage, and we extended our model with a few new features that the previous model does not include. The techniques developed in this research can be applied in every domain where hierarchical dynamic data is required, as demonstrated by the digital book repository that we have presented.