With the demographic explosion of the last few decades, transport management has become of paramount importance. It must deal with traffic detection, traffic jams created by urban public transport, motorway toll data, meteorological data, traffic safety, and more. These traffic data are numerous and voluminous, and traditional tools can no longer handle them. With the rapid development of Big Data technologies, rethinking intelligent transport has become a necessity, and new architectures are needed to work with big data. To address this, it is essential to create a Big Data modeling approach for Intelligent Transportation Systems (ITS) organized into multiple layers. Among these is the Management and Processing layer, which in turn contains three levels: processing, analyzing, and storing. In this paper, we focus on the processing level, which has attracted the attention of researchers. We propose a Big Data processing design applied to ITS, adopting a data modeling approach that covers both the transmission and the processing of data.
The creation of smart cities aims to reduce the problems posed by continually growing population density and urbanization. Smart city applications produce a huge amount of data every day. Thus, making sense of this large volume of data for urban and intelligent decision-making has become a challenge for current systems. Big data analysis frameworks offer significant innovative potential in the emerging area of smart communities. This paper proposes a new big data analysis architecture for smart cities called "BIG DATA ANALYTICS FRAMEWORK FOR SMART CITY (BDAFSC)". The proposed framework specifically addresses a conceptual and technological model by creating several layers of abstraction. The proposed architecture is generic and can be applied to a wide range of smart city use cases.
While the benefits of big data are numerous, using big data requires addressing new challenges related to data processing, data security, and especially the degradation of data quality. Despite the increased importance of data quality for big data, data quality measurement is currently limited to a few metrics. Indeed, while more than 50 data quality dimensions have been defined in the literature, only 11 dimensions have been measured. Therefore, this paper aims to extend the measured dimensions by defining four new data quality metrics: Integrity, Accessibility, Ease of manipulation, and Security. Thus, we propose a comprehensive Big Data Quality Assessment Framework based on 12 metrics: Completeness, Timeliness, Volatility, Uniqueness, Conformity, Consistency, Ease of manipulation, Relevancy, Readability, Security, Accessibility, and Integrity. In addition, to ensure accurate data quality assessment, we apply data weights at three data unit levels: data fields, quality metrics, and quality aspects. Furthermore, we define and measure five quality aspects to provide a macro-view of data quality. Finally, an experiment is performed to implement the defined measures. The results show that the suggested methodology, which defines a weighted quality score based on 12 metrics, allows a more exhaustive and accurate big data quality assessment and achieves a best quality model score of 9/10.
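The weighting idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample records, the field and metric weights, and the simple definitions of completeness and uniqueness used here are all assumptions chosen for illustration, and only two of the 12 metrics are shown.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def completeness(records: List[Record], field: str) -> float:
    """Share of records whose value for `field` is neither missing nor empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records: List[Record], field: str) -> float:
    """Share of distinct values for `field` among all records."""
    if not records:
        return 0.0
    values = [r.get(field) for r in records]
    return len(set(values)) / len(values)


def weighted_average(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of `scores`, using the weight stored under the same key."""
    total = sum(weights[k] for k in scores)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[k] * weights[k] for k in scores) / total


# Hypothetical sample data and weights (assumptions, not from the paper).
records: List[Record] = [
    {"id": "1", "name": "Ann", "city": "Rabat"},
    {"id": "2", "name": "", "city": "Fes"},
    {"id": "3", "name": "Bob", "city": None},
    {"id": "4", "name": "Eve", "city": None},
]
field_weights = {"id": 3.0, "name": 2.0, "city": 1.0}

# Level 1: weight the per-field scores of one metric.
field_scores = {f: completeness(records, f) for f in field_weights}
completeness_score = weighted_average(field_scores, field_weights)

# Level 2: weight the metric scores into one overall score.
metric_scores = {
    "completeness": completeness_score,
    "uniqueness": uniqueness(records, "id"),
}
metric_weights = {"completeness": 1.0, "uniqueness": 1.0}
overall_score = weighted_average(metric_scores, metric_weights)

print(f"completeness={completeness_score:.3f} overall={overall_score:.3f}")
```

A third level, grouping metrics into weighted quality aspects, would presumably reuse the same `weighted_average` helper with one more dictionary of weights.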
While the benefits of big data are numerous, most of the collected data is of poor quality and, therefore, cannot be used effectively as it is. One of the leading big data quality challenges in pre-processing is data duplication. Indeed, the gathered big data are usually messy and may contain duplicated records. The process of detecting and eliminating duplicated records is known as Deduplication, Entity Resolution, or Record Linkage. Data deduplication has been widely discussed in the literature, and multiple deduplication approaches have been suggested. However, few efforts have been made to address deduplication issues in the big data context. Also, the existing big data deduplication approaches do not handle the decreasing performance of the deduplication model during serving. In addition, most current methods are limited to duplicate detection, which is only one part of the deduplication process. Therefore, this paper proposes an End-to-End Big Data Deduplication Framework based on a semi-supervised learning approach that outperforms the existing big data deduplication approaches with an F-score of 98.21%, a Precision of 98.24%, and a Recall of 96.48%. Moreover, the suggested framework encompasses all data deduplication phases, including data preprocessing and preparation, automated data labeling, duplicate detection, data cleaning, and an auditing and monitoring phase. This last phase is based on an online continual learning strategy for big data deduplication that addresses the decreasing performance of the deduplication model during serving. The obtained results show that the suggested continual learning strategy increased the model accuracy by 1.16%. Furthermore, we apply the proposed framework to three different datasets and compare its performance against the existing deduplication models. Finally, the results are discussed, conclusions are drawn, and future work directions are highlighted.
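To make the distinction between duplicate detection and data cleaning concrete, here is a minimal sketch. It deliberately swaps in a simple string-similarity threshold for the semi-supervised model described in the abstract, so it is a toy baseline rather than the proposed framework; the sample records and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def normalize(text: str) -> str:
    """Preprocessing: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Duplicate detection signal: similarity ratio in [0, 1] of normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records: List[str], threshold: float = 0.9) -> List[str]:
    """Cleaning: keep the first record of every group of near-duplicates.

    Pairwise comparison is quadratic in the number of records, so this only
    suits small inputs; real big data pipelines would need blocking or indexing.
    """
    kept: List[str] = []
    for record in records:
        if not any(similarity(record, k) >= threshold for k in kept):
            kept.append(record)
    return kept


# Hypothetical sample records (assumptions, not from the paper's datasets).
records = [
    "John Smith, 12 Main St",
    "john  smith, 12 main st",
    "Jane Doe, 4 Elm Rd",
]
unique_records = deduplicate(records)
print(unique_records)
```

In this sketch the second record is dropped because it normalizes to the same string as the first, while the third is kept as a distinct entity.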