OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.
Big Data emerged from the explosive growth of data generated by Web 2.0, digital sensors, and social media applications such as Facebook and Twitter. This constant growth affects many domains, especially decision support systems, where integration processes must be adapted to handle such volumes of data and still meet analysis goals. The purpose of this article is to adapt extract-transform-load (ETL) processes to Big Data technologies, in order to support both this growth of data and knowledge discovery. It proposes a new approach, Big Dimensional ETL (BigDimETL), which handles the basic ETL operations while taking the multidimensional structure into account. To accelerate data handling, the MapReduce paradigm is used to enhance data warehousing capabilities, with HBase as the distributed storage mechanism. Experimental results confirm that the ETL operation performs well, especially with the adapted operations.
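To make the MapReduce idea concrete, the following is a minimal single-process sketch of an ETL aggregation written in map, shuffle, and reduce steps. It is not the BigDimETL implementation and uses neither Hadoop nor HBase; the record fields (`city`, `year`, `amount`) and all function names are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical source records; the field names are illustrative only.
RECORDS = [
    {"city": "Toulouse", "year": 2016, "amount": 10.0},
    {"city": "Toulouse", "year": 2016, "amount": 5.0},
    {"city": "Tunis", "year": 2017, "amount": 7.5},
]


def map_phase(records: Iterable[dict], dimensions: List[str],
              measure: str) -> Iterator[Tuple[tuple, float]]:
    """Emit one (dimension key, measure value) pair per source record."""
    for record in records:
        key = tuple(record[d] for d in dimensions)
        yield key, float(record[measure])


def shuffle(pairs: Iterable[Tuple[tuple, float]]) -> Dict[tuple, List[float]]:
    """Group emitted values by key, as the framework would between phases."""
    groups: Dict[tuple, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[tuple, List[float]]) -> Dict[tuple, float]:
    """Aggregate each group into a single fact value (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def run_etl(records: Iterable[dict], dimensions: List[str],
            measure: str) -> Dict[tuple, float]:
    """Chain the three steps in a single process."""
    return reduce_phase(shuffle(map_phase(records, dimensions, measure)))
```

Calling `run_etl(RECORDS, ["city", "year"], "amount")` groups the sample records by the two dimension fields and sums the measure for each group. In a real deployment the shuffle step would be performed by the MapReduce framework across nodes rather than by an in-memory dictionary.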
As the amount of information exceeds the management and storage capacity of traditional data management systems, several domains need to take this growth of data into account, in particular the decision-making domain known as Business Intelligence (BI). Since the accumulation and reuse of these massive data represent a gold mine for businesses, BI systems have to extract from them the insights that are essential for effective decision making. However, BI systems face several problems and challenges, especially at the level of ETL (Extraction-Transformation-Loading) as an integration system. These processes are responsible for selecting, filtering, and restructuring data sources so that relevant decisions can be made. This paper focuses on adapting the extraction phase, inspired by the first step of the MapReduce paradigm, in order to prepare massive data for the transformation phase. We provide a conceptual model of the extraction phase, composed of a conversion operation that produces a NoSQL structure suitable for Big Data storage, and a vertical partitioning operation that defines the storage mode before data are submitted to the second ETL phase. Finally, we implement the new component in Talend for Big Data to help the designer extract data from semi-structured sources.
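The two operations of the extraction phase can be sketched as follows. This is an illustrative approximation under stated assumptions, not the authors' conceptual model or their Talend component: conversion is taken to mean flattening a nested, semi-structured record into dotted column names, and vertical partitioning is taken to mean splitting those columns into named groups similar to column families. The function names, the sample record, and the family layout are all hypothetical.

```python
from typing import Any, Dict, List, Tuple


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Conversion: turn a nested record into flat, dotted column names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def vertical_partition(flat: Dict[str, Any],
                       families: Dict[str, List[str]]) -> Dict[str, Dict[str, Any]]:
    """Vertical partitioning: split the columns into named groups.

    Columns listed in a family but absent from the record are skipped.
    """
    return {
        family: {col: flat[col] for col in columns if col in flat}
        for family, columns in families.items()
    }


def extract(record: Dict[str, Any], key_field: str,
            families: Dict[str, List[str]]) -> Tuple[str, Dict[str, Dict[str, Any]]]:
    """Return (row key, partitioned columns) for one source record."""
    flat = flatten(record)
    row_key = str(flat[key_field])
    return row_key, vertical_partition(flat, families)


# Hypothetical semi-structured input and family layout.
SAMPLE = {"id": 42, "user": {"name": "a", "city": "Tunis"}, "text": "hi"}
FAMILIES = {"who": ["user.name", "user.city"], "what": ["text"]}

row_key, row = extract(SAMPLE, "id", FAMILIES)
```

Here `row_key` is the string form of the `id` field and `row` maps each family name to the columns assigned to it. The resulting shape (one row key, several column groups) resembles what a column-oriented NoSQL store expects, which is why the partitioning step is placed before the transformation phase.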