In recent years, rapid technological advancement has given rise to an abundance of software applications, social media, and smart devices such as smartphones and sensors. More extensive use of these applications and tools in various industrial domains has led to a data deluge, which has created enormous challenges and opportunities. However, it is not only the volume of the data but also its speed, variety, and uncertainty that pose a massive challenge for traditional technologies such as data warehouses. These diverse and unprecedented characteristics have engendered the notion of "Big Data." Data-intensive industries have been experiencing a wide variety of challenges in processing, managing, and analyzing data. For instance, the healthcare sector faces difficulties in integrating or fusing diverse medical data stemming from multiple heterogeneous sources. Data integration is critically important within the healthcare sector because it enriches data, enhances its value, and, more importantly, lays a solid foundation for efficient and effective healthcare analytics such as predicting diseases or outbreaks. Several data integration technologies and tools have been developed over the last two decades. This paper studies data integration technologies, tools, and applications within the healthcare domain. Furthermore, it discusses future research directions in the integration of Big healthcare data.
Index Terms: Big data, data integration, healthcare data.
With the rapid growth of collected data and the variety of its content, efficient integration at Big Data scale becomes crucial. Semantic technologies, as a means of integrating and coordinating heterogeneous systems, may help Big Data systems manage terminology and relationships in order to link data from different sources. However, because some datasets are difficult to integrate and analyze with high precision, automated processes cannot reach a high level of accuracy without human cognitive ability. Crowdsourcing platforms have the potential to integrate (entity matching, entity resolution) and analyze (sentiment analysis, image recognition) heterogeneous data sources in cases where these tasks prove problematic for computers. In this survey, we explore and compare empirical research studies that rely on merging semantic and crowdsourcing technologies. In light of this comparison, we propose a high-level integration workflow that shows how merging these technologies can enhance the Big Data integration process and tackle data analysis challenges.
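The division of labor described above, where automated matching handles the clear cases and human workers handle the ambiguous ones, can be illustrated with a minimal sketch. This is not the workflow proposed in the survey. It assumes a plain string-similarity score from Python's standard library as a stand-in for a semantic matcher, and the function names `similarity` and `route_pairs` and the thresholds `accept` and `reject` are hypothetical choices made only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumption: a real pipeline would use a semantic or ontology-based
    matcher here; SequenceMatcher is only a stand-in.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route_pairs(pairs, accept=0.9, reject=0.4):
    """Split candidate record pairs into three groups.

    - matched:  score >= accept, resolved automatically as the same entity
    - rejected: score <= reject, resolved automatically as different entities
    - crowd:    everything in between, deferred to human workers

    The threshold values are hypothetical and would need tuning per dataset.
    """
    matched, rejected, crowd = [], [], []
    for left, right in pairs:
        score = similarity(left, right)
        if score >= accept:
            matched.append((left, right))
        elif score <= reject:
            rejected.append((left, right))
        else:
            crowd.append((left, right))
    return matched, rejected, crowd
```

Under these assumptions, only the pairs returned in the third list would be posted to a crowdsourcing platform, which keeps human effort focused on the cases the automated step cannot decide with confidence.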