2019
DOI: 10.3390/info10040137
Data Consistency Theory and Case Study for Scientific Big Data

Abstract: Big data techniques are a family of novel technologies for handling large amounts of data from various sources. Unfortunately, data from different sources inevitably conflict with each other in format, semantics, and value. To resolve such conflicts, the paper proposes a data consistency theory for scientific big data, including the basic concepts, properties, and a quantitative evaluation method. Data consistency can be divided into grades such as complete consistency, s…
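The abstract mentions a quantitative evaluation of consistency between data sources. As a minimal illustrative sketch (the field names and scoring rule here are assumptions for illustration, not the paper's actual evaluation method), one simple value-level measure is the fraction of shared fields on which two source records agree:

```python
# Hypothetical sketch of a value-level consistency score between two
# records from different sources. The field names and the exact-match
# scoring rule are illustrative assumptions, not the paper's method.

def consistency_score(record_a: dict, record_b: dict) -> float:
    """Fraction of shared keys whose values match exactly."""
    shared = set(record_a) & set(record_b)
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if record_a[key] == record_b[key])
    return matches / len(shared)

source_a = {"station": "S1", "temp_c": 21.5, "unit": "C"}
source_b = {"station": "S1", "temp_c": 21.7, "unit": "C"}
print(consistency_score(source_a, source_b))  # 2 of 3 shared fields agree
```

A score of 1.0 would correspond to complete value consistency on the shared fields; lower scores indicate partial consistency, in the spirit of the graded notion the abstract describes.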

Cited by 22 publications (9 citation statements)
References 26 publications
“…It has to be stressed that for the data export, the operating system used (Windows or Mac) was taken into account, through appropriate selections provided by the VOSviewer interface. An important task of the refinements relates to the consistency of the records in the databases [49] obtained in the previous stage, as part of the minimum data cleaning procedures (data cleaning; Rahm and Do [50]). Because the database was a simple one, with a single table, we noticed that this consistency referred, in principle, to the completion of all table attributes with complete and correct information.…”
Section: Refining By Filtering the Initial Results
confidence: 99%
“…Therefore, more complete quality control steps and automatic DQ checking rules should be embedded throughout the entire scientific data manufacturing process. Taking the data processing stage as an example, we can integrate scientific data from different sources with the help of metadata and standard terminology, which benefits data consistency and interdisciplinary data sharing (Pasquetto et al., 2019; Shi et al., 2019). Therefore, for quality assurance, it is necessary for researchers to establish a complete DMP to assure scientific DQ before constructing an IP-Map.…”
Section: Strategies For Improving Scientific Data Quality
confidence: 99%
“…Consistency of data refers to the concept that the same data stored in separate places or at separate time points still match, meaning contradictory conclusions cannot be derived from the given data. 48 For example, can it be ensured that archived/backed-up/repository-deposited information is kept up to date?…”
Section: Can Meta-information Offer Sufficient Contextual Information...
confidence: 99%
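The consistency notion quoted above, that the "same" data held in separate places should still match, can be sketched with a simple content-hash comparison between a primary copy and an archived copy; any digest divergence signals an inconsistency. The values here are illustrative, not from the cited work:

```python
# Minimal sketch: detect divergence between a primary datum and its
# archived/backed-up copy by comparing content hashes. The byte strings
# below are illustrative placeholders.
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 hex digest of the raw content."""
    return hashlib.sha256(data).hexdigest()

primary = b"2019-04-12,42.0"
archived = b"2019-04-12,42.0"

print(digest(primary) == digest(archived))  # True when the copies still match
```

Comparing fixed-size digests rather than full contents scales to large scientific datasets, which is one common way repositories verify that deposited copies have not drifted from the originals.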