An examination of data reuse practices within highly cited articles of faculty at a research university

Imker, H.J.; Luong, Hoa; Mischo, William H.; Schlembach, Mary C.; Wiley, Chris

doi:10.1016/j.acalib.2021.102369

Cited by 16 publications

(11 citation statements)

References 35 publications

“…Specifically, most supporting information for dataset release (63.2%, 55/87) would cause trouble, as pointed out in other studies (Imker et al, 2021; Jiao & Li, 2022). Supporting information is only available on the electronic journal site and require an access contract.…”

Section: Discussion (mentioning)

confidence: 75%

Initial insight into three modes of data sharing: Prevalence of primary reuse, data integration and dataset release in research articles

Sakai, Miyata, Yokoi et al. 2023, Learned Publishing

While data sharing has received research interest in recent times, its real status remains unclear, owing to its ambiguous concept. To understand the current status of data sharing, this study examined primary reuse, data integration, and dataset release as the actual practices of data sharing. A total of 963 articles, chosen from those published in 2018 and registered in the Web of Science global citation database, were manually checked. Existing data were reused in the mode of data integration (13.3%) as frequently as they were for the mode of primary reuse (12.1%). Dataset release was the least common mode (9.0%). The results show the variation in data sharing and indicate the need for standardization of data description in articles based on thorough registration and expansion in public data archives to close the loop that results in the virtuous cycle of research data.

“…However, the solution of this problem requires detailed theoretical calculations, high-quality preliminary processing of the obtained data of physical experiments, their convenient presentation for the purpose of searching for the physical dependencies that underlie them, reuse of the collected data, and their reproducibility. The importance of reusing these experiments is excellently demonstrated in this study [11].…”

Section: Problem Formulation (mentioning)

confidence: 75%

XAS Data Preprocessing of Nanocatalysts for Machine Learning Applications

et al. 2021

Innovative development in the energy and chemical industries is mainly dependent on advances in the accelerated design and development of new functional materials. The success of research in new nanocatalysts mainly relies on modern techniques and approaches for their precise characterization. The existing methods of experimental characterization of nanocatalysts, which make it possible to assess the possibility of using these materials in specific chemical reactions or applications, generate significant amounts of heterogeneous data. The acceleration of new functional materials, including nanocatalysts, directly depends on the speed and quality of extracting hidden dependencies and knowledge from the obtained experimental data. Usually, such experiments involve different characterization techniques and different types of X-ray absorption spectroscopy (XAS) too. Using the machine learning (ML) methods based on XAS data, we can study and predict the atomic-scale structure and another bunch of parameters for the nanocatalyst efficiently. However, before using any ML model, it is necessary to make sure that the XAS raw experimental data is properly pre-processed, cleared, and prepared for ML application. Usually, the XAS preprocessing stage is vaguely presented in scientific studies, and the main efforts of researchers are devoted to the ML description and implementation stage. However, the quality of the input data influences the quality of ML analysis and the prediction results used in the future. This paper fills the gap between the stage of obtaining XAS data from synchrotron facilities and the stage of using and customizing various ML analysis and prediction models. We aimed this study to develop automated tools for the preprocessing and presentation of data from physical experiments and the creation of deposited datasets on the basis of the example of studying palladium-based nanocatalysts using synchrotron radiation facilities. 
During the study, methods of preliminary processing of XAS data were considered, which can be conditionally divided into X-ray absorption near edge structure (XANES) and extended X-ray absorption fine structure (EXAFS). This paper proposes a software toolkit that implements data preprocessing scenarios in the form of a single pipeline. The main preprocessing methods used in this study proposed are principal component analysis (PCA); z-score normalization; the interquartile method for eliminating outliers in the data; as well as the k-means machine learning method, which makes it possible to clarify the phase of the studied material sample by clustering feature vectors of experiments. Among the results of this study, one should also highlight the obtained deposited datasets of physical experiments on palladium-based nanocatalysts using synchrotron radiation. This will allow for further high-quality data mining to extract new knowledge about materials using artificial intelligence methods and machine learning models, and will ensure the smooth dissemination of these datasets to researchers and their reuse.
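
Two of the preprocessing steps named in this abstract, the interquartile method for eliminating outliers and z-score normalization, can be illustrated with a minimal sketch. This is not the toolkit described in the paper: the function names, the 1.5 × IQR fence, and the sample intensities are assumptions chosen for illustration, and only the Python standard library is used.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed, conventional fence)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def zscore(values):
    """Rescale values to zero mean and unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: no spread to normalize.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical absorption intensities; the last reading is an obvious outlier.
intensities = [0.98, 1.02, 1.00, 0.99, 1.01, 5.00]

cleaned = iqr_filter(intensities)   # drops the 5.00 reading
normalized = zscore(cleaned)        # zero mean, unit variance
```

Filtering before normalizing keeps a single extreme reading from inflating the standard deviation used in the z-score. The PCA and k-means steps mentioned in the abstract would operate on the normalized feature vectors and are omitted here.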

“…In the present study, the physiological data and some of the presence points of ACT were sourced from the scientific literature. This encourages data reuse for future research, allowing future research work to progress more efficiently and effectively [72, 73]. Unlike the mechanistic models, the CLIMEX model uses the occurrence records of a species and its physiological stress factors to establish the potential distribution of species.…”

Section: Discussion (mentioning)

confidence: 99%

Predicting the potential global distribution of an invasive alien pest Trioza erytreae (Del Guercio) (Hemiptera: Triozidae)

Aidoo, Souza, Silva et al. 2022, Sci Rep

The impact of invasive alien pests on agriculture, food security, and biodiversity conservation has been worsened by climate change caused by the rising earth’s atmospheric greenhouse gases. The African citrus triozid, Trioza erytreae (Del Guercio; Hemiptera: Triozidae), is an invasive pest of all citrus species. It vectors the phloem-limited bacterium “Candidatus Liberibacter africanus”, a causal agent of citrus greening disease or African Huanglongbing (HLB). Understanding the global distribution of T. erytreae is critical for surveillance, monitoring, and eradication programs. Therefore, we combined geospatial and physiological data of T. erytreae to predict its global distribution using the CLIMEX model. The model’s prediction matches T. erytreae present-day distribution and shows that parts of the Mediterranean region have moderate (0 < EI < 30) to high (EI > 30) suitability for the pest. The model predicts habitat suitability in the major citrus-producing countries, such as Mexico, Brazil, China, India, and the USA. In the Special Report on Emissions Scenarios (SRES) A1B and A2 scenarios, the model predicts a reduction in habitat suitability from the current time to 2070. The findings show that global citrus production will continue to be threatened by T. erytreae. However, our study provides relevant information for biosecurity and risk assessment.
