Artificial Intelligence (AI) promises a paradigm shift in battery R&D by significantly accelerating the discovery and optimization of materials, interfaces, phenomena, and processes. However, the efficiency of any AI approach ultimately relies on rapid access to large, high‐quality, and interpretable datasets. Scientific publications contain a wealth of relevant data, which can possibly, but not certainly, be used to develop reliable AI algorithms for battery R&D. To examine this, we present a text mining study in which we unravel lithium‐ion battery researchers’ habits when reporting results, reason about how these habits are linked to a lack of reproducibility, and discuss the remaining challenges to be tackled in order to develop a more credible and impactful AI for battery R&D.
The lithium-ion battery (LIB) research literature has grown very rapidly in recent years. While it is an immense source of valuable knowledge and facts for the community, much of this information is also "buried" in the literature. To make the most of the available information and automate "reading", special tools are required. Named entity recognition (NER) is one such tool, which uses supervised machine learning for information extraction. Efficient NER, however, requires a large, high-quality annotated corpus. Here, we report on our semi-automatically annotated lithium-ion battery corpus, "LIBAC", covering 28 different entities of LIBs, which was used for training and evaluating Tok2vec- and Transformer-based models, resulting in high overall accuracy, with F1-scores of 81% and 83%, respectively. LIBAC was created from 6985 paragraphs randomly chosen from ca. 11,000 LIB research papers and contains 73,300 annotated spans (627,428 tokens). This is the prime stepping-stone needed to develop a large-scale information extraction system designed for the LIB research literature.
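As an illustration of how an F1-score like those quoted above is typically obtained for NER, the following minimal sketch computes micro-averaged precision, recall, and F1 over annotated spans. It assumes exact-match scoring (a predicted span counts only if its start, end, and label all match a gold span), which the abstract does not specify. The function name `span_f1` and the entity labels `CATHODE` and `ELECTROLYTE` are hypothetical and are not taken from LIBAC.

```python
from typing import List, Set, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def span_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over paragraphs.

    Assumption: exact-match scoring, i.e. a prediction is a true positive
    only if the identical (start, end, label) triple is in the gold set.
    """
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# One paragraph with two gold spans; labels are hypothetical placeholders.
gold_spans = [{(0, 2, "CATHODE"), (5, 6, "ELECTROLYTE")}]
# The model finds the first span exactly but misplaces the second boundary.
pred_spans = [{(0, 2, "CATHODE"), (4, 6, "ELECTROLYTE")}]

precision, recall, f1 = span_f1(gold_spans, pred_spans)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Under exact-match scoring a boundary error counts as both a false positive and a false negative, so span-level F1 is stricter than token-level accuracy; a more lenient, partial-overlap scheme would give higher numbers on the same predictions.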