The ability to auto-generate databases of optical properties holds great prospects in data-driven materials discovery for optoelectronic applications. We present a cognate set of experimental and computational data that describes key features of optical absorption spectra. This includes an auto-generated database of 18,309 records of experimentally determined UV/vis absorption maxima, λmax, and associated extinction coefficients, ϵ, where present. This database was produced using the text-mining toolkit, ChemDataExtractor, on 402,034 scientific documents. High-throughput electronic-structure calculations using fast (simplified Tamm-Dancoff approach) and traditional (time-dependent) density functional theory were executed to predict λmax and oscillator strengths, f (related to ϵ), for a subset of validated compounds. Paired quantities of these computational and experimental data show strong correlations in λmax, f and ϵ, laying the path for reliable in silico calculations of additional optical properties. The total dataset of 8,488 unique compounds, and a subset of 5,380 compounds with both experimental and computational data, are available in MongoDB, CSV and JSON formats. These can be queried using Python, R, Java, and MATLAB for data-driven optoelectronic materials discovery.
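As a minimal sketch of how paired experimental and computed λmax values from a CSV export might be compared, the following Python computes a Pearson correlation and a mean absolute error. The column names (`lambda_max_exp_nm`, `lambda_max_calc_nm`) and the sample rows are assumptions made for illustration only; they are not the database's actual schema or data.

```python
import csv
import io
import math

# Hypothetical CSV layout with placeholder values; the real database's
# field names and contents may differ.
SAMPLE_CSV = """compound,lambda_max_exp_nm,lambda_max_calc_nm
A,350,342
B,420,431
C,510,498
D,610,625
E,480,
"""


def load_pairs(text):
    """Return (experimental, computed) lambda-max lists, skipping incomplete rows."""
    exp, calc = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            e = float(row["lambda_max_exp_nm"])
            c = float(row["lambda_max_calc_nm"])
        except (KeyError, TypeError, ValueError):
            continue  # row lacks one of the paired values
        exp.append(e)
        calc.append(c)
    return exp, calc


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(x, y):
    """Mean absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    exp, calc = load_pairs(SAMPLE_CSV)
    print(f"pairs: {len(exp)}")
    print(f"Pearson r: {pearson_r(exp, calc):.3f}")
    print(f"MAE (nm): {mean_absolute_error(exp, calc):.1f}")
```

The same pairing logic would apply to f and ϵ, although those quantities are related rather than identical, so a rank-based comparison may be more appropriate there.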
We present and assess the UK Biobank (UKB) Polygenic Risk Score (PRS) Release, a set of PRSs for 28 diseases and 25 quantitative traits being made available on the individuals in UKB. We also release a benchmarking software tool to enable like-for-like performance evaluation for different PRSs for the same disease or trait. Extensive benchmarking shows the PRSs in the UKB Release to outperform a broad set of 81 published PRSs. For many of the diseases and traits we also validate the PRS algorithms in other cohorts. The availability of PRSs for 53 traits on the same set of individuals also allows a systematic assessment of their properties, and the increased power of these PRSs increases the evidence for their potential clinical benefit.
Data‐driven materials discovery has become increasingly important in identifying materials that exhibit specific, desirable properties from a vast chemical search space. Synergic prediction and experimental validation are needed to accelerate scientific advances related to critical societal applications. A design‐to‐device study that uses high‐throughput screens with algorithmic encodings of structure–property relationships is reported to identify new materials with panchromatic optical absorption, whose photovoltaic device applications are then experimentally verified. The data‐mining methods source 9431 dye candidates, which are auto‐generated from the literature using a custom text‐mining tool. These candidates are sifted via a data‐mining workflow that is tailored to identify optimal combinations of organic dyes that have complementary optical absorption properties such that they can harvest all available sunlight when acting as co‐sensitizers for dye‐sensitized solar cells (DSSCs). Six promising dye combinations are shortlisted for device testing, whereupon one dye combination yields co‐sensitized DSSCs with power conversion efficiencies comparable to those of the high‐performance, organometallic dye, N719. These results demonstrate how data‐driven molecular engineering can accelerate materials discovery for panchromatic photovoltaic or other applications.
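The idea of screening for dyes with complementary absorption can be illustrated with a deliberately simplified sketch. It assumes each dye is described by a single absorption band (a hypothetical `lambda_max` and `half_width` in nm) and scores a dye combination by the fraction of an assumed 400 to 800 nm window that the combined bands cover. This is a stand-in for illustration, not the workflow used in the study.

```python
from itertools import combinations

# Assumed target window in nm; chosen for illustration only.
WINDOW = (400.0, 800.0)


def band(dye):
    """Absorption band of a dye as an interval clipped to the target window."""
    lo = dye["lambda_max"] - dye["half_width"]
    hi = dye["lambda_max"] + dye["half_width"]
    return max(lo, WINDOW[0]), min(hi, WINDOW[1])


def coverage(dyes):
    """Fraction of the window covered by the union of the dyes' bands."""
    intervals = sorted(b for b in map(band, dyes) if b[0] < b[1])
    covered = 0.0
    cur_lo = cur_hi = None
    for lo, hi in intervals:
        if cur_hi is None or lo > cur_hi:
            # Start a new merged interval, closing the previous one if any.
            if cur_hi is not None:
                covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            # Overlapping band: extend the current merged interval.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return covered / (WINDOW[1] - WINDOW[0])


def rank_pairs(dyes):
    """Score every dye pair by combined coverage, best first."""
    scored = [
        (coverage(pair), pair[0]["name"], pair[1]["name"])
        for pair in combinations(dyes, 2)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Placeholder dyes, not candidates from the study.
    candidates = [
        {"name": "X", "lambda_max": 450.0, "half_width": 50.0},
        {"name": "Y", "lambda_max": 650.0, "half_width": 50.0},
        {"name": "Z", "lambda_max": 470.0, "half_width": 50.0},
    ]
    for score, a, b in rank_pairs(candidates):
        print(f"{a}+{b}: {score:.2f}")
```

Pairs whose bands overlap heavily score lower than pairs whose bands sit in different parts of the window, which is the sense in which the absorption properties are complementary. A realistic screen would also weight by the solar spectrum and by absorption strength.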
The rise of data science is leading to new paradigms in data-driven materials discovery. This carries an essential notion that large data sources containing chemical structure and property information can be mined in a fashion that detects and exploits structure–property relationships, such that chemicals can be predicted to suit a given material application. The success of material predictions is predicated on these large data sources of chemical structure and property information being suited to a target application. Microscopy is commonly used to characterize chemical structure, especially in fields such as nanotechnology where material properties are highly dependent on the size and shape of nanoparticles. Large data sources of nanoparticle information stemming from microscopy images would thus be highly beneficial. Millions of microscopy images exist, but they lie fragmented across the literature, typically presented individually within a paper and usually in a qualitative fashion therein, even though they harbor a wealth of numeric information. We present the ImageDataExtractor toolkit that autoidentifies and autoextracts microscopy images from scientific documents, whereupon it autonomously analyzes each image to produce quantitative particle size and shape information about its subject material. Each image is quantified by decoding its scale bar information using optical character recognition, with help from super-resolution convolutional neural networks where required. Individual particles are detected and profiled using various thresholding, segmentation, polygon fitting, and edge correction routines. The high-throughput operational capability of ImageDataExtractor means that it can be used to generate large data sources of particle information for data-driven materials discovery. The evaluation metrics, precision and recall, are greater than 80% for the majority of the image processing steps, and precision is above 80% for all critical steps.
The ImageDataExtractor tool is released under the MIT license and is available to download from .
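The role of the scale bar in turning pixel measurements into physical particle sizes can be sketched as follows. All function names here are hypothetical and are not part of the ImageDataExtractor API; the sketch assumes the scale bar label has already been read as text (for example "200 nm") and that the bar length and particle areas have already been measured in pixels. It also shows how precision and recall are computed from detection counts.

```python
import math
import re

# Conversion factors to nanometres for the units this sketch recognises.
UNIT_TO_NM = {"nm": 1.0, "um": 1e3, "µm": 1e3, "μm": 1e3, "mm": 1e6}


def parse_scale_label(label):
    """Convert a scale bar label such as '200 nm' or '1 µm' to nanometres."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(nm|um|µm|μm|mm)\s*", label)
    if match is None:
        raise ValueError(f"unrecognised scale bar label: {label!r}")
    return float(match.group(1)) * UNIT_TO_NM[match.group(2)]


def nm_per_pixel(label, bar_length_px):
    """Physical size of one pixel, given the bar's label and pixel length."""
    if bar_length_px <= 0:
        raise ValueError("scale bar length must be positive")
    return parse_scale_label(label) / bar_length_px


def equivalent_diameter_nm(area_px, scale_nm_per_px):
    """Diameter in nm of the circle with the same area as the particle."""
    return 2.0 * math.sqrt(area_px / math.pi) * scale_nm_per_px


def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from detection counts (0.0 when undefined)."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    scale = nm_per_pixel("200 nm", bar_length_px=100)
    # Placeholder particle areas in square pixels.
    for area in (80.0, 310.0, 1250.0):
        print(f"{area:7.1f} px^2 -> {equivalent_diameter_nm(area, scale):.1f} nm")
```

The equivalent circular diameter is one common size descriptor; it is used here only because it needs nothing beyond the particle area, and other shape measures would require the particle outline.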