Newly developed high-throughput methods
for property predictions
make the process of materials design faster and more efficient. Density
is an important physical property of energetic compounds, used to assess
their detonation velocity and detonation pressure, but the time cost of
recent density prediction models is still high owing to the time-consuming
calculation of molecular descriptors. To improve the screening
efficiency of potential energetic compounds, new methods for density
prediction with higher accuracy and lower time cost are urgently needed,
and a possible solution is to establish direct mappings between the
molecular structure and density. We propose three machine learning
(ML) models, namely support vector machine (SVM), random forest (RF), and
graph neural network (GNN), each using molecular topology as the only known
input. The widely applied quantitative structure–property relationship
based on density functional theory (DFT–QSPR) is adopted
as the benchmark to evaluate the accuracies of the models. All four
models, including the DFT–QSPR benchmark, are trained and tested on the
same data set, comprising over 2000 reported nitro compounds retrieved
from the Cambridge Structural Database. The proportion of compounds with
a prediction error of less than 5% is evaluated on the independent test set,
and the values for the models of SVM, RF, DFT–QSPR, and GNN
are 48, 63, 85, and 88%, respectively. The results show that, for
the SVM and RF models, fingerprint bit vectors alone are insufficient
to obtain good QSPRs. Mapping between the molecular structure and
density can be well established by using GNN and molecular topology,
and its accuracy is slightly better than that of the time-consuming
DFT–QSPR method. The GNN-based model has higher accuracy and
lower computational resource cost than the widely accepted DFT–QSPR
model, so it is more suitable for high-throughput screening of energetic
compounds.
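The headline metric, the share of test compounds whose predicted density falls within 5% of the reported value, can be computed as below. This is a minimal sketch assuming that "prediction error" means the relative error |pred - ref| / ref against the reported density; the function name `fraction_within_tolerance` and the sample densities are hypothetical and not taken from the study.

```python
def fraction_within_tolerance(predicted, reference, tol=0.05):
    """Fraction of samples whose relative error |pred - ref| / ref is below tol.

    Assumption: 'prediction error' is the relative error with respect to the
    reported (reference) density; the study may define it differently.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty sequences")
    # Count predictions strictly below the tolerance ("less than 5%").
    hits = sum(
        1 for p, r in zip(predicted, reference)
        if abs(p - r) / r < tol
    )
    return hits / len(reference)


# Hypothetical densities in g/cm3 (not data from the study).
reference = [1.80, 1.90, 1.75, 2.00]
predicted = [1.85, 1.70, 1.76, 2.05]
print(fraction_within_tolerance(predicted, reference))
```

With these made-up values, three of the four predictions fall within 5% of the reference, so the function returns 0.75; applied to each model's test-set predictions, the same calculation would yield the kind of percentages reported above.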
Elongated microvoids, internal fibrillar structure, and edge scattering from surface refraction all cause an equatorial streak in small-angle X-ray scattering. A model for analyzing the edge scattering of fibers is proposed. Simulation results indicate that edge scattering from surface refraction of a cylindrical fiber is strong and makes an important contribution to the equatorial streak. Two factors influence edge scattering intensity. The first is the sample-to-detector distance (D): edge scattering intensity increases with increasing D, so the equatorial streak becomes weaker when D is shortened. The second is the refractive index: edge scattering intensity increases as the real part of the refractive index decreases. In experiments, weak or even no equatorial streaks were found for samples measured in a roughly index-matching fluid. Edge scattering can therefore be eliminated or weakened, and it can be calculated by comparing the intensities of a cylindrical fiber measured in air and in an index-matching fluid. The simulation data are in basic agreement with the experimental data.
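The comparison of measurements in air and in an index-matching fluid can be expressed as a point-by-point subtraction of equatorial intensity profiles. This is a minimal sketch, assuming both profiles are sampled at the same scattering vectors and already corrected for incident flux, transmission, exposure time, and fluid background; the helper names `edge_scattering_estimate` and `edge_fraction` are hypothetical, and the study's actual correction procedure may differ.

```python
def edge_scattering_estimate(i_air, i_fluid):
    """Per-point estimate of edge scattering along the equatorial streak.

    Assumes the index-matched measurement keeps the microvoid and fibrillar
    contributions while suppressing refraction-driven edge scattering, so the
    difference (air minus fluid) approximates the edge-scattering part.
    """
    if len(i_air) != len(i_fluid):
        raise ValueError("profiles must have the same length")
    # Negative differences are treated as noise and clipped to zero.
    return [max(a - f, 0.0) for a, f in zip(i_air, i_fluid)]


def edge_fraction(i_air, i_fluid):
    """Share of the total equatorial intensity in air attributed to edge scattering."""
    total = sum(i_air)
    if total <= 0:
        raise ValueError("air profile must have positive total intensity")
    return sum(edge_scattering_estimate(i_air, i_fluid)) / total


# Hypothetical, already-normalized intensities (not data from the study).
i_air = [10.0, 8.0, 5.0, 2.0]
i_fluid = [4.0, 3.0, 5.5, 2.0]
print(edge_scattering_estimate(i_air, i_fluid))
print(edge_fraction(i_air, i_fluid))
```

For these made-up profiles the estimate is [6.0, 5.0, 0.0, 0.0], i.e. 44% of the streak intensity measured in air would be attributed to edge scattering. Clipping at zero is a simple design choice; a real analysis would more likely propagate counting uncertainties instead.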
Background: Building a large-scale medical knowledge graph requires automatically extracting the relations between entities from electronic medical records (EMRs). The main challenges are the scarcity of labeled corpora and the identification of complex semantic relations in the text of Chinese EMRs. A hybrid method based on semi-supervised learning is proposed to extract medical entity relations from small-scale, complex Chinese EMRs. Methods: The semantic features of sentences are extracted by a residual network (ResNet), and long-range dependency information is captured by a bidirectional GRU (gated recurrent unit). Attention mechanisms then assign weights to each of the two sets of extracted features, and the outputs of the two attention mechanisms are integrated for relation prediction. We adjusted the training process using a small, manually annotated relation corpus and a bootstrapping semi-supervised learning algorithm, continuously expanding the data set during training. Results: We constructed a small Chinese EMR relation extraction corpus based on the EMR data sets released at CCKS (China Conference on Knowledge Graph and Semantic Computing). The experimental results show that the best F1-score of the proposed method across all relation categories reaches 89.78%, which is 13.07% higher than that of the baseline CNN.
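The bootstrapping step can be illustrated with a generic self-training loop: train on the labeled seed set, pseudo-label unlabeled items the model is confident about, add them to the training data, and repeat. This is a minimal sketch, not the authors' implementation: the nearest-centroid classifier, the softmax confidence score, the 0.9 threshold, the relation labels, and all function names are hypothetical stand-ins for the ResNet/BiGRU/attention model, and sentence features are assumed to be precomputed vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def train_centroids(labeled: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Toy stand-in for the relation classifier: one mean vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(centroids: Dict[str, Vector], x: Vector) -> Tuple[str, float]:
    """Return (label, confidence) from a softmax over negative squared distances."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / z


def bootstrap(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: retrain, add confidently pseudo-labeled items, repeat."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict(model, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:
            break  # nothing new passed the threshold; stop expanding the data set
        labeled.extend(confident)
        pool = [x for x, _ in remaining]
    return train_centroids(labeled), labeled, pool


# Hypothetical 2-D features and relation labels (not data from the study).
seed = [((0.0, 0.0), "treats"), ((5.0, 5.0), "causes")]
unlabeled = [(0.5, 0.2), (4.8, 5.1), (2.5, 2.5)]
model, expanded, leftover = bootstrap(seed, unlabeled)
print(len(expanded), leftover)
```

Here the two points close to a seed are pseudo-labeled in the first round, while the ambiguous point midway between the classes never reaches the threshold and stays unlabeled. Keeping the threshold high limits the drift that bootstrapping can suffer when early mistakes are fed back into training.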