Traditional Chinese medicines (TCMs) have a wide range of pharmacological activities, and the ingredients in individual TCMs determine their efficacies. To understand the “efficacy–nature–structure” relationship of TCM, compounds from 2444 kinds of herbs were collected, and the associations between family, structure, nature, and biological activities were mined and analyzed. Bernoulli Naïve Bayes profiling and a data analysis method were used to predict the targets of compounds. The results show that genetic material determined the representation of ingredients from herbs and the nature of TCMs, and that the superior scaffolds of cold-nature compounds were 2-phenylochrotinone, anthraquinone, and coumarin, while the superior scaffold of hot-nature compounds was cyclohexene. The similarity analysis and the distribution of molecular descriptors show that compounds associated with the same nature were similar, while compounds associated with different natures partly formed a transition. For integral compounds from 2-phenylochrotinone, anthraquinone, coumarin, and cyclohexene, the shape index increased, indicating a transition of scaffolds from a spherical structure to a linear structure, while various molecular descriptors decreased. Three medicines and three recipes prescribed on the basis of the “efficacy–nature–structure” relationship had a higher survival rate in the clinic and provided strong evidence for TCM principles. The research improves the understanding of the “efficacy–nature–structure” relationship and extends TCM applications.
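The abstract above names Bernoulli Naïve Bayes as the target-prediction method but gives no implementation details. The following is a minimal sketch of that classifier family, assuming each compound is encoded as a binary fingerprint (one bit per substructure). The function names, the toy fingerprints, the target labels, and the smoothing value are all hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Estimate class priors and per-bit probabilities.

    alpha is a Laplace smoothing constant (an assumed value, not from the source).
    """
    n_bits = len(fingerprints[0])
    bit_counts = defaultdict(lambda: [0] * n_bits)
    class_totals = defaultdict(int)
    for bits, label in zip(fingerprints, labels):
        class_totals[label] += 1
        for i, bit in enumerate(bits):
            bit_counts[label][i] += bit

    model = {}
    n_samples = len(labels)
    for label, total in class_totals.items():
        log_prior = math.log(total / n_samples)
        # Smoothed P(bit_i = 1 | class); the 2 * alpha covers both bit values.
        probs = [(bit_counts[label][i] + alpha) / (total + 2 * alpha)
                 for i in range(n_bits)]
        model[label] = (log_prior, probs)
    return model


def predict(model, bits):
    """Return the class with the highest log posterior score."""
    scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for bit, p in zip(bits, probs):
            # Bernoulli likelihood: absent bits contribute log(1 - p).
            score += math.log(p) if bit else math.log(1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: four compounds, four substructure bits, two targets.
X = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
y = ["target_A", "target_A", "target_B", "target_B"]

model = train_bernoulli_nb(X, y)
print(predict(model, [1, 1, 0, 0]))
```

Unlike the multinomial variant, the Bernoulli model also scores the bits that are absent, which suits fixed-length presence/absence fingerprints.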
Machine learning techniques are essential for system log anomaly detection. However, because many system log entries are highly similar, the data are prone to class overlap, which seriously degrades anomaly detection. To address this problem, this paper proposes an anomaly detection model for class overlap in system logs. We first calculate each sample's membership in the two classes, normal and anomalous, and use fuzziness to separate the samples in the overlapping region from the rest of the data. AdaBoost, an ensemble learning approach, is then used to classify the overlapping data. Compared with individual machine learning algorithms, ensemble learning can better classify data in the overlapping region and thereby detect anomalies in system logs. We also discuss the possible impact of different voting methods on the ensemble learning results. Experimental results show that our model can be applied effectively with a variety of base algorithms and that every evaluation measure improves.
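The fuzziness-based separation step can be illustrated with a short sketch. The abstract does not state which fuzziness measure or threshold is used, so this sketch assumes a common entropy-style measure over the membership vector and a hypothetical threshold of 0.8. The function names and the toy membership values are illustrative only.

```python
import math


def fuzziness(membership):
    """Entropy-style fuzziness of a membership vector.

    Returns 0.0 for crisp memberships (all 0 or 1) and 1.0 when every
    membership is 0.5. This measure is an assumption, not the paper's.
    """
    total = 0.0
    for mu in membership:
        for p in (mu, 1.0 - mu):
            if p > 0.0:  # treat 0 * log(0) as 0
                total -= p * math.log2(p)
    return total / len(membership)


def split_by_fuzziness(samples, memberships, threshold=0.8):
    """Separate high-fuzziness (overlapping) samples from the rest.

    threshold is a hypothetical cut-off chosen for illustration.
    """
    overlapping, clear = [], []
    for sample, membership in zip(samples, memberships):
        if fuzziness(membership) >= threshold:
            overlapping.append(sample)
        else:
            clear.append(sample)
    return overlapping, clear


# Hypothetical memberships as (normal, anomalous) pairs for three log samples.
samples = ["log_1", "log_2", "log_3"]
memberships = [
    [0.95, 0.05],  # clearly normal
    [0.55, 0.45],  # ambiguous, likely in the overlap region
    [0.10, 0.90],  # clearly anomalous
]

overlapping, clear = split_by_fuzziness(samples, memberships)
print(overlapping, clear)
```

In a pipeline of the kind the abstract describes, the `overlapping` subset would then be passed to an ensemble classifier such as AdaBoost, while the `clear` subset could be handled by a simpler model.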
Log files are usually semistructured files that record the historical operation information of systems or devices. Researchers often analyze logs to find anomalies and thereby identify system operation faults and cyberattacks. Traditional classification-based methods, especially deep learning methods, can effectively solve the problem of static log anomaly detection. However, on dynamic, unstable logs affected by concept drift and noise, the performance of these methods decreases significantly and false positives become common. Retraining the model is one way to address log instability, but it greatly increases the computational cost for deep learning models. Log-based conformal anomaly detection (LogCAD) builds a confidence evaluation mechanism for multiple labels and achieves good detection results by making collaborative decisions based on multiple weak classifiers, without deep learning. Moreover, LogCAD extends easily to dynamic unstable logs: it incrementally updates the trained model with the conformal detection results of new samples. Experimental results show that LogCAD achieves excellent detection results for both dynamic unstable logs and static stable logs. Compared with LogRobust and other deep learning models, it is more efficient and has a wider application scope.
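The confidence evaluation idea can be sketched with generic conformal prediction. The abstract does not specify LogCAD's nonconformity measure or how the weak classifiers are combined, so the code below only shows the standard p-value computation, assuming that a nonconformity score per label is already available. The function names, the scores, and the significance level are hypothetical.

```python
def conformal_p_value(calibration_scores, new_score):
    """Fraction of calibration samples at least as nonconforming as the new one.

    The +1 terms count the new sample itself, as in standard conformal prediction.
    """
    at_least = sum(1 for s in calibration_scores if s >= new_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_label, new_scores_by_label, significance=0.2):
    """Keep every label whose p-value exceeds the significance level.

    significance is an assumed value chosen for illustration.
    """
    accepted = {}
    for label, calibration in calibration_by_label.items():
        p = conformal_p_value(calibration, new_scores_by_label[label])
        if p > significance:
            accepted[label] = p
    return accepted


# Hypothetical nonconformity scores from a calibration set, grouped by label.
calibration_by_label = {
    "normal": [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.12, 0.18, 0.22],
    "anomaly": [0.20, 0.40, 0.30, 0.10, 0.35, 0.25, 0.15, 0.45, 0.50],
}

# Hypothetical scores of one new log sequence under each candidate label.
new_scores_by_label = {"normal": 0.90, "anomaly": 0.20}

result = prediction_set(calibration_by_label, new_scores_by_label)
print(result)
```

The p-value acts as a per-label confidence: a label is rejected only when the new sample looks stranger than nearly all calibration samples with that label, so a drifting log stream can be handled by extending the calibration scores rather than retraining a deep model.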