Morphological analysis, which includes part-of-speech (POS) tagging, stemming, and morpheme segmentation, is one of the key components in natural language processing (NLP), particularly for agglutinative languages. In this article, we investigate the morphological analysis of the Uyghur language, which is the native language of the people in the Xinjiang Uyghur autonomous region of western China. Morphological analysis of Uyghur is challenging primarily because of (1) ambiguities that arise because multiple POS tags may be associated with a word stem, or multiple functional tags with a word suffix, (2) ambiguous morpheme boundaries, and (3) the complex morphophonology of the language. Further, the lack of a manually annotated Uyghur training set for word segmentation makes Uyghur morphological analysis more difficult. We address these challenges with a semisupervised approach that learns a Markov model, with the help of a manually constructed dictionary of “suffix to tag” mappings, in order to predict the most likely tag transitions in the Uyghur morpheme sequence. Due to the linguistic characteristics of Uyghur, we incorporate a prior belief in our model that favors word segmentations with fewer morpheme units. Empirical evaluation of our proposed model shows an accuracy of about 82%. We further improve the effectiveness of the tag transition model with an active learning paradigm. In particular, we manually investigated a subset of words for which the model prediction ambiguity was within the top 20%. Manually incorporating rules to handle these erroneous cases resulted in an overall accuracy of 93.81%.
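The decoding idea in the abstract above (tag transitions scored by a Markov model, a suffix-to-tag dictionary, and a prior that favors fewer morphemes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the stems, suffixes, tag names, transition probabilities, and the `penalty` parameter are toy values, not the authors' dictionary or learned model, and the semisupervised training step is not shown.

```python
import math

# Toy, hypothetical resources (NOT from the paper).
STEMS = {"kitab": ["N"]}                      # stem -> possible POS tags
SUFFIX_TAGS = {                               # "suffix to tag" dictionary
    "lar": ["PL"], "da": ["LOC"], "la": ["VBZ"], "r": ["AOR"],
}
TRANS = {                                     # P(next tag | previous tag)
    ("N", "PL"): 0.6, ("PL", "LOC"): 0.5,
    ("N", "VBZ"): 0.2, ("VBZ", "AOR"): 0.5, ("AOR", "LOC"): 0.05,
}
UNSEEN = 1e-4                                 # floor for unseen transitions


def suffix_paths(rest):
    """Yield every split of `rest` into dictionary suffixes, with tags."""
    if not rest:
        yield []
        return
    for i in range(1, len(rest) + 1):
        piece = rest[:i]
        for tag in SUFFIX_TAGS.get(piece, []):
            for tail in suffix_paths(rest[i:]):
                yield [(piece, tag)] + tail


def score(path, penalty):
    """Log transition probability minus a per-morpheme penalty (the prior)."""
    logp = sum(
        math.log(TRANS.get((a[1], b[1]), UNSEEN))
        for a, b in zip(path, path[1:])
    )
    return logp - penalty * len(path)


def best_segmentation(word, penalty=1.0):
    """Return the highest-scoring (morpheme, tag) sequence, or None."""
    candidates = []
    for i in range(1, len(word) + 1):
        stem = word[:i]
        for stem_tag in STEMS.get(stem, []):
            for tail in suffix_paths(word[i:]):
                candidates.append([(stem, stem_tag)] + tail)
    if not candidates:
        return None
    return max(candidates, key=lambda p: score(p, penalty))
```

With these toy values, `best_segmentation("kitablarda")` compares the three-morpheme reading `kitab+lar+da` against the four-morpheme reading `kitab+la+r+da`; the shorter reading wins on both transition probability and the morpheme-count penalty.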
Integrase (IN), a key enzyme in the human immunodeficiency virus type 1 (HIV-1) life cycle, aids the integration of viral DNA into the host DNA and has become an ideal target for the development of anti-HIV drugs. A total of 1785 potential HIV-1 IN inhibitors were collected from the ChEMBL, Binding Database, DrugBank, and PubMed databases, as well as from 40 references. The data set was divided into a training set and a test set by random sampling. By exploring the correlation between molecular descriptors and inhibitory activity, it was found that both the classification and the specific activity data of inhibitors can be predicted more accurately by combining molecular descriptors with molecular fingerprints. The molecular fingerprint descriptors provide additional substructure information that improves prediction ability. Based on the training set, two machine learning methods, recursive partitioning (RP) and naive Bayes (NB), were used to build classifiers of HIV-1 IN inhibitors. On the test set, the RP technique correctly predicted 82.5% of inhibitors and 86.3% of noninhibitors. The NB model correctly predicted 88.3% of inhibitors and 87.2% of noninhibitors, with a correlation coefficient of 85.2%. The results show that the prediction performance of the NB model is slightly better than that of RP, and the key molecular segments were also obtained. Additionally, CoMFA and CoMSIA models with good activity prediction ability were both constructed by exploring the structure-activity relationship, which is helpful for the design and optimization of HIV-1 IN inhibitors.
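As a rough illustration of the naive Bayes classification over molecular fingerprints mentioned above, the following is a minimal Bernoulli naive Bayes sketch with Laplace smoothing. It is not the authors' model: the 4-bit "fingerprints", the labels, and the function names `train_bernoulli_nb` and `predict` are hypothetical, and real fingerprints have hundreds or thousands of bits.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit per-class bit probabilities with Laplace smoothing `alpha`."""
    n_bits = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        # P(bit j is set | class c), smoothed so it is never 0 or 1.
        p_on = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_bits)
        ]
        model[c] = (log_prior, p_on)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for fingerprint x."""
    def log_posterior(c):
        log_prior, p_on = model[c]
        return log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(x, p_on)
        )
    return max(model, key=log_posterior)


# Toy data (assumed): 1 = inhibitor, 0 = noninhibitor.
X_train = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],
    [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1],
]
y_train = [1, 1, 1, 0, 0, 0]
nb_model = train_bernoulli_nb(X_train, y_train)
```

In this toy set the first bit is set only in inhibitors and the last bit only in noninhibitors, so `predict(nb_model, [1, 1, 0, 0])` returns `1` and `predict(nb_model, [0, 0, 0, 1])` returns `0`. The per-class accuracies reported in the abstract correspond to applying such a classifier to a held-out test set and counting correct predictions within each class.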
Many real-world systems can be expressed as temporal networks, with nodes playing different roles in structure and function and edges representing the relationships between nodes. Identifying critical nodes can help us control the spread of public opinion or epidemics, predict leading figures in academia, target advertising for various commodities, and so on. However, identifying critical nodes is difficult because the structure of a temporal network changes over time. In this paper, considering the sequential topological information of temporal networks, a novel and effective learning framework that combines a special graph convolutional network with a long short-term memory network (LSTM) is proposed to identify the nodes with the best spreading ability. The special graph convolutional network embeds nodes in each sequential weighted snapshot, and the LSTM is used to predict future node importance from the time-ordered embedded features. The effectiveness of the approach is evaluated with a weighted Susceptible-Infected-Recovered model. Experimental results on four real-world temporal networks demonstrate that the proposed method outperforms both traditional and deep learning benchmark methods in terms of the Kendall τ coefficient and top-k hit rate.
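The two evaluation measures named above can be sketched as follows. This is a minimal version under stated assumptions: `kendall_tau` implements the simple tau-a form (tied pairs contribute zero, with no tie correction), `top_k_hit_rate` is taken to be the overlap between the predicted and reference top-k node sets divided by k, and the score lists are made-up values. The paper's exact definitions may differ.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(a)), 2))
    s = 0
    for i, j in pairs:
        prod = (a[i] - a[j]) * (b[i] - b[j])
        s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant, 0 tie
    return s / len(pairs)


def top_k_hit_rate(predicted, reference, k):
    """Fraction of the reference top-k nodes recovered in the predicted top-k."""
    def top(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])
    return len(top(predicted) & top(reference)) / k


# Assumed toy values: model scores vs. simulated spreading sizes for 4 nodes.
predicted_scores = [0.9, 0.7, 0.4, 0.1]
sir_spread = [120, 80, 95, 10]
```

Here five of the six node pairs are ranked consistently and one is swapped, so `kendall_tau(predicted_scores, sir_spread)` is `(5 - 1) / 6`, and the predicted top-2 set shares one node with the reference top-2 set, giving a hit rate of `0.5`.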