Conotoxins are neurotoxins that interact specifically with potassium, sodium, and calcium channels. They have become potential drug candidates for treating conditions such as chronic pain, epilepsy, and cardiovascular diseases. Correctly identifying the types of ion channel-targeted conotoxins therefore provides important clues for understanding their function and finding potential drugs. Based on this consideration, we developed a new computational method to rapidly and accurately predict the types of ion channel-targeted conotoxins. Three new kinds of residue properties were proposed for use in pseudo amino acid composition to formulate conotoxin samples. A support vector machine was used as the classifier, and a feature selection technique based on the F-score was used to optimize the features. Jackknife cross-validation yielded an overall accuracy of 94.6%, which is higher than other published results. Hence the current method may play a complementary role to existing methods for recognizing the types of ion channel-targeted conotoxins.
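The F-score ranking step mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes a common multi-class form (squared deviation of each class mean from the overall mean, divided by the summed within-class sample variances); the function names `f_scores` and `rank_features` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def f_scores(X, y):
    """Per-feature F-score: between-class spread over within-class variance.

    Assumed multi-class form; the paper's exact definition may differ.
    Each class needs at least two samples for the sample variance (ddof=1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.var(axis=0, ddof=1)
    # Small constant guards against division by zero for constant features.
    return between / (within + 1e-12)

def rank_features(X, y):
    """Feature indices sorted from most to least discriminative."""
    return np.argsort(f_scores(X, y))[::-1]

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 1.0], [0.1, 0.0], [0.2, 1.0],
              [1.0, 0.0], [1.1, 1.0], [1.2, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])
scores = f_scores(X, y)
ranking = rank_features(X, y)
```

In a pipeline of the kind the abstract describes, the top-ranked features would then be passed to the classifier, with the number of features kept chosen by cross-validated accuracy.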
With the rapid development of various techniques, more and more data have been accumulated that are both large in size (tall) and high in dimension (wide). The era of big data is coming. How to understand and discover new knowledge from these data has attracted growing attention from scholars and has become one of the most important tasks in data mining. Clustering analysis, a kind of unsupervised learning and one of the most important techniques in data mining, groups a set of data objects into clusters that are meaningful, useful, or both. The technique has thus played a very important role in knowledge discovery in big data. However, when facing large and high-dimensional data, most current clustering methods exhibit poor computational efficiency and high demands on computational resources, which hinders clarifying the intrinsic properties of the data and discovering the knowledge behind them. Based on this consideration, we developed a clustering method called MUFOLD-CL. The principle of the method is to project the data points onto the centroid and then to measure the similarity between any two points by comparing their projections on the centroid. The proposed method achieves linear time complexity with respect to the sample size. Comparison with the K-Means method on very large data showed that our method produced better accuracy and required less computational time, demonstrating that MUFOLD-CL can serve as a valuable tool for big data clustering, or at least play a complementary role to existing methods. Further comparisons with state-of-the-art clustering methods on smaller datasets showed that our method was the fastest and achieved comparable accuracy. For the convenience of other scholars, a free software package was constructed.
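The centroid-projection idea can be sketched as follows. This is only an illustration under stated assumptions, not the MUFOLD-CL implementation: the abstract does not say how clusters are formed from the projections, so the fixed-width binning step, the function names, and the `width` parameter are all hypothetical. Each step makes a single pass over the data, so the cost grows linearly with the number of samples.

```python
import numpy as np

def centroid_projections(X):
    """Scalar projection of every point onto the direction of the data centroid."""
    X = np.asarray(X, dtype=float)
    centroid = X.mean(axis=0)
    norm = np.linalg.norm(centroid)
    if norm == 0.0:
        raise ValueError("centroid is the zero vector; projection is undefined")
    return X @ centroid / norm

def projection_gap(proj, i, j):
    """Dissimilarity of points i and j measured only through their projections."""
    return abs(proj[i] - proj[j])

def bin_by_projection(proj, width):
    """Hypothetical grouping step: points in the same fixed-width bin share a label."""
    return np.floor((proj - proj.min()) / width).astype(int)

# Toy data: two tight groups far apart along the centroid direction.
X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]])
proj = centroid_projections(X)
labels = bin_by_projection(proj, width=2.0)
```

Note that two points at different positions can share the same projection, so a one-dimensional comparison like this cannot by itself separate groups that differ only in directions orthogonal to the centroid.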