In data mining, classification is the way to splits the data into several dependent and independent regions and each region refer as a class. There are different kinds of classifier uses to accomplish classification task. Moreover classification is bounded in case of classifying of text documents. The motives of the work which a present in the article is to evaluate multiclass document classification and to learn achieve accuracy of classification in the case of text documents. Naive Bayes approach is used to deal with the problem of document classification via a deceptively simplistic model. The Naive Bayes approach is applied in Flat (linear) and hierarchical manner for improving the efficiency of classification model. It has been found that Hierarchical Classification technique is more effective than Flat classification. It also performs better in case of multi-label document classification. In contrast to retrospect we observe significant increase in the generation of data each day. And hence with the advent of smarter technologies, data is required to be classified and sorted before framing out decisions from it. There are so many techniques available for classifying documents into various categories or labels. Data mining is the process of non-trivial extraction of novel, implicit, and actionable knowledge from large data sets.
Islam. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license.
With the availability of the huge medical knowledge data on the Internet such as the human disease network, protein-protein interaction (PPI) network, and phenotypegene, gene-disease bipartite networks, it becomes practical to help doctors by suggesting plausible hereditary diseases for a set of clinical phenotypes. However, identifying candidate diseases that best explain a set of clinical phenotypes by considering various heterogeneous networks is still a challenging task. In this paper, we propose a new method for estimating a ranked list of plausible diseases by associating phenotypegene with gene-disease bipartite networks. Our approach is to count the frequency of all the paths from a phenotype to a disease through their associated causative genes, and link the phenotype to the disease with path frequency in a new phenotype-disease bipartite (PDB) network. After that, we generate the candidate weights for the edges of phenotypes with diseases in PDB network. We evaluate our proposed method in terms of Normalized Discounted Cumulative Gain (NDCG), and demonstrate that we outperform the previously known disease ranking method called Phenomizer.
With vast amounts of medical knowledge available on the Internet, it is becoming increasingly practical to help doctors in clinical diagnostics by suggesting plausible diseases predicted by applying data and text mining technologies. Recently, Genome-Wide Association Studies (GWAS) have proved useful as a method for exploring phenotypic associations with diseases. However, since genetic diseases are difficult to diagnose because of their low prevalence, large number, and broad diversity of symptoms, genetic disease patients are often misdiagnosed or experience long diagnostic delays. In this article, we propose a method for ranking genetic diseases for a set of clinical phenotypes. In this regard, we associate a phenotype-gene bipartite graph (PGBG) with a gene-disease bipartite graph (GDBG) by producing a phenotype-disease bipartite graph (PDBG), and we estimate the candidate weights of diseases. In our approach, all paths from a phenotype to a disease are explored by considering causative genes to assign a weight based on path frequency, and the phenotype is linked to the disease in a new PDBG. We introduce the Bidirectionally induced Importance Weight (BIW) prediction method to PDBG for approximating the weights of the edges of diseases with phenotypes by considering link information from both sides of the bipartite graph. The performance of our system is compared to that of other known related systems by estimating Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP), and Kendall's tau metrics. Further experiments are conducted with well-known TF·IDF, BM25, and Jenson-Shannon divergence as baselines. The result shows that our proposed method outperforms the known related tool Phenomizer in terms of NDCG@10, NDCG@20, MAP@10, and MAP@20; however, it performs worse than Phenomizer in terms of Kendall's tau-b metric at the top-10 ranks. It also turns out that our proposed method has overall better performance than the baseline methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.