Abstract. High utility itemset mining extends frequent pattern mining to discover itemsets in a transaction database whose utility values exceed a given threshold. However, mining high utility itemsets is more challenging than frequent itemset mining, because itemset utility lacks the anti-monotone property that itemset frequency has. The recently proposed Transaction Weighted Utility (TWU) is anti-monotone, but it overestimates itemset utility and therefore leads to a larger search space. We propose an algorithm that uses TWU with pattern growth based on a compact utility pattern tree data structure. Our algorithm implements a parallel projection scheme that uses disk storage when main memory is inadequate for large datasets. Experimental evaluation shows that our algorithm is more efficient than previous algorithms and can mine larger datasets of both dense and sparse data containing long patterns.
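To make the contrast between itemset utility and TWU concrete, the following is a minimal sketch using the standard definitions from the high utility itemset mining literature. It is not the algorithm proposed in the abstract (there is no utility pattern tree and no projection here). The toy database, the `unit_profit` table, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]

# Hypothetical per-unit profit (external utility) of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def contains(transaction, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)


def transaction_utility(transaction):
    """Total utility of a whole transaction (all of its items)."""
    return sum(qty * unit_profit[item] for item, qty in transaction.items())


def utility(itemset, db):
    """Exact utility: only the itemset's own items count in each supporting transaction."""
    return sum(
        sum(t[item] * unit_profit[item] for item in itemset)
        for t in db
        if contains(t, itemset)
    )


def twu(itemset, db):
    """Transaction Weighted Utility: sum of full transaction utilities over supporting transactions."""
    return sum(transaction_utility(t) for t in db if contains(t, itemset))


# Exact utility is not anti-monotone: a superset can have higher utility than a subset.
print(utility({"c"}, transactions), utility({"a", "c"}, transactions))

# TWU is anti-monotone and never below the exact utility, so it can prune safely,
# at the cost of overestimating.
print(twu({"c"}, transactions), twu({"a", "c"}, transactions))
```

In this toy data the itemset {a, c} has a higher exact utility than its subset {c}, so utility alone cannot be used for subset-based pruning, whereas the TWU of {a, c} does not exceed the TWU of {c}. Because TWU counts whole transactions, it can sit well above the true utility, which is the overestimation the abstract refers to.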
Abstract: Susceptibility to complex diseases is characterised by numerous genetic, lifestyle, and environmental causes, acting individually or through their interaction effects. The recent explosion in the detection of interacting genetic factors is increasingly revealing the biological networks underlying complex diseases. Several computational methods have been explored to discover interacting polymorphisms among unlinked loci. However, there has been no significant breakthrough towards solving this problem because of biomolecular complexities and computational limitations. Our previous research trained a deep multilayered feedforward neural network to predict interacting two-locus polymorphisms in genome-wide data. The performance of the method was studied on numerous simulated datasets and a published genome-wide dataset. In this manuscript, the performance of the trained multilayer neural network is validated by varying the parameters of the models under various scenarios. Furthermore, the observations of the previous method are confirmed in this study by evaluation on a real dataset. The experimental findings on the real dataset show a significant rise in prediction accuracy over other conventional techniques. The results show highly ranked interacting two-locus polymorphisms, which may be associated with susceptibility to breast cancer.
In this era of genome-wide association studies (GWAS), the quest to understand the genetic architecture of complex diseases is growing faster than ever. The development of high-throughput genotyping and next-generation sequencing technologies enables genetic epidemiological analysis of large-scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex, and a number of data mining and machine learning approaches have been proposed to address these challenges. This paper reviews the current methods and related software packages for detecting the SNP interactions that contribute to diseases. It addresses the issues that need to be considered when developing these models and reviews the achievements in data simulation for evaluating their performance. Further, it discusses the future of SNP interaction analysis.