Empirical analysis of DNA k-mers has proven to be an effective tool for finding unique patterns in DNA sequences, which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer spectra across hundreds of organisms, researchers found that unique multi-modal k-mer spectra occur only in the genomes of organisms from the tetrapod clade, which includes all mammals. The multi-modality is caused by the formation of the two lowest modes; the k-mers that fall under these modes are referred to as rare k-mers. The suppression of the two lowest modes (the rare k-mers) can be attributed to the CG dinucleotides they contain. In addition, the rare k-mers are selectively distributed in certain genomic features: CpG islands (CGIs), promoters, 5' UTRs, and exons. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering property as a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features: CGIs, promoters, 5' UTRs, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, our RWC (rare-word clustering) method ranks among the top three in eight of the CGI and promoter prediction evaluations across the eight benchmarked datasets.
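The idea of a k-mer spectrum and of CG-containing rare k-mers can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy sequence, the function names, and the rule that "rare" means "lowest observed count" are all assumptions made for illustration, whereas the study identifies rare k-mers from the two lowest modes of genome-scale spectra.

```python
from collections import Counter

def count_kmers(seq, k):
    """Count every overlapping k-mer made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
            counts[kmer] += 1
    return counts

def kmer_spectrum(counts):
    """Map an occurrence count to the number of distinct k-mers with that count."""
    return Counter(counts.values())

def cg_fraction(kmers):
    """Fraction of the given k-mers that contain a CG dinucleotide."""
    kmers = list(kmers)
    if not kmers:
        return 0.0
    return sum("CG" in kmer for kmer in kmers) / len(kmers)

# Toy sequence (hypothetical): AT-rich with a single CG dinucleotide.
toy = "ATATATATCGATATATAT"
counts = count_kmers(toy, 2)
spectrum = kmer_spectrum(counts)

# Assumption for this sketch only: "rare" = k-mers with the lowest count.
lowest = min(counts.values())
rare = sorted(kmer for kmer, c in counts.items() if c == lowest)

print(dict(spectrum))
print(rare, cg_fraction(rare))
```

On real genomes the spectrum would be computed for larger k over the whole assembly, and the modes would be located on the resulting histogram rather than taken from the minimum count.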
In computer vision, rotation equivariance and translation invariance are properties of a representation that preserve the geometric structure of a transformed input. In Convolutional Neural Networks (CNNs), these properties are usually sought through data augmentation. However, achieving them remains a challenge because CNNs are not equivariant under rotation. In this study, a novel deep neural network architecture is proposed that combines a group convolutional neural network (G-CNN), built with the special Euclidean (SE2) motion group, and the discrete cosine transform (DCT). The former is based on group theory and uses SE2 to guarantee equivariance in 2D images, whereas the latter is used to encode and parameterize the model space. To restore and preserve the equivariance property of the transformed and convolved images, the DCT is used as a rotation-invariant module. These combined techniques are employed to improve breast cancer classification and data efficiency in the CNN processing pipeline. The developed model is tested on the rotated MNIST datasets to assess its performance. Finally, the model is applied to mammography images, where it achieves high computational performance and improved inference in breast cancer classification, with an accuracy of 94.84%.
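The claim that an ordinary convolution is not rotation equivariant, and the group-convolution remedy of rotating the filter along with the input, can be checked on a toy example. The sketch below is an illustration only, not the proposed architecture: it is restricted to 90-degree rotations rather than the full SE2 group, it omits the DCT module, and the helper names and the 3x3 image are hypothetical.

```python
def rot90(m):
    """Rotate a 2D list by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*m)][::-1]

def correlate_valid(x, k):
    """Plain 'valid' cross-correlation, the operation CNN layers compute."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical toy input and a deliberately asymmetric filter.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 0]]

# What an equivariant layer should output for the rotated image:
reference = rot90(correlate_valid(image, kernel))

# A fixed filter applied to the rotated image does not match it ...
plain = correlate_valid(rot90(image), kernel)

# ... but rotating the filter together with the input does.
steered = correlate_valid(rot90(image), rot90(kernel))

print(plain == reference, steered == reference)
```

A group convolution over the four 90-degree rotations keeps one response per rotated copy of each filter, so a rotation of the input only permutes and rotates those responses instead of changing them unpredictably.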
The difficulty of classifying cancer can lead to inaccurate diagnoses, especially for sarcoma, which comprises rare cancer types. It is hard for a clinician to confirm the patient's condition because an accurate diagnosis can only be made by a specialist pathologist. Therefore, instead of using a single omics type to identify disease markers, integrating several omics types into multi-omics data brings more advantages in detecting and representing the phenotype of the cancer. Advances in computational models, especially deep learning, now offer promising approaches for handling high-level omics data with faster processing speed. Hence, the purpose of this study is to classify cancerous and non-cancerous patients using a Stacked Denoising Autoencoder (SDAE) and a One-dimensional Convolutional Neural Network (1D CNN), and to evaluate which algorithm classifies better on highly correlated multi-omics data. Both computational models were fitted to the multi-omics dataset. The sarcoma omics datasets used in this study were obtained from the Multi-Omics Cancer Benchmark TCGA Pre-processed Data of the ACGT Ron Shamir Lab repository. The accuracy obtained was 50.93% for the SDAE and 52.78% for the 1D CNN. The results show that the 1D CNN model outperformed the SDAE in classifying sarcoma.