<p style='text-indent:20px;'>Topological data analysis (TDA) provides approaches for analyzing the geometric shape and topological structure of a dataset. One important application is data visualization and dimension reduction. We follow the circular coordinate representation framework, which uses persistent cohomology to perform dimension reduction and visualization of high-dimensional datasets on a torus. In this paper, we propose a method that adapts the circular coordinate framework to account for the roughness of circular coordinates in change-point and high-dimensional applications. To do so, we replace the $L_{2}$ penalty of the traditional circular coordinate algorithm with a generalized penalty function. We provide simulation experiments and real data analyses to support our claim that circular coordinates with a generalized penalty detect changes in high-dimensional datasets under different sampling schemes while preserving topological structure.</p>
Topological data analysis (TDA) allows us to explore the topological features of a dataset. Among topological features, lower dimensional ones have recently drawn the attention of practitioners in mathematics and statistics because they can aid the discovery of low dimensional structure in a dataset. However, lower dimensional features are usually challenging to detect from a probabilistic perspective. In this paper, lower dimensional topological features occurring as zero-density regions of density functions are introduced and thoroughly investigated. Specifically, we consider sequences of coverings for the support of a density function in which the coverings consist of balls with shrinking radii. We show that, when these coverings satisfy certain sufficient conditions as the sample size goes to infinity, we can detect lower dimensional, zero-density regions with increasingly higher probability while guarding against false detection. We supplement the theoretical developments with a discussion of simulated experiments that elucidate the behavior of the methodology for different choices of the tuning parameters that govern the construction of the covering sequences and characterize the asymptotic results.
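The covering idea can be illustrated with a toy one-dimensional sketch. It is not the paper's procedure: the abstract does not state the sufficient conditions on the coverings, so the density (uniform on [0, 0.4) and [0.6, 1), with a zero-density gap in between), the fixed grid of ball centers, the radius, and the function names `sample_with_gap` and `empty_balls` are all assumptions chosen for illustration. A ball is flagged when it contains no sample point.

```python
import random


def sample_with_gap(n, rng):
    """Draw n points uniformly from [0, 0.4) U [0.6, 1.0); the gap has zero density."""
    points = []
    for _ in range(n):
        u = rng.random() * 0.8          # uniform on [0, 0.8)
        points.append(u if u < 0.4 else u + 0.2)  # shift the upper part past the gap
    return points


def empty_balls(points, radius):
    """Return centers of grid balls (intervals in 1-D) that contain no sample point."""
    n_centers = int(round(1.0 / radius)) + 1
    centers = [i * radius for i in range(n_centers)]
    return [c for c in centers
            if not any(abs(x - c) <= radius for x in points)]


rng = random.Random(0)                  # fixed seed so the run is reproducible
points = sample_with_gap(2000, rng)
flagged = empty_balls(points, radius=0.02)
print(flagged)                          # centers of balls with no observations
```

With 2000 points and radius 0.02, every ball that overlaps the support is expected to hold many points, so the flagged centers should fall inside the gap. Shrinking the radius too quickly relative to the sample size would start to flag balls inside the support as well, which is the false-detection trade-off the abstract refers to.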
DNA methylation is an epigenetic change that is not only important in normal cell development, but also plays a significant role in human health and disease. Therefore, studies of DNA methylation have been actively pursued to clarify the precise role of this modification in disease etiology and its potential as a biomarker of disease. One key issue in analyzing DNA methylation data is the detection of significant differences in methylation levels between diseased individuals and healthy controls. In recent years, molecular technology has been developed to produce bisulfite sequencing (BS-Seq) data, which provide single-base resolution. For such data, methylation counts at a single site follow a binomial distribution, the probability of which reflects the methylation level at this site. Traditional hypothesis-testing methods, such as Fisher's exact (FE) test, have been applied to detect differentially methylated cytosines (DMCs). Although the FE test is widely used, its "fixed margin" assumption has been called into question in such applications. Furthermore, biological variability between samples within a group cannot be accounted for in the FE test. Statistical tests that do not rely on such an assumption exist, including the computationally efficient Storer-Kim (SK) test. However, whether such methods outperform the FE test for detecting DMCs is unknown, with or without the presence of within-group variation. In this study, we compared the performance of several traditional hypothesis-testing methods from both statistical and biological perspectives based on simulated and real data as well as theoretical analyses. Our results show that the unconditional SK test consistently outperforms the conditional FE test for the detection of DMCs. This advantage is especially noteworthy in studies with limited sequencing depth.
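The contrast between a conditional and an unconditional test can be sketched for a single site with methylated counts `x1` out of `n1` reads in one group and `x2` out of `n2` in the other. This is a minimal standard-library sketch, not the study's code: the function names are hypothetical, the unconditional test is written as the pooled-estimate approximation usually attributed to Storer and Kim (sum the probabilities, under the pooled proportion, of all outcomes whose difference in proportions is at least as large as the observed one), and within-group biological variation is ignored.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Conditional test: margins fixed, sum hypergeometric probabilities
    of all tables no more probable than the observed one."""
    total = x1 + x2
    denom = comb(n1 + n2, total)

    def hyper(k):
        return comb(n1, k) * comb(n2, total - k) / denom

    lo, hi = max(0, total - n2), min(n1, total)
    p_obs = hyper(x1)
    p = sum(hyper(k) for k in range(lo, hi + 1)
            if hyper(k) <= p_obs * (1 + 1e-9))   # tolerance for float ties
    return min(1.0, p)


def storer_kim_two_sided(x1, n1, x2, n2):
    """Unconditional test (assumed pooled-estimate form): margins not fixed."""
    pooled = (x1 + x2) / (n1 + n2)
    observed = abs(x1 / n1 - x2 / n2)
    p = 0.0
    for a in range(n1 + 1):
        pa = binom_pmf(a, n1, pooled)
        for b in range(n2 + 1):
            if abs(a / n1 - b / n2) >= observed - 1e-12:
                p += pa * binom_pmf(b, n2, pooled)
    return min(1.0, p)


# Low-depth example: 3 of 3 reads methylated in one group, 0 of 3 in the other.
p_fe = fisher_exact_two_sided(3, 3, 0, 3)
p_sk = storer_kim_two_sided(3, 3, 0, 3)
print(p_fe, p_sk)
```

In this low-depth example the conditional p-value is 0.1 and the unconditional one is 0.03125, which illustrates how conditioning on fixed margins can make the FE test conservative when read counts are small.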