The development of single-cell RNA-sequencing (scRNA-seq) technology has enabled the measurement of gene expression in individual cells, providing an unprecedented opportunity to explore biological mechanisms at the cellular level. However, existing scRNA-seq analysis methods are either susceptible to noise and outliers or ignore the manifold structure inherent in the data. In this paper, we propose a novel method called Cauchy non-negative Laplacian regularized low-rank representation (CNLLRR) to alleviate these problems. Specifically, we apply the Cauchy loss function (CLF) to the noise matrix of CNLLRR instead of the conventional norm constraints, which enhances the robustness of the method. In addition, a graph regularization term is added to the objective function to capture the pairwise geometric relationships between cells. The alternating direction method of multipliers (ADMM) is then adopted to solve the CNLLRR optimization problem. Finally, extensive experiments on scRNA-seq data show that the proposed CNLLRR method outperforms other state-of-the-art methods in cell clustering, cell visualization and prioritization of gene markers. CNLLRR thus contributes to understanding the heterogeneity between cell populations in complex biological systems.
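To make the two ingredients named above concrete, the following is a minimal sketch of a Cauchy loss on a noise matrix and a graph regularization term built from a cell-to-cell neighbor graph. It is an illustration under stated assumptions, not the paper's implementation: the function names, the scale parameter `c`, the binary k-nearest-neighbor weighting and the choice of `k` are all hypothetical, and the Cauchy loss is written in one common form, ln(1 + (e/c)^2), which may differ from the exact form used in CNLLRR.

```python
import numpy as np

def cauchy_loss(E, c=1.0):
    """Sum of ln(1 + (e/c)^2) over all entries of E (one common Cauchy loss form).

    The scale c is an assumed tuning parameter, not a value from the paper.
    """
    return float(np.sum(np.log1p((E / c) ** 2)))

def squared_loss(E):
    """Squared Frobenius norm, shown only for comparison with the Cauchy loss."""
    return float(np.sum(E ** 2))

def knn_graph_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - W of a symmetric binary k-NN graph.

    X has shape (genes, cells), so each column is one cell.
    Binary weights are an assumption; other weightings are possible.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between cells (columns of X).
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)  # a cell is never its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize: keep an edge if either cell selects the other
    return np.diag(W.sum(axis=1)) - W

def graph_regularizer(Z, L):
    """tr(Z L Z^T), equal to 0.5 * sum_ij W_ij * ||z_i - z_j||^2 for columns z_i of Z."""
    return float(np.trace(Z @ L @ Z.T))
```

Because the logarithm grows slowly, a single large entry in the noise matrix raises `cauchy_loss` far less than it raises `squared_loss`, which is the intuition behind the robustness claim. The value of `graph_regularizer` is small when cells that are neighbors in the graph have similar representation columns, which is how a term of this kind encodes pairwise geometric relationships.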
Author summary

Analysis of single-cell data can help to further study the heterogeneity and complexity of cell populations. Current analysis methods mainly learn the similarity between cells and then apply a clustering algorithm to the resulting similarity matrix for cell clustering or downstream analysis. Constructing an accurate cell-to-cell similarity matrix is therefore crucial for single-cell data analysis. In this paper, we design a novel Cauchy non-negative Laplacian regularized low-rank representation (CNLLRR) method to obtain a better similarity matrix. Specifically, a Cauchy loss function (CLF) constraint is applied to penalize the noise matrix, which improves the robustness of CNLLRR to noise and outliers. Moreover, a graph regularization term is added to the objective function, which effectively encodes the local manifold information of the data. Together, these ensure the quality of the learned cell-to-cell similarity matrix. Finally, single-cell data analysis experiments show that our method is superior to other representative methods.

October 17, 2019

[...] been generated due to the development of next-generation sequencing technologies [1]. At the same time, scRNA-seq data contain a wealth of information on biological function and gene regulation. Analyzing this information allows us to observe individual cells in unprecedented detail [2]. It thus makes it possible to explore the heterogeneity and complexity of cell populations. It also offers an unprecedented opportunity to learn the biological mechanisms and functional diversity at the cellular level. With the help of scRNA sequencing, identification of cell subpopulations [3] is now possible. The identification can be c...