Yexiong Lin scite author profile

et al. 2020

Front. Genet.

For precision medicine, there is an enormous need to understand the immune evasion mechanisms of tumor development, especially because tumor heterogeneity significantly affects the efficacy of immunotherapy. Recognizing breast cancer subtypes based on immune-related genes helps to clarify the immune escape pathways that dominate in each subtype, so that effective treatment measures can be implemented for each one. To that end, we applied non-negative matrix factorization and a consistent clustering algorithm to The Cancer Genome Atlas RNA-seq breast cancer data and recognized 4 subtypes according to curated immune-related genes. Then, we conducted differential expression analysis between each breast cancer subtype and normal tissue, using RNA-seq data from non-cancer individuals collected by the Genotype-Tissue Expression project, to identify subtype-related immune genes. After that, we carried out correlation analysis between copy number variants (CNV) and mRNA expression of immune genes and, based on ATAC-seq data, investigated the regulatory mechanisms of those immune genes whose expression cannot be explained by CNV. The experimental results reveal that CDH1 and PVRL2 potentially contribute to immune evasion in all 4 subgroups. The expression variation of CDH1 can be mainly explained by its CNV, while the expression variation of PVRL2 is more likely regulated by transcription factors.
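The CNV-versus-mRNA correlation step described in this abstract can be illustrated with a minimal sketch. It assumes a Pearson correlation per gene across samples and a hypothetical cutoff of 0.5; the gene names, values, and the `split_by_cnv` helper are invented for illustration and are not taken from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)

def split_by_cnv(cnv, mrna, threshold=0.5):
    """Partition genes by whether CNV correlates with expression above a cutoff.

    The threshold is an assumption made for this sketch.
    """
    explained, unexplained = [], []
    for gene in cnv:
        r = pearson(cnv[gene], mrna[gene])
        (explained if r >= threshold else unexplained).append(gene)
    return explained, unexplained

# Hypothetical toy values for five samples; not data from the study.
cnv = {"GENE_A": [-1, 0, 0, 1, 2], "GENE_B": [0, 1, -1, 0, 1]}
mrna = {"GENE_A": [2.0, 3.1, 2.9, 4.2, 5.0], "GENE_B": [3.0, 2.9, 3.1, 3.0, 2.8]}
explained, unexplained = split_by_cnv(cnv, mrna)
```

Genes that land in `unexplained` would be the candidates for a further look at other regulatory mechanisms, in the way the abstract describes for ATAC-seq data.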

Analyzing Association Between Expression Quantitative Trait and CNV for Breast Cancer Based on Gene Interaction Network Clustering and Group Sparse Learning

Chen

et al. 2022

CBIO

Aims: The occurrence and development of tumors are accompanied by changes in pathogenic gene expression. Tumor cells avoid damage from immune cells by regulating the expression of immune-related genes. Background: Tracing the causes of gene expression variation helps to understand tumor evolution and metastasis. Objective: Current methods for explaining gene expression variation face several main challenges: low explanatory power, insufficient prediction accuracy, and lack of biological meaning. Method: In this study, we propose a novel method to analyze the mRNA expression variations of breast cancer risk genes. Firstly, we collected high-confidence risk genes related to breast cancer and designed a rank-based method to preprocess the breast cancer copy number variation (CNV) and mRNA data. Secondly, to improve biological meaning and narrow down the combinatorial space, we introduced a prior gene interaction network and applied a network clustering algorithm to generate high-density subnetworks. Lastly, to describe the interlinked structure within and between subnetworks and the mRNA expression of target genes, we proposed a group sparse learning model to identify the CNVs underlying expression variations of pathogenic genes. Result: The proposed method is evaluated by both significantly improved prediction accuracy and the biological meaning of pathway enrichment analysis. Conclusion: The experimental results show that our method has practical significance
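The abstract does not spell out its group sparse learning model. As a generic illustration of how group sparsity selects whole subnetworks at once, the sketch below applies the standard group lasso proximal step (group soft-thresholding) to a coefficient vector. The grouping, the weights, and the function name are assumptions made for this example, not the authors' formulation.

```python
import math

def group_soft_threshold(weights, groups, lam):
    """Proximal operator of lam * (sum of group L2 norms), the group lasso penalty.

    weights: list of coefficients, one per CNV feature.
    groups:  list of index lists, one per subnetwork (hypothetical grouping).
    lam:     non-negative regularization strength.
    """
    out = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        # Shrink the whole group toward zero; drop it entirely if its norm <= lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = weights[i] * scale
    return out

# Two hypothetical subnetworks: one with a strong signal, one with a weak signal.
weights = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(weights, groups, lam=1.0)
```

The first group (norm 5) is only shrunk, while the second group (norm below `lam`) is zeroed out as a unit, which is the behavior that makes a group penalty useful for picking subnetworks rather than isolated features.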

Do We Need to Penalize Variance of Losses for Learning with Label Noise?

Lin, Yu, Du

et al. 2022

Preprint

Algorithms which minimize the averaged loss have been widely designed for dealing with noisy labels. Intuitively, when there is a finite training sample, penalizing the variance of losses will improve the stability and generalization of the algorithms. Interestingly, we found that the variance should be increased for the problem of learning with noisy labels. Specifically, increasing the variance will boost the memorization effects and reduce the harmfulness of incorrect labels. By exploiting the label noise transition matrix, regularizers can be easily designed to reduce the variance of losses and be plugged in many existing algorithms. Empirically, the proposed method by increasing the variance of losses significantly improves the generalization ability of baselines on both synthetic and real-world datasets.
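The idea of rewarding, rather than penalizing, the variance of per-example losses can be sketched as an objective of the form mean loss minus `alpha` times the loss variance. This is a minimal illustration under that assumption: the function names and `alpha` are hypothetical, and the sketch omits the label noise transition matrix that the abstract mentions.

```python
import math

def cross_entropy(probs, label):
    """Per-example cross-entropy loss for one predicted class distribution."""
    return -math.log(max(probs[label], 1e-12))

def variance_regularized_loss(batch_probs, labels, alpha=0.1):
    """Return (objective, mean, variance) for a batch of predictions.

    objective = mean(losses) - alpha * var(losses), using the population variance.
    alpha > 0 rewards a larger variance; alpha < 0 would penalize it.
    """
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    mean = sum(losses) / len(losses)
    var = sum((l - mean) ** 2 for l in losses) / len(losses)
    return mean - alpha * var, mean, var

# Hypothetical predictions: one confident and correct, one confidently wrong.
batch_probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 0]
objective, mean_loss, loss_var = variance_regularized_loss(batch_probs, labels)
```

With `alpha > 0`, a batch whose losses are spread apart yields a lower objective than its plain mean loss, so minimizing the objective pushes the variance up.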

An epistasis and heterogeneity analysis method based on maximum correlation and maximum consistence criteria

Chen

et al. 2021

MBE

Tumor heterogeneity significantly increases the difficulty of tumor treatment. The same drugs and treatment methods have different effects on different tumor subtypes. Therefore, tumor heterogeneity is one of the main sources of poor prognosis, recurrence and metastasis. At present, there are some computational methods for studying tumor heterogeneity at the genomic, transcriptomic, and histological levels, but these methods still have certain limitations. In this study, we propose an epistasis and heterogeneity analysis method based on genomic single nucleotide polymorphism (SNP) data. First, a maximum correlation and maximum consistence criterion was designed based on the Bayesian network score K2 and information entropy for evaluating genomic epistasis. As the number of SNPs increases, the epistasis combination space grows sharply, resulting in combinatorial explosion. Therefore, we next use an improved genetic algorithm to search the SNP epistatic combination space to identify potential feasible epistasis solutions. Multiple epistasis solutions represent different pathogenic gene combinations, which may lead to different tumor subtypes, that is, heterogeneity. Finally, an XGBoost classifier is trained on the feature SNPs that constitute the multiple sets of epistatic solutions to verify that considering tumor heterogeneity improves the accuracy of tumor subtype prediction. To demonstrate the effectiveness of our method, we evaluate the power of multiple epistasis recognition and the accuracy of tumor subtype classification. Extensive simulation results show that our method has better power and prediction accuracy than previous methods.
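The Bayesian network score K2 mentioned in this abstract can be illustrated in its standard log form, treating a candidate SNP combination as the parents of the phenotype node. This is a sketch of the textbook K2 formula only; the paper's full criterion also involves information entropy, which is not shown, and the function name and toy data are assumptions.

```python
from collections import Counter, defaultdict
from math import lgamma

def log_k2_score(genotypes, phenotypes, n_classes=2):
    """Log K2 score of a phenotype node whose parents are the given SNPs.

    genotypes:  list of tuples, one tuple of SNP genotypes per sample.
    phenotypes: list of class labels in range(n_classes), one per sample.
    """
    counts = defaultdict(Counter)
    for g, y in zip(genotypes, phenotypes):
        counts[g][y] += 1
    score = 0.0
    for class_counts in counts.values():
        n_j = sum(class_counts.values())
        # log[(r - 1)! / (N_j + r - 1)!] for this genotype combination
        score += lgamma(n_classes) - lgamma(n_j + n_classes)
        # log of the product over classes of N_jk!
        score += sum(lgamma(c + 1) for c in class_counts.values())
    return score

# Hypothetical toy data: one SNP, eight samples.
snp = [(0,)] * 4 + [(1,)] * 4
associated = log_k2_score(snp, [0, 0, 0, 0, 1, 1, 1, 1])
unrelated = log_k2_score(snp, [0, 0, 1, 1, 0, 0, 1, 1])
```

In this form, a larger value means the phenotype is better explained by the SNP combination, so the genotype that separates the two classes scores higher than the one that does not.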

Authenticity verification on social data outsourcing

Chen

et al. 2021

Computers & Security