Motivation: Accurately detecting tissue specificity (TS) in gene expression helps researchers understand tissue functions at the molecular level, identify disease mechanisms, and discover tissue-specific therapeutic targets. The Genotype-Tissue Expression (GTEx) project (Consortium, 2015) and the Human Protein Atlas (HPA) project (Uhlén, et al., 2015) are two publicly available data resources that provide large-scale gene expression data across multiple tissue types. Comparisons across many tissues, technical background noise, and unknown sources of variation make it challenging to accurately identify tissue-specific gene expression. Several existing methods measure the overall TS of a gene's expression or classify genes into tissue-enrichment categories, but a robust method that provides a quantitative TS score for each tissue is still lacking.
Methods: We recognized that the key to quantifying tissue-specific gene expression is to properly define the concept of an expression population. We assume that, within the population, sample expression levels from the various tissues are more or less balanced, and that outlier expression levels outside the population may indicate tissue specificity. The problem then becomes one of robustly estimating the population distribution. In the linear regression setting, we developed a novel data-adaptive robust estimation procedure based on density-power weights, which allows an unknown outlier distribution and a non-vanishing outlier proportion (Wang, et al., 2019). To quantify TS, we focus on a Gaussian-population mixture model, take gene heterogeneity into account, and apply the data-adaptive robust procedure to estimate the population. Using the robustly estimated population parameters, we construct the AdaTiSS algorithm to obtain data-adaptive quantitative TS scores.
Results: The TS scores produced by the AdaTiSS algorithm are comparable across tissues and across genes, so they standardize gene expression in terms of TS. Compared with categorical TS methods such as the HPA criterion, our method provides more information on the population fit and shows advantages in the quantitative analysis of tissue-specific functions, making biological functional analysis more precise. We also discuss some limitations and possible future work.
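
The idea in the Methods, estimating a Gaussian population robustly and then scoring each expression value against it, can be illustrated with a minimal Python sketch. This is not the AdaTiSS implementation. It assumes a fixed density-power exponent `gamma` (the paper describes a data-adaptive choice), a simple fixed-point iteration with a Gaussian consistency correction, and a z-score-like TS score. The function names `robust_gaussian_population` and `ts_scores` are hypothetical.

```python
import math
import statistics


def robust_gaussian_population(x, gamma=0.5, n_iter=200, tol=1e-8):
    """Estimate (mu, sigma) of a Gaussian population, down-weighting outliers.

    Assumption: gamma is fixed; a data-adaptive procedure would select it.
    Each point is weighted by the Gaussian density raised to the power gamma
    (the normalizing constant cancels in the weighted averages).
    """
    # Robust starting values: median and scaled median absolute deviation.
    mu = statistics.median(x)
    sigma = 1.4826 * statistics.median([abs(v - mu) for v in x])
    if sigma <= 0:
        sigma = statistics.pstdev(x) or 1.0

    for _ in range(n_iter):
        w = [math.exp(-gamma * (v - mu) ** 2 / (2 * sigma ** 2)) for v in x]
        sw = sum(w)
        mu_new = sum(wi * v for wi, v in zip(w, x)) / sw
        # The factor (1 + gamma) undoes the shrinkage that the weights
        # induce on the variance when the data are truly Gaussian.
        var_new = (1 + gamma) * sum(
            wi * (v - mu_new) ** 2 for wi, v in zip(w, x)
        ) / sw
        sigma_new = max(math.sqrt(var_new), 1e-12)
        converged = abs(mu_new - mu) + abs(sigma_new - sigma) < tol
        mu, sigma = mu_new, sigma_new
        if converged:
            break
    return mu, sigma


def ts_scores(x, mu, sigma):
    """Standardized deviation of each value from the estimated population."""
    return [(v - mu) / sigma for v in x]


# Illustrative data: 200 population-like values plus 10 outlying values.
population = [
    statistics.NormalDist(5.0, 1.0).inv_cdf((i + 0.5) / 200) for i in range(200)
]
expression = population + [15.0] * 10
mu_hat, sigma_hat = robust_gaussian_population(expression)
scores = ts_scores(expression, mu_hat, sigma_hat)
```

Because the outlying values receive weights close to zero, they barely affect the estimated population parameters, so their scores stay large instead of being masked by an inflated standard deviation.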