2017
DOI: 10.1186/s12864-017-4112-9

Determining the optimal number of independent components for reproducible transcriptomic data analysis

Abstract:
Background: Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data.
Results: Here we address the question of optimizing the numbe…
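The abstract links the number of components to the effective dimension of the data. For orientation only, one common generic heuristic for an effective dimension is the smallest number K of principal components whose cumulative explained variance reaches a chosen fraction. This is a baseline, not a reproduction of the method proposed in the paper; the function name `n_components_for_variance` and the `fraction` thresholds are illustrative assumptions.

```python
import numpy as np


def n_components_for_variance(X, fraction=0.95):
    """Smallest K whose leading principal components explain `fraction` of the variance.

    X: (n_samples, n_genes) expression matrix. Generic heuristic, assumed here
    for illustration; it is not a stability-based choice of K.
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # center each sample over genes
    s = np.linalg.svd(Xc, compute_uv=False)         # singular values, descending
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative variance fraction
    # first index where the cumulative fraction reaches the target, as a count
    return int(np.searchsorted(explained, fraction) + 1)
```

Variance-based cut-offs depend strongly on the chosen fraction, which is one reason a reproducibility-based criterion for ICA is attractive.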

Cited by 60 publications (59 citation statements). References: 25 publications.
“…Additionally, the CysB i-modulons demonstrated that i-modulons may represent the effects of multiple regulators, when the activities of the regulators are highly correlated across the measured conditions. However, adding new data or increasing the dimensionality of the decomposition can decouple the regulators, splitting the i-modulon into its biological parts [30,44].…”
Section: Discussion (mentioning, confidence: 99%)
“…ICA was applied as previously described (Biton et al, 2014), using stabilization, with an additional procedure for determining the optimal number of independent components (Kairov et al, 2017). In the ICA decomposition X = AS, X is the gene expression (sample vs. gene) matrix, A is the (sample vs. component) matrix describing the loadings of the independent components, and S is the (component vs. gene) matrix describing the weights (projections) of the genes in the components.…”
Section: Exploratory Analysis of scRNA-seq Data (mentioning, confidence: 99%)
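The decomposition X = AS in this excerpt, with X as (sample × gene), A as (sample × component) and S as (component × gene), can be sketched with a minimal symmetric FastICA in NumPy. This is an assumption-laden illustration: the function name `fastica`, the tanh (log-cosh) contrast, centering each sample over genes and PCA whitening are choices made for the sketch, not the exact implementation of Biton et al. (2014) or Kairov et al. (2017), and it performs a single run without stabilization.

```python
import numpy as np


def fastica(X, n_components, seed=0, max_iter=500, tol=1e-6):
    """Minimal symmetric FastICA (log-cosh contrast, i.e. g = tanh).

    X: (n_samples, n_genes) expression matrix; each sample row is a mixture.
    Returns A (n_samples, K) loadings and S (K, n_genes) gene weights such that
    A @ S approximates the row-centered X (exactly when its rank is <= K).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)          # center each sample over genes
    # PCA whitening onto the top-K eigenvectors of the sample covariance
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_components]
    D, E = eigval[top], eigvec[:, top]
    whiten = (E / np.sqrt(D)).T                     # (K, n_samples) whitening matrix
    Z = whiten @ Xc                                 # whitened data, (K, n_genes)

    W = np.linalg.qr(rng.standard_normal((n_components, n_components)))[0]
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        s, U = np.linalg.eigh(W_new @ W_new.T)
        W_new = U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W_new
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if done:
            break
    S = W @ Z                                       # (K, n_genes) gene weights
    A = np.linalg.pinv(W @ whiten)                  # (n_samples, K) loadings
    return A, S
```

Because the result depends on the random initialization (`seed`), different runs can return components in a different order, with flipped signs, or partly different; this is what the stabilization step mentioned in the excerpt addresses.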
“…When applying ICA, the target dimension (number of components) is fixed before numerical optimisation and the algorithm can converge to different projections that share only a subset of components. In keeping with the idea developed by [20], we choose the target dimension K = 40 after examining the stability of the components and only stable components were used. Here, we used average-link clustering based on absolute Pearson correlation coefficient (r) between columns of the source matrix (dimension 1,512×K) and a cut-off |r| ≥ 0.8 to compare components between runs.…”
Section: Building Covariates for Motif Discovery (mentioning, confidence: 99%)
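The comparison of components between runs described in this excerpt (average-link clustering on absolute Pearson correlation with a cut-off |r| ≥ 0.8) can be sketched as below. Assumptions: the helper names `average_link_clusters` and `stable_clusters` are hypothetical; components are stored as rows (the transpose of the 1,512×K source matrix in the quote); and "stable" is defined here, as an assumption, as a cluster that contains components from at least 90% of the runs (`min_fraction`).

```python
import numpy as np


def average_link_clusters(components, threshold=0.8):
    """Agglomerative average-link clustering on |Pearson r| between components.

    components: (n_total, n_genes) array pooling the components of all runs.
    Clusters are merged while their mean pairwise |r| is >= threshold.
    Returns a list of clusters, each a list of row indices.
    """
    sim = np.abs(np.corrcoef(components))           # sign of an IC is arbitrary
    clusters = [[i] for i in range(components.shape[0])]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sim[np.ix_(clusters[a], clusters[b])].mean()  # average linkage
                if link > best:
                    best, pair = link, (a, b)
        if best < threshold:
            break                                   # no pair passes the |r| cut-off
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                             # b > a, so index a is unaffected
    return clusters


def stable_clusters(runs, threshold=0.8, min_fraction=0.9):
    """Hypothetical stability filter: keep clusters hit by >= min_fraction of runs."""
    pooled = np.vstack(runs)
    run_id = np.repeat(np.arange(len(runs)), [r.shape[0] for r in runs])
    clusters = average_link_clusters(pooled, threshold)
    return [c for c in clusters
            if len(set(run_id[c].tolist())) >= min_fraction * len(runs)]
```

The naive pairwise search is cubic in the number of pooled components, which is acceptable for a handful of runs of K = 40; a library implementation of average linkage would scale better.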
“…Two main novelties of the proposed model are to allow overlaps between motif occurrences and to incorporate covariates summarising expression profiles into the probability of occurrence in a given promoter region. Covariates can correspond to the positions of the genes on an axis such as obtained by PCA [18] or ICA [19,20] but we also show how to use positions in a hierarchical clustering trees [21,8]. All the parameters are estimated in a Bayesian framework using a dedicated trans-dimensional MCMC algorithm.…”
Section: Introduction (mentioning, confidence: 99%)
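The introduction quoted above mentions using the positions of genes on an axis obtained by PCA as covariates. A minimal NumPy sketch of such positions follows, assuming genes are treated as points in sample space (so each sample row is centered across genes) and positions are their scores on the leading principal axes. The function name `gene_axis_positions` is hypothetical, and this is not the covariate construction used by the citing paper.

```python
import numpy as np


def gene_axis_positions(X, n_axes=1):
    """Scores of genes on the leading principal axes.

    X: (n_samples, n_genes) expression matrix; genes are the points.
    Returns an (n_genes, n_axes) array of positions (signs are arbitrary).
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # center the gene cloud
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # U[:, k] is the k-th principal axis in sample space; U^T Xc = diag(s) Vt
    return (s[:n_axes, None] * Vt[:n_axes]).T
```

An ICA-based covariate would use the rows of S from X = AS in place of the PCA scores.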