We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEP). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR), and invariant coordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace when it is sparse. Also proposed is a computationally more efficient alternative, which works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA, and SDR. Supplementary materials are available online.

Song et al. (2015) later proposed several approximate solutions of (3). Recently, Tan et al. (2016) proposed to solve (2) by truncating the steepest-ascent iterates in maximizing the Rayleigh coefficient $u \mapsto u^T A u / u^T B u$. Gaynanova et al. (2017) pointed out a fundamental difference between the penalized and constrained optimizations for sparse GEP, similar to (2) and (3) but with the $\ell_1$-norm. Safo et al. (2018) proposed to estimate $u$ by minimizing $\|u\|_1$ subject to the constraint $\|A\tilde{u} - \lambda B u\|_\infty \le \rho$, where $(\lambda, \tilde{u})$ is the non-sparse solution of (1). As is evident in Sriperumbudur et al. (2011), Song et al. (2015), and Tan et al. (2016), who limit themselves to solving for only one eigen-pair, it is unclear how (2) or (3) generalizes to simultaneously solving for multiple eigenvectors $u_1, \ldots, u_d$. When multiple eigenvectors are needed, as is typical in practice, these methods are not readily applicable, at least not without a clever modification. Our algorithm is designed to estimate $u_1, \ldots, u_d$ altogether, and works well when $d > 1$.

Han and Clemmensen (2016) assumed $B$ to be positive definite and transformed the GEP into a regular eigen-decomposition of $B^{-1}A$ (or $B^{-1/2} A B^{-1/2}$), while applying an $\ell_1$ penalty to achieve sparsity. However, their method is not directly applicable to the large-$p$-small-$n$ case, due to the numerically unstable inverse of the large matrix $B$. They used the alternating direction method of multipliers for optimization, which makes their method computationally expensive.

Chen et al. (2010) proposed to solve SDR problems by maximizing $\mathrm{trace}(U^T A U) - \rho_\lambda(U)$, for $U \in \mathbb{R}^{p \times d}$ satisfying $U^T B U = I_d$, in which the penalty function $\rho_\lambda$ enforces coordinate-wise sparsity as in (10). While Chen et al. (2010)'s formulation is similar to our Fast POI with (10), their computation is much slower than any of our proposed algorithms, perhaps ...
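To make the truncated steepest-ascent idea attributed above to Tan et al. (2016) concrete, the following is a minimal Python sketch for a single eigen-pair. It is not the authors' implementation: the fixed step size, the random initialization, and the renormalization rule are our own simplifying assumptions, and the sketch assumes the leading generalized eigenvalue is positive.

```python
import numpy as np

def truncated_rayleigh_ascent(A, B, k, eta=0.05, n_iter=500, seed=0):
    """Sparse leading eigenvector of A u = lambda B u via truncated
    steepest ascent on the Rayleigh coefficient u^T A u / u^T B u.

    Hypothetical simplification of the scheme of Tan et al. (2016);
    assumes the leading generalized eigenvalue is positive.
    """
    p = A.shape[0]
    u = np.random.default_rng(seed).standard_normal(p)
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        rho = (u @ A @ u) / (u @ B @ u)            # current Rayleigh coefficient
        u = u + (eta / rho) * ((A - rho * B) @ u)  # steepest-ascent step
        keep = np.argsort(np.abs(u))[-k:]          # only k largest coordinates survive
        v = np.zeros(p)
        v[keep] = u[keep]                          # truncate the iterate
        u = v / np.linalg.norm(v)                  # renormalize to the unit sphere
    return u
```

The hard truncation after each ascent step is what keeps the iterate $k$-sparse; as the surrounding discussion notes, this construction targets a single eigenvector and does not extend in an obvious way to $u_1, \ldots, u_d$ jointly.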
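For contrast, the penalized orthogonal iteration described in the abstract can be sketched as follows. This is a hedged sketch, not the paper's exact algorithm: we assume the equation-solving step $U \leftarrow B^{-1} A U$ is replaced by the penalized convex problem $\min_U \tfrac{1}{2}\mathrm{trace}(U^T B U) - \mathrm{trace}(U^T A V) + \lambda \|U\|_1$, whose unpenalized minimizer is exactly $B^{-1} A V$, and we solve it with a generic proximal-gradient loop; the paper's actual penalty and inner solver may differ.

```python
import numpy as np

def penalized_orthogonal_iteration(A, B, d, lam=0.1, n_iter=50, n_inner=100):
    """Sketch of a penalized orthogonal iteration for the GEP A u = lambda B u.

    Replaces the equation-solving step U <- B^{-1} A U of generalized
    orthogonal iteration with a proximal-gradient solve of the convex problem
        min_U 0.5 * tr(U^T B U) - tr(U^T A V) + lam * ||U||_1,
    whose unpenalized minimizer is B^{-1} A V.  The objective and solver
    here are illustrative assumptions, not the paper's verbatim method.
    """
    p = A.shape[0]
    rng = np.random.default_rng(0)
    U = np.linalg.qr(rng.standard_normal((p, d)))[0]  # random orthonormal start
    step = 1.0 / np.linalg.norm(B, 2)                 # 1/L, L = Lipschitz constant of the gradient
    for _ in range(n_iter):
        V = A @ U                 # right-hand side of the equation-solving step
        W = U.copy()
        for _ in range(n_inner):  # proximal gradient on the penalized subproblem
            W = W - step * (B @ W - V)                                # gradient step
            W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)  # soft-threshold (prox of l1)
        U = np.linalg.qr(W)[0]    # re-orthonormalize, as in orthogonal iteration
    return U
```

As a hypothetical usage, with `A` and `B` taken to be the between-class and within-class scatter matrices of a multiclass LDA problem, `penalized_orthogonal_iteration(A, B, d=2)` would return a sparse estimate of a basis for the discriminant subspace; setting `lam=0` recovers plain generalized orthogonal iteration.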