2022
DOI: 10.1093/bioadv/vbac026

Federated horizontally partitioned principal component analysis for biomedical applications

Abstract (Motivation): Federated learning enables privacy-preserving machine learning in the medical domain because the sensitive patient data remains with the owner and only parameters are exchanged between the data holders. The federated scenario introduces specific challenges related to the decentralized nature of the data, such as batch effects and differences in study population between the sites. Here, we investigate the challenges of moving classical analysis methods to the federated domain, speci…
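
The abstract's central point is that only parameters, never individual-level records, leave a site. As a hedged illustration of what that can look like for horizontally partitioned PCA (a generic covariance-aggregation sketch, not necessarily the algorithm evaluated in the paper; all names are placeholders):

```python
import numpy as np

def local_summaries(X):
    # Site-local step: share only aggregate statistics, never raw sample rows.
    return X.T @ X, X.sum(axis=0), X.shape[0]

def federated_pca(summaries, k):
    # Coordinator step: rebuild the pooled covariance from the site aggregates.
    gram = sum(g for g, _, _ in summaries)
    col_sum = sum(c for _, c, _ in summaries)
    n = sum(m for _, _, m in summaries)
    mu = col_sum / n
    cov = (gram - n * np.outer(mu, mu)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top], eigvals[top]

# Toy horizontally partitioned data: shared features, disjoint samples per site.
rng = np.random.default_rng(0)
sites = [rng.normal(size=(n, 10)) for n in (40, 60, 80)]
components, variances = federated_pca([local_summaries(X) for X in sites], k=2)
```

Such a scheme matches centralized PCA exactly but ships a feature-by-feature Gram matrix per site, which is one example of the communication and disclosure trade-offs the federated setting has to weigh.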

Cited by 6 publications (8 citation statements)
References 40 publications
“…To mimic the situation where genetic testing companies from different parts of the world collaboratively predict ancestry, we split the 1000 Genomes Project data into 5 isolated nodes based on sample superpopulation (African, Native American, East Asian, European, Southern Asian), thus getting high internode heterogeneity. After a standard QC, we reduced dimensionality by applying federated PCA [40] to pruned SNPs and then trained local, federated and centralized multilayer perceptrons (MLPs) of identical architecture. The experiment setup is visualized in Figure 3 and described in more detail in the Methods section.…”
Section: Results (mentioning)
confidence: 99%
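
The experiment described above (samples split by superpopulation, shared components estimated federatively, per-node projections fed to MLPs) could be mimicked roughly as follows; `genotypes` and `superpopulation` are hypothetical placeholders for a pruned SNP matrix and per-sample labels, and `federated_pca`/`local_summaries` refer to the hypothetical helpers sketched after the abstract:

```python
import numpy as np

def split_into_nodes(genotypes, superpopulation):
    # Partition samples by superpopulation label to mimic isolated nodes.
    return {pop: genotypes[superpopulation == pop]
            for pop in np.unique(superpopulation)}

def project_nodes(nodes, components):
    # Each node projects only its own samples onto the shared principal axes.
    return {pop: X @ components for pop, X in nodes.items()}

# nodes = split_into_nodes(genotypes, superpopulation)          # 5 superpopulations
# components, _ = federated_pca([local_summaries(X) for X in nodes.values()], k=20)
# embeddings = project_nodes(nodes, components)   # inputs for local/federated MLPs
```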
“…A fully federated solution requires all data to be prepared in a federated manner as well. In the case of ancestry prediction, dimensionality reduction is usually conducted via PCA; thus, we first pruned SNPs as displayed in Figure 3 to decrease computational load and then utilized federated PCA using the P-stack algorithm as described in [40]. The amount of communication used in federated PCA linearly depends on the number of input SNPs, which affects the model accuracy.…”
Section: Results (mentioning)
confidence: 99%
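
The excerpt names the P-stack algorithm of the reviewed paper and notes that its communication cost is linear in the number of SNPs. A hedged sketch of the general stacking idea (consult [40] for the actual protocol; this is not a faithful reimplementation):

```python
import numpy as np

def local_subspace(X, k):
    # Site-local step: top-k scaled right singular vectors of the centred data.
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return np.diag(s[:k]) @ vt[:k]      # shape (k, n_snps): grows linearly with SNPs

def stack_and_reduce(partials, k):
    # Aggregator step: stack the scaled local subspaces and re-factorize once.
    stacked = np.vstack(partials)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k].T                     # approximate global principal axes

# sites = [...]  # horizontally partitioned, pruned SNP matrices
# components = stack_and_reduce([local_subspace(X, k=10) for X in sites], k=10)
```

The per-site message is a k x n_snps matrix, which is why pruning SNPs before the federated PCA step directly reduces the communication volume.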
“…A high dimensional dataset is projected using PCA into an eigenspace that constitutes the direction of the largest variation illustrated by principal components. There are various drawbacks when using PCA, including the existence of abnormality that can lead to a recalculation of the PCA and result in unnecessary information disclosure [24]. Other than PCA, t-distributed stochastic neighbour embedding (t-SNE) [25] is widely used in the field of bioinformatics [26].…”
Section: Introduction (mentioning)
confidence: 99%
“…A high dimensional dataset is projected using PCA into an eigenspace that constitutes the direction of the largest variation illustrated by principal components. There are various drawbacks when using PCA, including the existence of abnormality that can lead to a recalculation of the PCA and result in unnecessary information disclosure [22]. Other than PCA, t-distributed stochastic neighbour embedding (t-SNE) [23] is widely used in the field of bioinformatics [24].…”
Section: Introduction (mentioning)
confidence: 99%
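
Both excerpts contrast PCA's linear eigenspace projection with t-SNE as a non-linear alternative. A minimal, hedged comparison using scikit-learn (assumed available; `X` is a placeholder for a high-dimensional expression or genotype matrix):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Placeholder high-dimensional data standing in for a biomedical feature matrix.
X = np.random.default_rng(1).normal(size=(200, 50))

# Linear projection onto the two directions of largest variance.
pca_embedding = PCA(n_components=2).fit_transform(X)

# Non-linear, neighbour-preserving embedding; useful for visualization,
# but it does not yield reusable axes the way a PCA eigenspace does.
tsne_embedding = TSNE(n_components=2, perplexity=30, random_state=1).fit_transform(X)
```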