2020
DOI: 10.1080/01621459.2020.1752219

Spherical Regression Under Mismatch Corruption With Application to Automated Knowledge Translation

Abstract: Motivated by a series of applications in data integration, language translation, bioinformatics, and computer vision, we consider spherical regression with two sets of unit-length vectors when the data are corrupted by a small fraction of mismatch in the response-predictor pairs. We propose a three-step algorithm in which we initialize the parameters by solving an orthogonal Procrustes problem to estimate a translation matrix W ignoring the mismatch. We then estimate a mapping matrix aiming to correct the mism…
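The initialization step named in the abstract, an orthogonal Procrustes fit that ignores the mismatch, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`orthogonal_procrustes`, `unit_rows`), the simulated data, and the choice of 10 mismatched pairs are all assumptions made for illustration. It uses the standard SVD solution, in which the orthogonal W minimizing ||Y - XW||_F is U V^T for the SVD X^T Y = U S V^T.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Return the orthogonal W minimizing ||Y - X @ W||_F (standard SVD solution)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def unit_rows(A):
    """Scale each row of A to unit Euclidean length."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

# Hypothetical simulated data: n unit-length predictor vectors in dimension p.
rng = np.random.default_rng(0)
n, p = 200, 5
X = unit_rows(rng.standard_normal((n, p)))

# A random orthogonal "translation" matrix and noiseless responses.
W_true, _ = np.linalg.qr(rng.standard_normal((p, p)))
Y = X @ W_true

# Corrupt a small fraction of response-predictor pairs by shuffling 10 rows of Y.
mismatched = rng.choice(n, size=10, replace=False)
Y_corrupt = Y.copy()
Y_corrupt[mismatched] = Y[rng.permutation(mismatched)]

W_clean = orthogonal_procrustes(X, Y)         # fit on correctly matched pairs
W_init = orthogonal_procrustes(X, Y_corrupt)  # initial fit that ignores the mismatch
```

Here `W_init` plays the role of the initial estimate of W only; the later steps that estimate the mapping matrix are not sketched, because the abstract is truncated before describing them.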

Citations: cited by 25 publications (25 citation statements)
References: 54 publications
“…195 Recently, Shi et al developed a spherical regression-based method for handling heterogeneity in ICD code designation across different EHR systems. 196 Methodology in the data integration literature may also prove useful for addressing these challenges. 197 Future work may explore resampling-based methods to make studies more comparable in the presence of heterogeneity with respect to the sampling mechanism.…”
Section: Heterogeneity Between Biobanks (mentioning)
confidence: 99%
“…The large number of subjects and the large number of available adjustment factors in EHR data provide an opportunity to effectively address more refined questions such as the relationship between treatment and molecular subgroups of disease (inherently a question of interactions) directly, potentially allowing clinical heterogeneity to be handled directly through a redefinition of the quantity of interest. Recently, Shi et al developed a spherical regression‐based method for handling heterogeneity in ICD code designation across different EHR systems. Methodology in the data integration literature may also prove useful for addressing these challenges.…”
Section: Statistical Issues Related To Biobank Research (mentioning)
confidence: 99%
“…In a recent study to understand public opinion about diseases, Huang et al identified articles about diseases and mapped them to phecodes [36]. Motivated by the difficulties in automatically translating diagnosis codes from EHRs, Shi et al used phecodes to map ICD-9-CM diagnosis codes from one health system to another [37]. Phecodes have also been applied to identify conditions for aggregation in phenotype risk scores, much as SNPs are aggregated as a genetic risk score to identify Mendelian diseases and determine pathogenicity of genetic variants [38].…”
Section: Discussion (mentioning)
confidence: 99%
“…implies failure. Conditions (11) and (12) thus match up to multiplicative factors. Next, we consider the rank-1 case.…”
Section: A. Oracle Case: Known B* (mentioning)
confidence: 96%
“…In [7], a polynomial-time approximation algorithm is proposed, and lower bounds on the required snr for approximate signal recovery in the noisy case are shown; related results can be found in [8], [9]. The works [9]-[12] discuss both signal and permutation recovery if Π* only permutes a small fraction of the rows of the sensing matrix. An interesting variation of (2) in which Π* is an unknown selection matrix that selects a fraction measurements in an order-preserving fashion is studied in [13].…”
Section: A. Related Work (mentioning)
confidence: 99%