2021
DOI: 10.48550/arxiv.2106.00939
Preprint

Combining case-control studies for identifiability and efficiency improvement in logistic regression

Abstract: Can two separate case-control studies, one about hepatitis and the other about fibrosis, for example, be combined? It would be hugely beneficial if two or more separately conducted case-control studies, even ones run for entirely unrelated purposes, could be merged into a unified analysis that produces better statistical properties, e.g., more accurate estimation of parameters. In this paper, we show that, when using the popular logistic regression model, the combined/integrative analysis produc…

Citations: cited by 1 publication (1 citation statement)
References: 30 publications (33 reference statements)

“…Recall that for the classical logistic regression under case-control study [24], the intercept term and the density function of X*, denoted by f(⋅), are not estimable, as the score function of the intercept term lies in the linear space spanned by the score function of f(⋅). The identifiability issue of the classical logistic regression under case-control studies was carefully studied by Tang et al. [46]. With the availability of the unlabeled samples in a semi-supervised setting, f(⋅) is always identifiable. We propose a two-stage procedure to estimate g(⋅): first, the unlabeled data are used to estimate the density function of X*; then a likelihood-based estimator for g(⋅) based on B-spline is obtained at the second stage.…”
Section: Introduction (mentioning)
confidence: 99%
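
The intercept problem described in the citation statement above can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not taken from either paper: the single-covariate model, the parameter values (alpha = -3, beta = 1), the sample sizes, and the helper names `expit`, `fit_logistic`, and `simulate` are all hypothetical. It draws a population from a logistic model, samples a fixed number of cases and controls, and fits ordinary logistic regression to the case-control sample. The slope is typically recovered, while the fitted intercept is shifted by the log ratio of the two sampling fractions, a quantity that cannot be computed from the case-control data alone.

```python
import math
import random


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of logit P(y=1 | x) = a + b * x (one covariate)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(a + b * x)
            r = y - p            # score contribution
            w = p * (1.0 - p)    # information weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate(seed=0, pop=100_000, alpha=-3.0, beta=1.0,
             n_cases=2000, n_controls=2000):
    """Return (fitted intercept, fitted slope, log sampling-fraction ratio)."""
    rng = random.Random(seed)
    cases, controls = [], []
    # Population step: x ~ N(0, 1), disease status from the logistic model.
    for _ in range(pop):
        x = rng.gauss(0.0, 1.0)
        if rng.random() < expit(alpha + beta * x):
            cases.append(x)
        else:
            controls.append(x)
    # Case-control step: fixed numbers of cases and controls are sampled.
    xs = rng.sample(cases, n_cases) + rng.sample(controls, n_controls)
    ys = [1] * n_cases + [0] * n_controls
    a_hat, b_hat = fit_logistic(xs, ys)
    # The shift depends on population counts, which the sample does not reveal.
    shift = math.log((n_cases / len(cases)) / (n_controls / len(controls)))
    return a_hat, b_hat, shift


a_hat, b_hat, shift = simulate()
print("fitted slope:", b_hat)
print("fitted intercept:", a_hat)
print("true intercept plus sampling shift:", -3.0 + shift)
```

The shift is computable here only because the simulation keeps the full population counts. In a real case-control study those counts are unknown, which is one way to see why extra information, such as the unlabeled samples mentioned in the statement, is needed to pin down the intercept.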