2019
DOI: 10.1073/pnas.1810420116

A modern maximum-likelihood theory for high-dimensional logistic regression

Abstract: Students in statistics or data science usually learn early on that when the sample size n is large relative to the number of variables p, fitting a logistic model by the method of maximum likelihood produces estimates that are consistent and that there are well-known formulas that quantify the variability of these estimates which are used for the purpose of statistical inference. We are often told that these calculations are approximately valid if we have 5 to 10 observations per unknown parameter. This paper …

citations
Cited by 196 publications
(194 citation statements)
references
References 51 publications
(105 reference statements)
“…We set $N_B = 2\times 10^5$, $B = 20$ and $n_b = 10^4$, and simulate $p$-element vectors of covariates from $\mathbf{x}_i \overset{\text{IID}}{\sim} N(0, N_B^{-1} I_p)$. Following Sur and Candès (), to guarantee the existence of the MLE in such high dimensional settings, we generate the true values of $\beta_0$ entrywise IID from $N(10, 900)$ under $p = 1000$ and from $N(10, 300)$ under $p = 2500$. The same criteria are used in the subsequent assessment and comparisons.…”
Section: Simulation Experiments (mentioning)
confidence: 99%
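The simulation design quoted above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the citing authors' code: the function names `simulate_design` and `expected_signal_strength` are hypothetical, and it assumes that the second argument of N(10, 900) and N(10, 300) is a variance and that the covariate covariance is the identity divided by the total sample size. Under those assumptions, the expected variance of the linear predictor is p(10^2 + variance) divided by the total sample size, which comes out the same for both quoted settings.

```python
import numpy as np


def simulate_design(n_total, p, beta_mean, beta_var, seed=0):
    """Draw covariates and true coefficients as in the quoted setup.

    Assumptions (not confirmed by the source): beta_var is a variance,
    and each covariate entry has variance 1 / n_total.
    """
    rng = np.random.default_rng(seed)
    # Rows x_i ~ N(0, I_p / n_total), entries independent.
    X = rng.normal(0.0, np.sqrt(1.0 / n_total), size=(n_total, p))
    # Entries of beta_0 drawn IID from N(beta_mean, beta_var).
    beta0 = rng.normal(beta_mean, np.sqrt(beta_var), size=p)
    return X, beta0


def expected_signal_strength(n_total, p, beta_mean, beta_var):
    """Expected variance of x_i^T beta_0 under the assumptions above.

    Var(x_i^T beta_0 | beta_0) = ||beta_0||^2 / n_total, and
    E||beta_0||^2 = p * (beta_mean^2 + beta_var).
    """
    return p * (beta_mean ** 2 + beta_var) / n_total


if __name__ == "__main__":
    # The two settings quoted in the citation statement.
    print(expected_signal_strength(200_000, 1000, 10.0, 900.0))
    print(expected_signal_strength(200_000, 2500, 10.0, 300.0))
    # A scaled-down draw, to keep memory use small.
    X, beta0 = simulate_design(n_total=2000, p=10, beta_mean=10.0, beta_var=900.0)
    print(X.shape, beta0.shape)
```

If the variance reading is right, the differing coefficient variances for p = 1000 and p = 2500 look like a way of holding the signal strength fixed while the dimension changes, though the quoted statement does not say so explicitly.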
“…In the hierarchical case though, this remains unclear. In addition, the p-values from multi-sample splitting, as used in our procedure and software, might be unreliable: it is challenging, in particular for logistic regression (Sur and Candès, 2018), to come up with reliable and powerful p-values for testing single or groups of regression coefficients which are reliable and powerful in high-dimensional settings. (3) The role of the hierarchy is an issue of power, as long as we assume fixed design and a correct model specification.…”
Section: Discussion (mentioning)
confidence: 99%
“…Finally, our paper is closely related to [7], [28], in which the authors study the high-dimensional performance of maximum-likelihood (ML) estimation for the logistic model. The ML estimator is a special case of (1) but their measurement model differs from the one considered in this paper.…”
Section: Prior Work (mentioning)
confidence: 99%