For (very) sparse nominal data, common goodness-of-fit tests usually fail. Alternative goodness-of-fit tests based on an extended empirical Bayes approach and grouping are proposed, and their consistency is proved. The performance of the tests is illustrated and compared with classical criteria by Monte Carlo simulations.
Simple conditions for the inconsistency of Pearson’s χ² test in the case of very sparse categorical data are given. The conditions illustrate the phenomenon of “reversed consistency”: the greater the deviation from the null hypothesis, the lower the power of the test.
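One way to probe the power of Pearson’s χ² test in a sparse setting is by simulation. The following is a minimal sketch, not the authors' procedure: it assumes multinomial sampling, a critical value taken from the simulated null distribution rather than from the χ² approximation, and hypothetical names (`pearson_chi2`, `mc_power`) and table sizes chosen only for illustration. It does not reproduce the inconsistency conditions of the paper.

```python
import numpy as np


def pearson_chi2(counts, expected):
    """Pearson's chi-square statistic, computed row-wise for a batch of tables."""
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sum((counts - expected) ** 2 / expected, axis=-1)


def mc_power(p_null, p_alt, n, alpha=0.05, n_sim=2000, seed=0):
    """Estimate the power of Pearson's test against p_alt by Monte Carlo.

    The critical value is the (1 - alpha) quantile of the statistic
    simulated under the null, so no chi-square approximation is used.
    """
    rng = np.random.default_rng(seed)
    p_null = np.asarray(p_null, dtype=float)
    p_alt = np.asarray(p_alt, dtype=float)
    expected = n * p_null

    # Null distribution of the statistic: n_sim tables drawn from p_null.
    null_stats = pearson_chi2(rng.multinomial(n, p_null, size=n_sim), expected)
    crit = np.quantile(null_stats, 1.0 - alpha)

    # Rejection rate when the data actually come from p_alt.
    alt_stats = pearson_chi2(rng.multinomial(n, p_alt, size=n_sim), expected)
    return float(np.mean(alt_stats > crit))


if __name__ == "__main__":
    # Illustrative sparse setting (assumed values): 500 cells, 50 observations.
    k, n = 500, 50
    p0 = np.full(k, 1.0 / k)
    # Hypothetical alternative: all mass spread evenly over the first half of the cells.
    p1 = np.zeros(k)
    p1[: k // 2] = 2.0 / k
    print("estimated power:", mc_power(p0, p1, n))
```

Repeating the call for alternatives at increasing distance from the null gives an empirical power curve, which can then be compared with the behaviour described above.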
A conditional logistic regression model is fitted to data on student academic performance. This makes it possible to compare the difficulty of different examinations and to identify related factors.
In the dissertation, the problem of nonparametric testing for sparse contingency tables is addressed. Statistical inference problems caused by the sparsity of contingency tables are widely discussed in the literature. Traditionally, the expected frequency (under the null hypothesis) is required to exceed 5 in almost all cells of the contingency table. If this condition is violated, the χ² approximations of goodness-of-fit statistics may be inaccurate, and the table is said to be sparse. Several techniques have been proposed to tackle the problem: exact tests, alternative approximations, parametric and nonparametric bootstrap, the Bayes approach, and other methods. However, all of them are either inapplicable or limited when it comes to nonparametric statistical inference for very sparse contingency tables. In the dissertation, it is shown that, for sparse categorical data, the likelihood ratio statistic and Pearson's χ² statistic may become noninformative: they no longer measure the goodness-of-fit of the null hypothesis to the data. Thus, they can be inconsistent even in cases where a simple consistent test exists. An improvement of the classical criteria for sparse contingency tables is proposed. The improvement is achieved by grouping and smoothing sparse categorical data, making use of a new sparse asymptotics model that relies on an (extended) empirical Bayes approach. Under general conditions, the consistency of the proposed criteria based on grouping is proved. The finite-sample behavior of the criteria is investigated via Monte Carlo simulations. The dissertation consists of the Introduction, four chapters, General conclusions, References, and Appendices.
The Introduction explains the importance of the scientific problem and describes the purpose and tasks of the thesis, the research methodology, the scientific novelty, and the practical significance of the results. It ends with a list of the author's publications and conference presentations on the subject of the dissertation. In Chapter 1, an overview of the problem is presented and basic definitions are introduced. Chapter 2 demonstrates the inconsistency of classical tests in the case of (very) sparse categorical data for both the multinomial and the Poisson sampling schemes. In Chapter 3, the extended Bayes model is introduced. It provides a basis for smoothing and grouping of sparse (nominal) data. The consistency of criteria based on grouping is proved. The finite-sample behavior of the classical and proposed criteria is studied in Chapter 4. Details of the computer simulation results are given in the Appendices.
Notation: H₀, the null hypothesis; H₁, the alternative.
Abbreviations: LNRE, Large Number of Rare Events; MCMC, Markov chain Monte Carlo.
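The two classical statistics named in the abstract, and the rule of thumb on expected frequencies, can be sketched as follows. This is a minimal illustration using the textbook definitions of Pearson's χ² and the likelihood ratio statistic; the function names, the uniform null hypothesis, and the table size are assumptions made for the example and are not taken from the dissertation.

```python
import numpy as np


def pearson_chi2(observed, expected):
    """Pearson's statistic: sum over cells of (O - E)^2 / E."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))


def likelihood_ratio(observed, expected):
    """Likelihood ratio statistic: 2 * sum of O * log(O / E).

    Empty cells contribute zero (the usual 0 * log 0 = 0 convention).
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    nonzero = observed > 0
    ratio = observed[nonzero] / expected[nonzero]
    return float(2.0 * np.sum(observed[nonzero] * np.log(ratio)))


def fraction_below(expected, threshold=5.0):
    """Share of cells whose expected frequency is below the threshold."""
    expected = np.asarray(expected, dtype=float)
    return float(np.mean(expected < threshold))


if __name__ == "__main__":
    # Illustrative sparse table (assumed values): 200 cells, 50 observations,
    # uniform null hypothesis, so every expected frequency equals 0.25.
    rng = np.random.default_rng(0)
    k, n = 200, 50
    p0 = np.full(k, 1.0 / k)
    observed = rng.multinomial(n, p0)
    expected = n * p0

    print("cells with expected frequency < 5:", fraction_below(expected))
    print("Pearson chi-square:", pearson_chi2(observed, expected))
    print("likelihood ratio:", likelihood_ratio(observed, expected))
```

In this example every cell violates the expected-frequency rule, which is the regime in which the χ² approximation to both statistics is described above as potentially inaccurate.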