To handle outliers in linear regression, the Student-t linear regression model is a common robust modeling method and is widely adopted in the literature. However, most studies apply it without careful theoretical consideration. This study uses regular variation theory to provide simple, practically useful conditions that ensure the Student-t linear regression model is robust against an outlier in the y-direction.
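As a rough illustration of why the Student-t model is used for y-direction outliers, the sketch below fits it with a standard EM (iteratively reweighted least squares) scheme and compares the result with ordinary least squares on simulated data. The function name `student_t_regression`, the fixed degrees of freedom `nu=3`, and the simulated data are all assumptions made for illustration; the fitting scheme is not taken from the study, and the study's robustness conditions are not reproduced here.

```python
import numpy as np

def student_t_regression(X, y, nu=3.0, n_iter=200, tol=1e-10):
    """EM fit of y = X @ beta + eps, with eps ~ Student-t(nu, scale sigma).

    nu is held fixed (an assumption of this sketch, not of the study).
    """
    n = X.shape[0]
    # Start from the ordinary least squares solution.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        # E-step: observations with large residuals get small weights.
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r**2 / sigma2)
        # M-step: weighted least squares for beta, then update the scale.
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        sigma2 = np.sum(w * (y - X @ beta_new) ** 2) / n
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, sigma2

# Simulated data (hypothetical): true intercept 1, true slope 2,
# with a single large outlier in the y-direction at the last point.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y[-1] += 200.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_t, sigma2_t = student_t_regression(X, y)
print("OLS slope:      ", beta_ols[1])
print("Student-t slope:", beta_t[1])
```

In this setup the outlier's weight shrinks as its residual grows, so the Student-t slope stays near the true value while the least squares slope is pulled away from it.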
Abstract. In large discrete data sets that require classification into signal and noise components, the distribution of the signal is often very bumpy and does not follow a standard distribution. Therefore, the signal distribution is further modelled as a mixture of component distributions. However, when the signal component is modelled as a mixture of distributions, we are faced with the challenges of justifying the number of components and the label switching problem (caused by multi-modality of the likelihood function). To circumvent these challenges, we propose a non-parametric structure for the signal component. This new method yields more precise estimates and better classifications. We demonstrate the efficacy of the methodology using a ChIP-sequencing data set.
Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many application areas require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component, so the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally challenging, both because the exact number of components is difficult to justify and because of the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and smaller classification error. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.