In this work, we deal with correlated under-reported data through INAR(1)-hidden Markov chain models. These models are very flexible and can be identified through their autocorrelation function, which has a very simple form. A naïve method of parameter estimation is proposed, together with the maximum likelihood method based on a revised version of the forward algorithm. The most probable unobserved time series is reconstructed by means of the Viterbi algorithm. Several examples of application in the field of public health are discussed, illustrating the utility of the models. Copyright © 2016 John Wiley & Sons, Ltd.
Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect and question structure not as clear as it could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pretesting, inform real-time interventions through responsive questionnaire design, or indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today, richer data sources are available, for example, the movements respondents make with their mouse, which serve as an additional detailed indicator of the respondent–survey interaction. This article uses machine learning techniques to explore the predictive value of mouse-tracking data regarding a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, comparing several state-of-the-art supervised learning methods. We have also developed a personalization method that adjusts for respondents’ baseline mouse behavior, and we evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.
Underreporting in gender-based violence data is a worldwide problem leading to the underestimation of the magnitude of this social and public health concern. This problem deteriorates the data quality, providing poor and biased results that lead society to misunderstand the actual scope of this domestic violence issue. The present work proposes time series models for underreported counts based on a latent integer autoregressive of order 1 time series with Poisson distributed innovations and a latent underreporting binary state, that is, a first-order Markov chain. Relevant theoretical properties of the models are derived, and the moment-based and maximum-based methods are presented for parameter estimation. The new time series models are applied to the quarterly complaints of domestic violence against women recorded in some judicial districts of Galicia (Spain) between 2007 and 2017. The models allow quantifying the degree of underreporting. A comprehensive discussion is presented, studying how the frequency and intensity of underreporting in this public health concern are related to some interesting socioeconomic and health indicators of the provinces of Galicia (Spain).
Keywords: integer autoregressive models, intimate partner violence, public health, state-dependent underreporting, underrecorded data
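To make the model structure in this abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes that underreporting acts through binomial thinning of the latent count (the abstract does not say how the observed count is reduced), and the parameter names `alpha`, `lam`, `q`, `p01` and `p10` are hypothetical labels chosen for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def thin(rng, count, prob):
    """Binomial thinning: each of `count` units survives with probability `prob`."""
    return sum(rng.random() < prob for _ in range(count))


def simulate(n, alpha, lam, q, p01, p10, seed=0):
    """Simulate latent counts, underreporting states and observed counts.

    alpha : thinning probability of the latent INAR(1) process
    lam   : mean of the Poisson innovations
    q     : assumed reporting probability while underreporting is active
    p01   : P(state 0 -> 1), p10 : P(state 1 -> 0); state 1 = underreporting
    """
    rng = random.Random(seed)
    # A Poisson INAR(1) process has a Poisson(lam / (1 - alpha)) stationary law.
    x = poisson(rng, lam / (1 - alpha))
    state = 0
    latent, states, observed = [], [], []
    for _ in range(n):
        x = thin(rng, x, alpha) + poisson(rng, lam)  # latent INAR(1) step
        if state == 0:
            state = 1 if rng.random() < p01 else 0   # first-order Markov chain
        else:
            state = 0 if rng.random() < p10 else 1
        y = thin(rng, x, q) if state == 1 else x     # assumed reporting mechanism
        latent.append(x)
        states.append(state)
        observed.append(y)
    return latent, states, observed


latent, states, observed = simulate(n=44, alpha=0.5, lam=3.0, q=0.4, p01=0.2, p10=0.3)
```

The call uses 44 time points only because 11 years of quarterly data (2007 to 2017) would have that length; all parameter values are arbitrary.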
In this article we present a new INteger-valued AutoRegressive (INAR) model with the aim of extracting baseline patterns of cattle fallen stock registered over a 5-year period at a local scale. We introduce HINAR as a generalization of the classical Poisson-based INAR models whose innovations follow a Hermite distribution. In order to assess trends and seasonality in these time series, we fit different models with time-dependent parameters by specifying proper functions. Using real-world examples, we illustrate how to estimate parameters by maximum likelihood and validate the fitted models. We also show a detailed method to forecast. Our proposed model provides a good solution for studying discrete time series when the counts have many zeros, low counts and moderate overdispersion. This model has been applied to the analysis of fallen cattle registered at a local scale as part of the development of a veterinary syndromic surveillance system.
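As an illustration of Hermite-distributed innovations, the sketch below simulates an INAR process of order 1 with constant parameters. It is a hedged example rather than the HINAR model as published: the order, the constant parameters and the names `alpha`, `a1` and `a2` are assumptions, and the time-dependent parameters mentioned in the abstract are not modelled. It relies on the standard representation of a Hermite variate as U1 + 2·U2, with U1 and U2 independent Poisson variates with means `a1` and `a2`.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def hermite(rng, a1, a2):
    """Hermite variate as U1 + 2*U2 with independent Poisson U1, U2."""
    return poisson(rng, a1) + 2 * poisson(rng, a2)


def hermite_moments(a1, a2):
    """Mean and variance of the Hermite distribution; variance >= mean."""
    return a1 + 2 * a2, a1 + 4 * a2


def simulate_hinar1(n, alpha, a1, a2, seed=0):
    """Simulate n steps of an order-1 INAR process with Hermite innovations."""
    rng = random.Random(seed)
    x = 0
    series = []
    for _ in range(n):
        survivors = sum(rng.random() < alpha for _ in range(x))  # binomial thinning
        x = survivors + hermite(rng, a1, a2)
        series.append(x)
    return series


mean, var = hermite_moments(1.0, 0.5)
series = simulate_hinar1(n=260, alpha=0.3, a1=1.0, a2=0.5)
```

Because the innovation variance `a1 + 4*a2` exceeds the mean `a1 + 2*a2` whenever `a2 > 0`, this construction produces the overdispersion that a plain Poisson INAR model cannot.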