When assessing interview response quality to identify potentially low-quality interviews, both numerical and categorical response quality indicators (mixed indicators) are usually available. However, research on how to use them simultaneously is very rare. In the current article, we extend the application of conventional multivariate control charts to include response quality indicators that are of a mixed type. We analyze data from the eighth round of the European Social Survey in Belgium, characterized by six numerical and two categorical response quality indicators. First, we employ a principal component analysis mix procedure (PCA Mix) to transform the mixed quality indicators into principal components. The principal component scores are subsequently used to construct a Hotelling T² statistic. To deal with the non-multivariate normal nature of the principal component scores obtained from the PCA Mix, a nonparametric bootstrap method is then applied to calculate the control limit for the T² statistic. Second, we suggest tools to interpret an identified outlier in terms of finding the responsible original indicator(s). Third, we present a cyclic procedure for determining the “in-control” data, by iteratively removing the outliers until the process is considered to be in control. Lastly, we identify the most important indicators that discriminate the outliers from the in-control data. Our results imply that multivariate control charts based on relevant projection tools such as PCA Mix in combination with the bootstrap technique have great potential for use in evaluating interview response quality and identifying outliers.
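The T² statistic and bootstrap control limit described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: synthetic Gaussian scores stand in for the PCA Mix principal component scores, and the function names, `alpha`, the number of resamples, and the choice to average the bootstrap percentiles are all hypothetical.

```python
import numpy as np

def hotelling_t2(scores):
    """T² of each row relative to the sample mean and sample covariance."""
    centered = scores - scores.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))
    # Quadratic form x' S^-1 x evaluated row by row.
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

def bootstrap_control_limit(t2, alpha=0.01, n_boot=1000, seed=0):
    """Average the (1 - alpha) percentile of T² over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(t2)
    percentiles = [
        np.quantile(rng.choice(t2, size=n, replace=True), 1 - alpha)
        for _ in range(n_boot)
    ]
    return float(np.mean(percentiles))

# Synthetic stand-in for principal component scores (assumption, not survey data).
rng = np.random.default_rng(42)
scores = rng.standard_normal((500, 3))
scores[:5] += 6.0  # a few planted anomalous rows

t2 = hotelling_t2(scores)
limit = bootstrap_control_limit(t2, alpha=0.01)
outliers = np.flatnonzero(t2 > limit)  # row indices above the control limit
```

Because the limit is estimated from resampled T² values rather than from an F or chi-square quantile, the sketch does not rely on the scores being multivariate normal.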
Despite general agreement regarding the usefulness of statistical process control (SPC) tools for monitoring paradata, using SPC from an early phase of the survey fieldwork is rather rare. This study focuses on one type of paradata, interview duration, to fill this void. First, we establish a procedure that enables fieldwork monitoring for the seventh round of the European Social Survey in Belgium from its start. The impact of respondent characteristics on interview duration is controlled for by multiple regression. Moreover, we simulate the real conditions of an ongoing survey data collection process by cumulating data and repeating the identification of problematic interviews each week, as "new" data become available. Second, for each interview we record and track over the fieldwork period whether or not it is identified as problematic, to examine the consistency of our findings. We find that as more data become available, the classification of an interview as problematic or not changes in only 0.3% of the cases. Out of the 27 interviews identified as problematic when all information was available, 25 were identified immediately, once the relevant information was available. Overall, these findings suggest that SPC tools are reliable and efficient in a survey context, and accordingly have great potential for allowing survey practitioners to focus immediately on the interviews that need further examination, rather than waiting until the data collection has been completed.
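The weekly monitoring idea in the abstract above (regress duration on respondent characteristics, then re-identify problematic interviews on the cumulated data each week) can be sketched as below. The data are synthetic, the single covariate `age` is a hypothetical stand-in for the respondent characteristics, and the three-standard-deviation rule on the residuals is an assumption rather than the control chart used in the study.

```python
import numpy as np

def flag_problematic(X, y, k=3.0):
    """Regress y on X (with intercept); flag residuals beyond k standard deviations."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sd = resid.std(ddof=design.shape[1])  # residual standard deviation
    return np.abs(resid) > k * sd

rng = np.random.default_rng(1)
n = 400
week = np.sort(rng.integers(1, 9, size=n))        # fieldwork week of each interview
age = rng.uniform(18, 90, size=n)                 # hypothetical respondent covariate
duration = 40 + 0.3 * age + rng.normal(0, 5, n)   # synthetic duration in minutes
duration[10] = 5.0                                # one implausibly short interview

# Re-run the identification each week on all interviews collected so far.
history = {}
for w in range(1, 9):
    seen = week <= w
    flags = flag_problematic(age[seen].reshape(-1, 1), duration[seen])
    history[w] = set(np.flatnonzero(seen)[flags].tolist())
```

Comparing the sets in `history` from week to week shows how often the status of an interview changes as data accumulate, which is the kind of consistency check the abstract reports.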
Proteins are the fundamental biological macromolecules that underlie practically all biological activities. Protein–protein interactions (PPIs) are the means by which proteins interact with other proteins in their environment to perform biological functions. Understanding PPIs reveals how cells behave and operate, for example in antigen recognition and signal transduction in the immune system. In the past decades, many computational methods have been developed to predict PPIs automatically, requiring less time and fewer resources than experimental techniques. In this paper, we present a comparative study of various graph neural networks for protein–protein interaction prediction. Five network models are analyzed and compared: neural networks (NN), graph convolutional neural networks (GCN), graph attention networks (GAT), hyperbolic neural networks (HNN), and hyperbolic graph convolutions (HGCN). By utilizing the protein sequence information, all of these models can predict the interaction between proteins. Fourteen PPI datasets are extracted and utilized to compare the prediction performance of all these methods. The experimental results show that hyperbolic graph neural networks tend to perform better than the other methods on the protein-related datasets.
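For readers unfamiliar with the GCN model named in the abstract above, a single graph convolution step can be sketched in NumPy. This uses the standard symmetric-normalized propagation rule; the toy graph, the random features standing in for sequence-derived protein features, the untrained weights, and the dot-product link score are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

def link_score(embeddings, i, j):
    """Hypothetical decoder: sigmoid of the embedding dot product."""
    z = float(embeddings[i] @ embeddings[j])
    return 1.0 / (1.0 + np.exp(-z))

# Toy interaction graph over four proteins (symmetric adjacency matrix).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 5))   # stand-in for sequence-derived features
weight = rng.standard_normal((5, 3))     # untrained layer weights

embeddings = gcn_layer(adj, features, weight)
score = link_score(embeddings, 0, 3)     # predicted interaction score for proteins 0 and 3
```

In a trained model the weights would be learned from known interactions; here they are random, so the score only demonstrates the data flow.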
When monitoring industrial processes, a Statistical Process Control tool such as a multivariate Hotelling T² chart is frequently used to evaluate multiple quality characteristics. However, research into the use of T² charts for survey fieldwork (essentially a production process in which data sets collected by means of interviews are produced) has been scant to date. In this study, using data from the eighth round of the European Social Survey in Belgium, we present a procedure for simultaneously monitoring six response quality indicators and identifying outliers: interviews with anomalous results. The procedure integrates Kernel Density Estimation (KDE) with a T² chart, so that neither historical “in-control” data nor the assumption of a parametric distribution of the indicators is required. In total, 75 outliers (4.25%) are iteratively removed, resulting in an in-control data set containing 1,691 interviews. The outliers are mainly characterized by having longer sequences of identical answers, a greater number of extreme answers, and, against expectation, a lower item nonresponse rate. The procedure is validated by means of ten-fold cross-validation and comparison with the minimum covariance determinant algorithm as the criterion. By providing a method of obtaining in-control data, the present findings go some way toward monitoring response quality, identifying problems, and providing rapid feedback during survey fieldwork.
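A KDE-based control limit for T², as described in the abstract above, can be sketched with NumPy alone. The sketch assumes a Gaussian kernel with a rule-of-thumb bandwidth and finds the (1 - alpha) quantile of the smoothed distribution by bisection; these choices, the function names, and the synthetic six-indicator data are assumptions, not the settings of the study. A Gaussian kernel also ignores the fact that T² is nonnegative, which is accepted here for simplicity.

```python
import math
import numpy as np

def hotelling_t2(data):
    """T² of each row relative to the sample mean and sample covariance."""
    centered = data - data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

def kde_control_limit(t2, alpha=0.01):
    """(1 - alpha) quantile of a Gaussian KDE fitted to the T² values."""
    t2 = np.asarray(t2, dtype=float)
    n = len(t2)
    h = 1.06 * t2.std(ddof=1) * n ** (-1 / 5)   # rule-of-thumb bandwidth (assumption)
    erf = np.vectorize(math.erf)

    def cdf(x):
        # The KDE's CDF is the average of the Gaussian kernel CDFs.
        return float(np.mean(0.5 * (1.0 + erf((x - t2) / (h * math.sqrt(2.0))))))

    lo, hi = t2.min() - 10 * h, t2.max() + 10 * h
    for _ in range(100):                         # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic stand-in for six response quality indicators (not survey data).
rng = np.random.default_rng(7)
data = rng.standard_normal((300, 6))
data[:3] += 5.0                                  # three planted anomalous interviews

t2 = hotelling_t2(data)
limit = kde_control_limit(t2, alpha=0.01)
outliers = np.flatnonzero(t2 > limit)
```

The iterative removal of outliers reported in the abstract would wrap these two steps in a loop with a stopping rule, which is not specified here and is therefore left out of the sketch.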
Multivariate statistical process control (MSPC) was developed for the monitoring of variables that are either all numerical or all categorical. In the present paper, we describe a nonparametric control scheme that can be used to monitor a mixture of numerical and categorical variables simultaneously. It integrates Principal Component Analysis Mix (PCA Mix), a multivariate statistical tool, with the conventional Hotelling T² chart. To estimate the control limit for the PCA Mix based T² statistic, two nonparametric approaches, kernel density estimation (KDE) and bootstrap, are employed, because of the unknown nature of the underlying distribution. The simulation results demonstrate that with an appropriate number of principal components, both bootstrap and KDE exhibit convincing performance in terms of generating the same, or nearly the same, number of false alarms (ARL₀) as expected, and being able to detect process shifts efficiently (ARL₁). Compared with bootstrap, KDE is shown to work better with small sample sizes (n < 800) and to be slightly more sensitive to