Summary
This paper explores the use of deep belief networks to build an authorship verification model applicable to continuous authentication (CA). The proposed approach is based on a Gaussian-Bernoulli deep belief network, which uses Gaussian units in the visible layer to model real-valued data. Lexical, syntactic, and application-specific features are explored, leading to the proposal of a method for merging a pair of features into a single one. CA is simulated by decomposing an online document into a sequence of short texts over which the CA decisions are made. The experimental evaluation of the proposed method uses block sizes of 140, 280, and 500 characters on the Twitter and Enron e-mail corpora. Promising results are obtained, with an equal error rate (EER) ranging from 8.21% to 16.73%. Using relatively smaller forgery samples, an EER ranging from 5.48% to 12.3% is also obtained for the different block sizes.
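The Gaussian visible units mentioned above can be illustrated with a minimal pure-Python sketch of a single Gaussian-Bernoulli restricted Boltzmann machine (RBM), the layer that such a deep belief network stacks. It assumes the commonly used formulation with per-unit standard deviations sigma_i, under which P(h_j = 1 | v) = sigmoid(c_j + sum_i W_ij v_i / sigma_i) and p(v_i | h) is Gaussian with mean b_i + sigma_i sum_j W_ij h_j and variance sigma_i^2. The function names `hidden_probs`, `visible_mean`, and `gibbs_step` are hypothetical; this is not the authors' implementation, and training (e.g., contrastive divergence) and layer stacking are omitted.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, c, sigma):
    """P(h_j = 1 | v) for a Gaussian-Bernoulli RBM (assumed formulation).

    v: real-valued visible inputs (e.g., normalized stylometric features)
    W: weights, W[i][j] connects visible unit i to hidden unit j
    c: hidden biases
    sigma: per-visible-unit standard deviations
    """
    return [
        sigmoid(c[j] + sum(W[i][j] * v[i] / sigma[i] for i in range(len(v))))
        for j in range(len(c))
    ]


def visible_mean(h, W, b, sigma):
    """Mean of the Gaussian p(v_i | h) = N(b_i + sigma_i * sum_j W_ij h_j, sigma_i^2)."""
    return [
        b[i] + sigma[i] * sum(W[i][j] * h[j] for j in range(len(h)))
        for i in range(len(b))
    ]


def gibbs_step(v, W, b, c, sigma, rng):
    """One block Gibbs step v -> h -> v', as used inside contrastive divergence."""
    ph = hidden_probs(v, W, c, sigma)
    # Sample binary hidden states from their Bernoulli probabilities.
    h = [1.0 if rng.random() < p else 0.0 for p in ph]
    # Sample real-valued visible states from the Gaussian conditionals.
    mean = visible_mean(h, W, b, sigma)
    return [rng.gauss(m, s) for m, s in zip(mean, sigma)]
```

With all weights and hidden biases at zero, every hidden unit is on with probability 0.5 regardless of the input, which is a quick sanity check on the conditional.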
Authorship verification using stylometry consists of identifying a user based on their writing style. In this paper, authorship verification is applied to continuous authentication using unstructured online text-based entry. An online document is decomposed into consecutive blocks of short texts over which (continuous) authentication decisions are made, discriminating between legitimate and impostor behaviors. We investigate blocks of text with 140, 280, and 500 characters. The feature set includes traditional features, such as lexical, syntactic, and application-specific features, as well as new features extracted from n-gram analysis. Furthermore, the proposed approach includes a strategy to circumvent issues related to unbalanced datasets, uses Information Gain and Mutual Information for feature selection, and uses a Support Vector Machine (SVM) for classification. Experimental evaluation of the proposed approach on the Enron email and Twitter corpora yields very promising results, with an Equal Error Rate (EER) ranging from 9.98% to 21.45% for different block sizes.
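The block decomposition and the EER used in both abstracts can be sketched in a few lines of Python. This sketch assumes non-overlapping blocks with any trailing partial block discarded, and verification scores where a higher value means "more likely the legitimate user". The EER is approximated as the average of the false acceptance rate (FAR) and false rejection rate (FRR) at the candidate threshold where the two are closest. The function names `split_into_blocks` and `equal_error_rate` are hypothetical and are not taken from either paper.

```python
def split_into_blocks(text, block_size):
    """Split text into consecutive, non-overlapping blocks of block_size characters.

    Assumption: a trailing block shorter than block_size is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [text[i:i + block_size] for i in range(0, len(text) - block_size + 1, block_size)]


def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine and impostor verification scores.

    A block is accepted when its score >= threshold. Every observed score is
    tried as a threshold; the one where FAR and FRR are closest is kept.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # legitimate user rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

For example, `split_into_blocks(message, 140)` yields 140-character blocks; each block would be scored by a verifier, and the resulting genuine and impostor scores passed to `equal_error_rate`. Perfectly separated scores, such as genuine `[0.9, 0.8, 0.7]` against impostor `[0.1, 0.2, 0.3]`, give an EER of 0.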