Nowadays, unprecedented amounts of heterogeneous data are stored, processed, and transmitted via the Internet. One of the most important problems in data analysis is to verify whether data observed and/or collected over time are genuine and stationary, i.e., whether the information sources have changed their characteristics. Data come in many types: texts, images, audio or video files or streams, metadata descriptions, as well as ordinary numbers, and all of them can change in many ways. If a change occurs, the next questions are what the nature of the change is and when and where it occurred. The main focus of this paper is the detection of change and the classification of its type. Many algorithms have been proposed to detect abnormalities and deviations in data. In this paper we propose a new approach to abrupt change detection based on Parzen kernel estimation of the partial derivatives of multivariate regression functions in the presence of probabilistic noise. The proposed change detection algorithm is applied to one- and two-dimensional patterns to detect abrupt changes.
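As a rough illustration of the general idea, and not the authors' exact algorithm, the following Python sketch estimates a one-dimensional regression function from noisy samples with a Parzen (Gaussian) kernel, differentiates the estimate, and flags locations where the derivative spikes as candidate abrupt changes. The bandwidth h and the threshold rule are illustrative assumptions, not values from the paper.

```python
import numpy as np

def parzen_derivative(x, y, grid, h=0.05):
    """Derivative of a Nadaraya-Watson kernel regression estimate."""
    d = grid[:, None] - x[None, :]                  # pairwise differences, shape (G, N)
    k = np.exp(-0.5 * (d / h) ** 2)                 # Gaussian kernel values
    dk = -(d / h ** 2) * k                          # kernel derivative w.r.t. grid point
    num, den = k @ y, k.sum(axis=1)                 # numerator / denominator of the estimate
    dnum, dden = dk @ y, dk.sum(axis=1)
    # Quotient rule for d/dt [num(t) / den(t)].
    return (dnum * den - num * dden) / np.maximum(den ** 2, 1e-12)

# Noisy signal with an abrupt jump at t = 0.5 (synthetic stand-in data).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 500))
y = np.where(t < 0.5, 0.0, 1.0) + rng.normal(0, 0.1, t.size)

grid = np.linspace(0.01, 0.99, 200)
deriv = parzen_derivative(t, y, grid)
tau = 5 * np.median(np.abs(deriv))                  # crude adaptive threshold (assumption)
print("suspected change near t =", grid[np.argmax(np.abs(deriv))],
      "| flagged:", bool(np.abs(deriv).max() > tau))
```

The intuition carried over from the abstract is that an abrupt change in the underlying function shows up as a sharp peak in the estimated derivative, while smooth regions and probabilistic noise produce comparatively small derivative values after kernel smoothing.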
This paper presents a neural network model for identifying non-human traffic to a website, i.e., traffic that differs significantly from visits made by regular users. Such visits are undesirable from the point of view of the website owner: they are not human activity and therefore bring no value, and moreover they usually incur costs connected with the handling of advertising. They are most often generated by dishonest publishers using special software (bots) to make a profit. Bots are also used for scraping, the automatic scanning and downloading of website content, which is typically against the interests of website authors. The model proposed in this work is trained on data extracted directly from the web browser during website visits. These data are acquired by a specially prepared JavaScript that monitors the behavior of the user or bot. The appearance of a bot on a website generates parameter values that differ significantly from those collected during typical visits made by human users. It is not possible to learn much about the software controlling the bots or to know all the data they generate. Therefore, this paper proposes a variational autoencoder (VAE) neural network model, with modifications, to detect abnormal parameter values that deviate from the data obtained from human users' Internet traffic. The algorithm builds on the popular autoencoder approach to anomaly detection, but a number of original improvements have been implemented. In the study we used authentic data extracted from several large online stores.
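The following Python (PyTorch) sketch shows the general VAE-based anomaly-scoring recipe the abstract describes: train on features from human visits only, then flag visits whose reconstruction error is unusually high. The feature count, network sizes, and threshold rule are illustrative assumptions; the paper's original improvements are not reproduced here.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_features, latent=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.mu, self.logvar = nn.Linear(32, latent), nn.Linear(32, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(),
                                 nn.Linear(32, n_features))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def loss_fn(x, recon, mu, logvar):
    rec = ((x - recon) ** 2).sum(dim=1)                            # reconstruction error
    kld = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)  # KL divergence to N(0, I)
    return (rec + kld).mean()

# Train on behavioral features from human visits only
# (random stand-in data here; e.g. mouse speed, scroll depth, click timing).
human = torch.randn(2048, 10)
model = VAE(10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    recon, mu, logvar = model(human)
    loss_fn(human, recon, mu, logvar).backward()
    opt.step()

# Score a new visit: high reconstruction error suggests a bot.
with torch.no_grad():
    recon, _, _ = model(human)
    errs = ((human - recon) ** 2).sum(dim=1)
    thresh = errs.quantile(0.99)        # illustrative threshold choice
    visit = torch.randn(1, 10) * 5      # an atypical (bot-like) visit
    score = ((visit - model(visit)[0]) ** 2).sum()
    print("bot suspected:", bool(score > thresh))
```

Because the model never sees bot-generated data during training, it learns to reconstruct only the patterns of human traffic; inputs drawn from a different distribution reconstruct poorly, which is what makes the approach workable even when the bot software itself cannot be studied.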