This article is devoted to the semantic analysis of weakly structured information in the field of “Artificial intelligence and information security”. The research methodology comprised two stages and was based on a meta-analysis of existing studies. The results obtained support the development of further methodological recommendations on semi-structured information and artificial intelligence.
To increase the effectiveness of detecting fraudulent bank transactions, a system structure is proposed that analyzes user environment data in order to identify potentially fraudulent activity. The system for collecting and analyzing information about the user environment makes it possible to accumulate user environment data, to mark precedents in manual and automatic modes, and to build a database of images for training classifiers. This requires implementing data collection, storage, and an access interface so that data mining tools can be applied. Handling a significant amount of accumulated data requires special tools (frameworks and hardware platforms) for big data processing. This paper presents an analysis of existing software and hardware tools for distributed processing of loosely structured bank transaction data (frameworks: Hadoop, Apache Spark). The structure of, and deployment recommendations for, a hardware and software test bench are developed; the test bench is intended for testing financial fraud detection algorithms based on data mining, as part of a distributed bank transaction processing system built on the selected framework.
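The collect, mark, and build-training-data cycle described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Precedent` and `PrecedentStore` classes, the environment fields `ip_country` and `card_country`, and the country-mismatch rule are hypothetical and are not taken from the paper. The sketch keeps data in memory, whereas the paper targets distributed storage and processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Precedent:
    """One transaction with its user-environment attributes (hypothetical schema)."""
    transaction_id: str
    environment: Dict[str, str]
    label: Optional[bool] = None       # True = fraud, False = legitimate, None = unlabeled
    labeled_by: Optional[str] = None   # "manual" or "auto"


class PrecedentStore:
    """In-memory stand-in for the collection, storage, and access interface."""

    def __init__(self) -> None:
        self._items: Dict[str, Precedent] = {}

    def collect(self, transaction_id: str, environment: Dict[str, str]) -> None:
        # Accumulate user-environment data for one transaction.
        self._items[transaction_id] = Precedent(transaction_id, dict(environment))

    def mark_manual(self, transaction_id: str, is_fraud: bool) -> None:
        # Manual mode: an analyst assigns the label directly.
        precedent = self._items[transaction_id]
        precedent.label, precedent.labeled_by = is_fraud, "manual"

    def mark_auto(self, rule: Callable[[Dict[str, str]], Optional[bool]]) -> int:
        # Automatic mode: apply a rule to every unlabeled precedent.
        # The rule returns True, False, or None (no decision).
        marked = 0
        for precedent in self._items.values():
            if precedent.label is None:
                verdict = rule(precedent.environment)
                if verdict is not None:
                    precedent.label, precedent.labeled_by = verdict, "auto"
                    marked += 1
        return marked

    def training_set(self) -> List[Tuple[Dict[str, str], bool]]:
        # Only labeled precedents are exported for classifier training.
        return [(p.environment, p.label)
                for p in self._items.values() if p.label is not None]


def country_mismatch(environment: Dict[str, str]) -> Optional[bool]:
    # Hypothetical heuristic: flag a transaction when the IP country
    # differs from the card country; otherwise make no decision.
    if environment["ip_country"] != environment["card_country"]:
        return True
    return None
```

A short usage example: collect three transactions, label one manually, let the rule label the rest, and export whatever ended up labeled.

```python
store = PrecedentStore()
store.collect("t1", {"ip_country": "XX", "card_country": "YY"})
store.collect("t2", {"ip_country": "XX", "card_country": "XX"})
store.collect("t3", {"ip_country": "ZZ", "card_country": "ZZ"})
store.mark_manual("t3", False)
store.mark_auto(country_mismatch)
rows = store.training_set()
```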
Traffic analysis systems are widely used to monitor the network activity of users, or of a specific user, and to restrict client access to certain types of services (VPN, HTTPS) that make content analysis impossible. Algorithms for classifying encrypted traffic and detecting VPN traffic are proposed. Three algorithms for constructing classifiers are considered: MLP, RFT, and KNN. The proposed classifier achieves a recognition accuracy of up to 80% on a test sample. The MLP, RFT, and KNN algorithms performed almost identically in all experiments. It was also found that the proposed classifiers work better when network traffic flows are generated using short values of the time parameter (timeout). The novelty lies in the development of neural-network-based algorithms for network traffic analysis that differ in their method of feature selection and generation, which makes it possible to classify the existing traffic of protected connections of selected users according to a predetermined set of categories.
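The role of the timeout parameter in flow generation can be illustrated with a minimal sketch. It assumes that the timeout is an idle timeout (a flow is closed when no packet with the same key arrives within `timeout` seconds), which the abstract does not state. The packet format, the `Flow` class, and the four features are hypothetical and chosen only for illustration; they are not the paper's feature set. Because only timing and size are used, the features remain available when the payload is encrypted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical packet record: (timestamp in seconds, flow key, size in bytes).
# The flow key stands in for a connection identifier such as a 5-tuple.
Packet = Tuple[float, str, int]


@dataclass
class Flow:
    key: str
    timestamps: List[float] = field(default_factory=list)
    sizes: List[int] = field(default_factory=list)

    def features(self) -> Dict[str, float]:
        # Payload-independent statistics computed from timing and size only.
        gaps = [b - a for a, b in zip(self.timestamps, self.timestamps[1:])]
        return {
            "packets": float(len(self.sizes)),
            "bytes": float(sum(self.sizes)),
            "duration": self.timestamps[-1] - self.timestamps[0],
            "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        }


def split_into_flows(packets: List[Packet], timeout: float) -> List[Flow]:
    """Group packets by key, starting a new flow after an idle gap > timeout."""
    open_flows: Dict[str, Flow] = {}
    finished: List[Flow] = []
    for ts, key, size in sorted(packets):  # process in timestamp order
        flow = open_flows.get(key)
        if flow is not None and ts - flow.timestamps[-1] > timeout:
            finished.append(flow)  # idle gap exceeded: close the current flow
            flow = None
        if flow is None:
            flow = Flow(key)
            open_flows[key] = flow
        flow.timestamps.append(ts)
        flow.sizes.append(size)
    finished.extend(open_flows.values())  # flush flows still open at the end
    return finished
```

With this sketch, a shorter timeout splits the same capture into more, shorter flows, so a classifier such as MLP, RFT, or KNN receives more feature vectors, each summarizing a narrower time window.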