Anna Kurtukova scite author profile

The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The importance of the considered problem is due to the active digitalization of society and reassignment of most parts of the life activities online. Text authorship methods are particularly useful for information security and forensics. For example, such methods can be used to identify authors of suicide notes, and other texts are subjected to forensic examinations. Another area of application is plagiarism detection. Plagiarism detection is a relevant issue both for the field of intellectual property protection in the digital space and for the educational process. The article describes identifying the author of the Russian-language text using support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy. The average accuracy of SVM reaches 96%. This is due to thoroughly chosen parameters and feature space, which includes statistical and semantic features (including those extracted as a result of an aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also includes an evaluation of the impact of attacks on the method on models’ accuracy. Experiments show that the SVM-based methods are unstable to deliberate text anonymization. In comparison, the loss in accuracy of deep neural networks does not exceed 20%. Transformer architecture is the most effective for anonymized texts and allows 81% accuracy to be achieved.

show abstract

Source Code Authorship Identification Using Deep Neural Networks

Kurtukova

Romanov

Shelupanov

2020

Symmetry

View full text Add to dashboard Cite

Many open-source projects are developed by the community and have a common basis. The more source code is open, the more the project is open to contributors. The possibility of accidental or deliberate use of someone else’s source code as a closed functionality in another project (even a commercial) is not excluded. This situation could create copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering solves this problem. However, not all code samples for comparing can be found in the public domain. In this case, the methods of identifying the source code author can be useful. Therefore, identifying the source code author is an important problem in software engineering, and it is also a research area in symmetry. This article discusses the problem of identifying the source code author and modern methods of solving this problem. Based on the experience of researchers in the field of natural language processing (NLP), the authors propose their technique based on a hybrid neural network and demonstrate its results both for simple cases of determining the authorship of the code and for those complicated by obfuscation and using of coding standards. The results show that the author’s technique successfully solves the essential problems of analogs and can be effective even in cases where there are no obvious signs indicating authorship. The average accuracy obtained for all programming languages was 95% in the simple case and exceeded 80% in the complicated ones.

show abstract

Reconstruction of a 3D Human Foot Shape Model Based on a Video Stream Using Photogrammetry and Deep Neural Networks

et al. 2021

View full text Add to dashboard Cite

Reconstructed 3D foot models can be used for 3D printing and further manufacturing of individual orthopedic shoes, as well as in medical research and for online shoe shopping. This study presents a technique based on the approach and algorithms of photogrammetry. The presented technique was used to reconstruct a 3D model of the foot shape, including the lower arch, using smartphone images. The technique is based on modern computer vision and artificial intelligence algorithms designed for image processing, obtaining sparse and dense point clouds, depth maps, and a final 3D model. For the segmentation of foot images, the Mask R-CNN neural network was used, which was trained on foot data from a set of 40 people. The obtained accuracy was 97.88%. The result of the study was a high-quality reconstructed 3D model. The standard deviation of linear indicators in length and width was 0.95 mm, with an average creation time of 1 min 35 s recorded. Integration of this technique into the business models of orthopedic enterprises, Internet stores, and medical organizations will allow basic manufacturing and shoe-fitting services to be carried out and will help medical research to be performed via the Internet.

show abstract

De-Anonymization of the Author of the Source Code Using Machine Learning Algorithms

Kurtukova

Romanov

Fedotova

2019

View full text Add to dashboard Cite

Authorship Attribution of Social Media and Literary Russian-Language Texts Using Machine Learning Methods and Feature Selection

et al. 2021

View full text Add to dashboard Cite

Authorship attribution is one of the important fields of natural language processing (NLP). Its popularity is due to the relevance of implementing solutions for information security, as well as copyright protection, various linguistic studies, in particular, researches of social networks. The article is a continuation of the series of studies aimed at the identification of the Russian-language text’s author and reducing the required text volume. The focus of the study was aimed at the attribution of textual data created as a product of human online activity. The effectiveness of the models was evaluated on the two Russian-language datasets: literary texts and short comments from users of social networks. Classical machine learning (ML) algorithms, popular neural networks (NN) architectures, and their hybrids, including convolutional neural network (CNN), networks with long short-term memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and fastText, that have not been used in previous studies, were applied to solve the problem. A particular experiment was devoted to the selection of informative features using genetic algorithms (GA) and evaluation of the classifier trained on the optimal feature space. Using fastText or a combination of support vector machine (SVM) with GA reduced the time costs by half in comparison with deep NNs with comparable accuracy. The average accuracy for literary texts was 80.4% using SVM combined with GA, 82.3% using deep NNs, and 82.1% using fastText. For social media comments, results were 66.3%, 73.2%, and 68.1%, respectively.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Anna Kurtukova

Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks

Source Code Authorship Identification Using Deep Neural Networks

Reconstruction of a 3D Human Foot Shape Model Based on a Video Stream Using Photogrammetry and Deep Neural Networks

De-Anonymization of the Author of the Source Code Using Machine Learning Algorithms

Authorship Attribution of Social Media and Literary Russian-Language Texts Using Machine Learning Methods and Feature Selection

Contact Info

Product

Resources

About