Background: As technology advances, cybercrime is becoming more frequent. The most common cases are cyberbullying and illicit sharing, but invasion of privacy, dissemination of defamatory emails, and child pornography also occur. When digital equipment is stolen, lost, or discarded, its data remains stored on disk, which makes it possible to recover the files.
Objective: This article focuses on the recovery of formatted files. It investigates the applicability of the Foremost, Scalpel, and Magic Rescue tools in the Linux environment and, in addition, uses a tool developed by the authors that is equipped with machine learning. The general objective of the research is to develop a recovery and validation tool for formatted files. The project aims to contribute to investigations of digital and cyber crimes and to offer new analytical perspectives on recovery methods for formatted files.
Methods: Following a pattern-recognition methodology, each cluster is taken as input and converted into a histogram, which feeds the input neurons of the learning machine. Machine learning is applied to recognize the pattern of the blocks/clusters. In the first scenario, here named “simple”, the classification is binary: a class versus its counterclass. This methodology was developed by Pavel (2017) and is replicated in the simple scenario. In the second scenario, named “complex”, the one-against-all method is used on a database of 16,000 files.
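As an illustration of the histogram input described in Methods, the following is a minimal Python sketch of how a cluster could be turned into a feature vector for a classifier. It rests on assumptions that this abstract does not state: a histogram over the 256 possible byte values, normalization by cluster length, and a 4096-byte cluster size. The names `split_clusters` and `byte_histogram` are hypothetical and are not taken from the authors' tool.

```python
from collections import Counter

# Assumed cluster size in bytes; the abstract does not specify one.
CLUSTER_SIZE = 4096


def split_clusters(data: bytes, size: int = CLUSTER_SIZE) -> list[bytes]:
    """Split a raw byte stream into fixed-size clusters (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def byte_histogram(cluster: bytes) -> list[float]:
    """Return a 256-bin histogram of byte values, normalized to sum to 1.

    Assumption: one bin per byte value; each bin would feed one input neuron.
    """
    if not cluster:
        return [0.0] * 256
    counts = Counter(cluster)  # iterating over bytes yields integers 0..255
    total = len(cluster)
    return [counts.get(value, 0) / total for value in range(256)]
```

Under these assumptions, every cluster yields a fixed-length vector regardless of its content, which is what allows the same input layer to be used in both the binary ("simple") and the one-against-all ("complex") scenarios.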
Results: This research presents an approach that combines machine learning and data science to recover formatted data. The tool achieves a recovery rate of over 96% for formatted PNG and JPEG files and runs in a few seconds. These results are promising for improving digital forensic investigations.
Conclusions: The experiments reported in this work contribute to the advancement of studies on data recovery. Their relevance lies in helping to elucidate the digital crimes that affect a society of digital natives. The authors' system illustrates how technology can serve human rights and improve the quality of life in contemporary society.