Industrial control systems depend heavily on security and monitoring protocols. Several tools are available for this purpose, which scan for vulnerabilities and take screenshots of various control panels for later analysis. However, they do not adequately classify images into specific control groups, which is crucial for security-based tasks performed manually by operators. To solve this problem, we propose a pipeline based on deep learning to classify snapshots of industrial control panels into three categories: information technologies (IT), operational technologies (OT), and others. More specifically, we compare the use of transfer learning and fine-tuning in convolutional neural networks (CNNs) pre-trained on ImageNet to select the best CNN architecture for classifying the screenshots of industrial control systems. We propose the critical infrastructure dataset (CRINF-300), which is the first publicly available IT/OT snapshot dataset, with 337 manually labeled images. We used the CRINF-300 to train and evaluate eighteen different pipelines, registering their performance under CPU and GPU environments. We found that the Inception-ResNet-V2 and VGG16 architectures obtained the best results on transfer learning and fine-tuning, with F1-scores of 0.9832 and 0.9373, respectively. In systems where time is critical and a GPU is available, we recommend using the MobileNet-V1 architecture, with an average time of 0.03 s to process an image and an F1-score of 0.9758.
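The distinction the abstract draws between transfer learning and fine-tuning can be sketched in a few lines: transfer learning keeps the pretrained backbone frozen and trains only a new classification head, whereas fine-tuning would also update the backbone weights. This is a minimal illustrative sketch, not the paper's pipeline: the "backbone" here is a fixed projection standing in for an ImageNet-pretrained CNN, the "images" are toy 2-D points, and the three class names mirror the CRINF-300 categories.

```python
import math
import random

random.seed(0)
CLASSES = ["IT", "OT", "other"]  # the three CRINF-300 categories

def frozen_backbone(x):
    # Stand-in for pretrained CNN features: a fixed, untrained projection.
    # In transfer learning this part receives no gradient updates.
    return [x[0] + x[1], x[0] - x[1], x[1] * 0.5]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Toy "screenshots": 2-D points, one cluster per class.
data = []
for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]):
    for _ in range(30):
        data.append(((cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)), label))

# Trainable head: 3 frozen features -> 3 classes, plain gradient descent.
W = [[0.0] * 3 for _ in range(3)]
b = [0.0] * 3
lr = 0.5
for _ in range(200):
    for x, y in data:
        f = frozen_backbone(x)  # backbone stays frozen
        logits = [sum(W[c][i] * f[i] for i in range(3)) + b[c] for c in range(3)]
        p = softmax(logits)
        for c in range(3):  # gradient step on the head only
            g = p[c] - (1.0 if c == y else 0.0)
            for i in range(3):
                W[c][i] -= lr * g * f[i]
            b[c] -= lr * g

def predict(x):
    f = frozen_backbone(x)
    logits = [sum(W[c][i] * f[i] for i in range(3)) + b[c] for c in range(3)]
    return CLASSES[logits.index(max(logits))]

accuracy = sum(predict(x) == CLASSES[y] for x, y in data) / len(data)
```

Fine-tuning would differ only in that `frozen_backbone` would itself have trainable parameters updated by the same gradient loop, typically with a smaller learning rate.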
Retrieving text embedded within images is a challenging task in real-world settings. Multiple problems, such as low resolution and the orientation of the text, can hinder the extraction of information. These problems are common in environments such as the Tor Darknet and Child Sexual Abuse images, where text extraction is crucial in the prevention of illegal activities. In this work, we evaluate eight text recognizers and, to increase the performance of text transcription, we combine these recognizers with rectification networks and super-resolution algorithms. We test our approach on four state-of-the-art datasets and two custom ones (TOICO-1K and Child Sexual Abuse (CSA)-text, based on text retrieved from the Tor Darknet and Child Sexual Exploitation Material, respectively). We obtained a correctly-recognized-word score of 0.3170 on the TOICO-1K dataset when combining deep convolutional neural network (CNN) and rectification-based recognizers. For the CSA-text dataset, applying resolution enhancements achieved a final score of 0.6960. The highest performance increase was achieved on the ICDAR 2015 dataset, with an improvement of 4.83% when combining the MORAN recognizer and the Residual Dense resolution approach. We conclude that rectification outperforms super-resolution when applied separately, while their combination achieves the best average improvements in the chosen datasets.
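The evaluation shape described above, where optional rectification and super-resolution steps are composed in front of a recognizer and scored by the fraction of exactly matched words, can be sketched as follows. The steps here are trivial string-based stand-ins for illustration only, not the actual MORAN recognizer or Residual Dense super-resolution network.

```python
def compose(*steps):
    """Chain preprocessing steps and a recognizer into one pipeline."""
    def pipeline(image):
        for step in steps:
            image = step(image)
        return image
    return pipeline

# Stand-in steps: "images" are just strings for illustration.
def rectify(img):
    return img.strip()        # stand-in for undoing curved/rotated text

def super_resolve(img):
    return img.lower()        # stand-in for upscaling low-resolution crops

def recognize(img):
    return img                # stand-in text recognizer

def word_accuracy(predictions, ground_truth):
    # Score = fraction of words recognized exactly, as reported above.
    assert len(predictions) == len(ground_truth)
    return sum(p == g for p, g in zip(predictions, ground_truth)) / len(ground_truth)

crops = ["  Onion ", "MARKET", "exit  "]
truth = ["onion", "market", "exit"]
pipeline = compose(rectify, super_resolve, recognize)
score = word_accuracy([pipeline(c) for c in crops], truth)
```

Swapping `compose(rectify, recognize)` for `compose(super_resolve, recognize)` is the kind of ablation the comparison between rectification-only and super-resolution-only runs implies.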
Perceptual image hashing methods are applied to a variety of tasks, such as image retrieval, finding duplicate or near-duplicate images, and finding similar images in large-scale image collections. The main challenge in image hashing techniques is robust feature extraction, which should generate the same or similar hashes for images that are visually identical. In this article, we present a short review of state-of-the-art traditional perceptual hashing and deep learning-based perceptual hashing methods, identifying the best approaches.
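A classic example of the traditional methods such a review covers is the average hash (aHash): downscale the image to a small grid, threshold each pixel against the mean, and compare hashes by Hamming distance, so near-duplicates land close together. This is a minimal sketch on toy grayscale grids; real implementations operate on full images after resizing.

```python
def average_hash(pixels):
    """pixels: 2-D list of grayscale values (already downscaled, e.g. 8x8).

    Each bit is 1 if the pixel is above the image mean, else 0, so the
    hash captures coarse structure and is robust to small brightness shifts.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]

def hamming(h1, h2):
    """Number of differing bits; small distance means perceptually similar."""
    return sum(a != b for a, b in zip(h1, h2))

img = [[10, 200], [220, 30]]
near_duplicate = [[12, 198], [225, 28]]   # small brightness changes
different = [[200, 10], [30, 220]]        # inverted layout

d_near = hamming(average_hash(img), average_hash(near_duplicate))
d_diff = hamming(average_hash(img), average_hash(different))
```

Here `d_near` is 0 while `d_diff` is 4: the near-duplicate hashes identically, and the structurally different image is maximally distant at this grid size. Deep learning-based hashing replaces the fixed thresholding with learned features but keeps the same compare-by-distance usage.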