ABSTRACT

The digital era has led to an unprecedented increase in the amount of available information, a substantial part of which is visual data. The data forensics community needs automated solutions to cope with this proliferation of image data. This thesis addresses the specific problem of distinguishing two-dimensional map images from other image content by examining two computational methods: Convolutional Neural Networks (CNNs) and Bag of Words (BOW). No information about existing automated solutions for this task is available. The CNN used in this research has 60 million parameters and 650,000 neurons in eight weighted layers, is pre-trained on 1,000 classes, and has a very large learning capacity. The BOW method builds a visual vocabulary by clustering higher-level image features and classifies an unknown image by comparing the visual words it contains with the content-specific vocabulary of a classifier. Both methods are evaluated in terms of recall (the percentage of map images that are correctly classified as maps) and precision (the percentage of images classified as maps that actually contain map content). The dataset consists of 1,200 map images, called positives, subdivided into four subclasses, and 1,200 images without map content, called negatives. Results with recall of up to 99.17% and corresponding precision of up to 97.01% support using CNN and BOW as the backbone of a computer-based classification application.
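The BOW pipeline and the two evaluation measures summarized above can be illustrated with a toy Python sketch. It is not the implementation used in this thesis. It assumes hypothetical 2-D points in place of real local image descriptors, builds the visual vocabulary with plain k-means, and approximates the comparison against a classifier's vocabulary with a simple nearest mean-histogram rule. All function names and data are illustrative. Recall is computed as TP / (TP + FN) and precision as TP / (TP + FP), with "map" as the positive class.

```python
# Toy bag-of-visual-words (BOW) classifier with recall/precision evaluation.
# Assumptions (not from the thesis): descriptors are 2-D tuples, the vocabulary
# comes from naive k-means, and classification uses the nearest class-mean
# word histogram instead of a trained classifier such as an SVM.
import math


def kmeans(points, k, iterations=10):
    """Cluster descriptors into k visual words (Lloyd's algorithm)."""
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]  # deterministic, evenly spaced init
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word; return normalized word counts."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[min(range(len(vocabulary)), key=lambda i: math.dist(d, vocabulary[i]))] += 1
    total = len(descriptors) or 1
    return [c / total for c in counts]


def train(images, labels, vocabulary):
    """Average the word histograms of each class (hypothetical stand-in for a classifier)."""
    sums, counts = {}, {}
    for descriptors, label in zip(images, labels):
        acc = sums.setdefault(label, [0.0] * len(vocabulary))
        for i, v in enumerate(histogram(descriptors, vocabulary)):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(descriptors, vocabulary, class_histograms):
    """Label an image by the class whose mean histogram is closest to its own."""
    h = histogram(descriptors, vocabulary)
    return min(class_histograms, key=lambda lab: math.dist(h, class_histograms[lab]))


def recall_precision(y_true, y_pred, positive="map"):
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical training data: "map" descriptors lie near the origin, others near (5, 5).
train_images = [
    [(0.0, 0.0), (0.1, 0.0), (1.0, 0.1)],
    [(0.0, 0.2), (0.9, 0.0), (1.1, 0.0)],
    [(5.0, 5.0), (5.1, 5.2), (6.0, 5.0)],
    [(5.2, 4.9), (6.1, 5.1), (5.9, 5.0)],
]
train_labels = ["map", "map", "other", "other"]

vocabulary = kmeans([d for img in train_images for d in img], k=2)
model = train(train_images, train_labels, vocabulary)

test_images = [
    [(0.05, 0.1), (1.0, 0.0), (0.2, 0.1)],
    [(0.1, 0.1), (0.9, 0.1), (1.0, 0.0)],
    [(5.5, 5.0), (0.5, 0.0), (6.0, 5.2)],
    [(0.3, 0.0), (0.8, 0.1), (5.0, 5.1)],  # non-map image made mostly of "map-like" words
]
test_labels = ["map", "map", "other", "other"]

predictions = [classify(img, vocabulary, model) for img in test_images]
recall, precision = recall_precision(test_labels, predictions)
print(predictions)          # ['map', 'map', 'other', 'map']
print(recall, precision)    # 1.0 0.6666666666666666
```

In this toy run, both map images are found (recall 1.0), but one non-map image is also labelled as a map, which lowers precision to 2/3. This mirrors why the abstract reports the two measures together rather than a single accuracy figure.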