File fragment classification is an important step in digital forensics. The most popular approach relies on traditional machine learning with hand-crafted features such as n-grams, Shannon entropy, or Hamming weights. However, these features are far from sufficient to classify file fragments. In this paper, we propose a novel scheme based on fragment-to-grayscale image conversion and deep learning to extract more hidden features and thereby improve classification accuracy. Benefiting from multi-layered feature maps, our deep convolutional neural network (CNN) model can extract nearly ten thousand features through the non-linear connections between neurons. Our proposed CNN model was trained and tested on the public dataset GovDocs. The experimental results show that it achieves 70.9% classification accuracy, which is higher than that of existing works.
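As a rough illustration of the ideas named in this abstract, the sketch below reshapes a byte fragment into a grayscale pixel grid and computes two of the traditional features, Shannon entropy and Hamming weight. The 64-pixel row width, the zero-padding of short fragments, and all function names are assumptions made for illustration; the abstract does not specify them, and the CNN itself is not shown.

```python
import math
from collections import Counter
from typing import List


def fragment_to_grayscale(fragment: bytes, width: int = 64) -> List[List[int]]:
    """Reshape a fragment into rows of `width` pixels; each byte is one 0-255 gray level."""
    remainder = len(fragment) % width
    if remainder:
        # Zero-pad the final row (assumption: the source does not describe padding).
        fragment = fragment + bytes(width - remainder)
    return [list(fragment[i:i + width]) for i in range(0, len(fragment), width)]


def shannon_entropy(fragment: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def hamming_weight(fragment: bytes) -> float:
    """Fraction of bits set to 1 across the whole fragment."""
    if not fragment:
        return 0.0
    ones = sum(bin(b).count("1") for b in fragment)
    return ones / (8 * len(fragment))


# Synthetic 4096-byte fragment (hypothetical data, not taken from GovDocs).
fragment = bytes(range(256)) * 16
image = fragment_to_grayscale(fragment)
```

With the assumed width of 64, a 4096-byte fragment becomes a 64 x 64 grid that an image classifier could take as input, while the two scalar features summarize the same bytes in the way traditional methods do.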
Background
As a mosquito-borne infectious disease, dengue fever (DF) has spread through tropical and subtropical regions worldwide in recent decades. Dengue forecasting is essential for enhancing the effectiveness of preventive measures. Current studies have been conducted primarily at national, sub-national, and city levels, while intra-urban dengue forecasting at a fine spatial resolution remains challenging. Because viruses spread rapidly through highly dynamic population flows, integrating the spatial interactions of human movements between regions could benefit intra-urban dengue forecasting.
Methodology
In this study, a new framework for enhancing intra-urban dengue forecasting was developed by integrating the spatial interactions between urban regions. First, a graph-embedding technique called Node2Vec was employed to learn the embeddings (in the form of an N-dimensional real-valued vector) of the regions from their population flow network. As strongly interacting regions would have more similar embeddings, the embeddings can serve as “interaction features.” Then, the interaction features were combined with commonly used features (e.g., temperature, rainfall, and population) to enhance supervised learning-based dengue forecasting models at a fine-grained intra-urban scale.
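The two steps described above can be sketched in simplified form. The code below generates one Node2Vec-style biased random walk over a weighted population flow network and then concatenates a region's common features with its embedding. It is a minimal sketch under stated assumptions, not the study's implementation: the toy network, the parameter values, the feature values, and the function names are all hypothetical, and the step that trains embeddings from the walks (a skip-gram model in Node2Vec) is omitted, so the embedding vector here is a placeholder.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order biased walk.

    graph maps each node to a dict of {neighbor: flow weight}.
    p controls the chance of returning to the previous node,
    q controls the chance of moving away from it.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph.get(cur, {}))
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step: only the edge weights matter.
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:
                    weights.append(w / p)   # step back to the previous node
                elif n in graph.get(prev, {}):
                    weights.append(w)       # stay close to the previous node
                else:
                    weights.append(w / q)   # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def combine_features(common, embedding):
    """Concatenate common features with interaction features for one region."""
    return list(common) + list(embedding)


# Hypothetical flow counts between three townships (symmetric for simplicity).
flows = {
    "A": {"B": 120.0, "C": 30.0},
    "B": {"A": 120.0, "C": 60.0},
    "C": {"A": 30.0, "B": 60.0},
}
walk = node2vec_walk(flows, "A", length=10, p=1.0, q=2.0, rng=random.Random(0))

# Placeholder 2-dimensional embedding; real values would come from training on many walks.
row = combine_features([28.5, 12.0, 50000], [0.1, -0.3])
```

Regions that exchange large flows appear near each other in many walks, which is why embeddings trained on such walks end up similar for strongly interacting regions. The combined row is what a supervised forecasting model would receive as input for one region.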
Results
The performance of forecasting models (i.e., SVM, LASSO, and ANN) with and without interaction features was tested and compared on township-level dengue forecasting in Guangzhou, the most threatened sub-tropical city in China. Results showed that models using both common and interaction features can achieve better performance than those using common features alone.
Conclusions
The proposed approach for incorporating spatial interactions of human movements using a graph-embedding technique is effective and can help enhance fine-grained intra-urban dengue forecasting.