File fragment classification is an essential problem in digital forensics. Although several attempts had been made to solve this challenging problem, a general solution has not been found. In this work, we propose a hierarchical machine-learning-based approach with optimized support vector machines (SVM) as the base classifiers for file fragment classification. This approach consists of more general classifiers at the top level and more specialized fine-grain classifiers at the lower levels of the hierarchy. We also propose a primitive taxonomy for file types that can be used to perform hierarchical classification. We evaluate our model with a dataset of 14 file types, with 1000 fragments measuring 512 bytes from each file type derived from a subset of the publicly available Digital Corpora, the govdocs1 corpus. Our experiment shows comparable results to the present literature, with an average accuracy of 67.78% and an F1-measure of 65% using 10-fold cross-validation. We then improve on the hierarchy and find better results, with an increase in the F1-measure of 1%. Finally, we make our assessment and observations, then conclude the paper by discussing the scope of future research.
Nowadays, many services in the internet including Email, search engine, social networking are provided with free of charge due to enormous growth of web users. With the expansion of Web services, denial of service (DoS) attacks by malicious automated programs (e.g., web bots) is becoming a serious problem of web service accounts. A HIP, or Human Interactive Proofs, is a human authentication mechanism that generates and grades tests to determine whether the user is a human or a malicious computer program. Unfortunately, the existing HIPs tried to maximize the difficulty for automated programs to pass tests by increasing distortion or noise. Consequently, it has also become difficult for potential users too. So there is a tradeoff between the usability and robustness in designing HIP tests. In their propose technique the authors tried to balance the readability and security by adding contextual information in the form of natural conversation without reducing the distortion and noise. In the result section, a microscopic large-scale user study was conducted involving 110 users to investigate the actual user views compare to existing state of the art CAPTCHA systems like Google's reCAPTCHA and Microsoft's CAPTCHA in terms of usability and security and found the authors' system capable of deploying largely over internet.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.