Spam emails have spread to the point that they make up a significant share of the typical daily inbox. Because they cause financial loss and inconvenience for recipients, spam emails have to be filtered and separated from legitimate ones. This paper surveys some popular filtering algorithms that rely on text classification to decide whether an email is unsolicited. The algorithms are compared on the SpamBase dataset to identify the best classifier in terms of accuracy, computational time, and precision/recall rates.
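As a rough illustration of the evaluation criteria named above, the sketch below computes accuracy, precision, and recall for a binary spam classifier from true and predicted labels. The function name `classification_metrics` and the toy label lists are assumptions made for illustration; they are not taken from the paper and do not represent results on SpamBase.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels.

    `positive` marks the spam class (assumed here to be the label 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels (hypothetical): 1 = spam, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Precision penalizes legitimate emails flagged as spam, while recall penalizes spam that slips through, which is why the two are usually reported together with accuracy.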
Support vector machines (SVM) offer a principled approach to machine learning (ML) problems because of their mathematical foundations in statistical learning theory. As a non-parametric method, SVM requires all the data to be available during the training phase. However, once the model parameters are identified, SVM relies for future prediction only on a subset of these training instances, called support vectors (SV). The SVM model is mathematically written as a weighted sum over these SV, whose number, rather than the dimensionality of the input space, defines SVM's complexity. Since the final number of SV can be up to half the size of the training dataset, running SVM on energy-aware computing platforms becomes challenging. We propose in this work Knee-Cut SVM (KCSVM) and Knee-Cut Ordinal Optimization inspired SVM (KCOOSVM), which use a soft trick of ordered kernel values and uniform subsampling to reduce SVM's prediction computational complexity while maintaining an acceptable impact on its generalization capability. When tested on several databases from UCI, KCSVM and KCOOSVM produced promising results, comparable to similar published algorithms.
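To make the dependence on the number of support vectors concrete, here is a minimal sketch of the standard kernel SVM decision function, f(x) = Σ_i c_i K(s_i, x) + b, where each c_i is the product of a dual weight and a class label. It is not the Knee-Cut reduction proposed in the paper. The RBF kernel, the `gamma` value, and the toy support vectors and coefficients are all assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """Weighted sum of kernel values over the support vectors, plus a bias.

    One kernel evaluation is needed per support vector, so prediction
    cost grows with the number of support vectors.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def svm_predict(x, support_vectors, dual_coefs, bias):
    """Classify x as +1 or -1 from the sign of the decision value."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias) >= 0 else -1


# Toy model (hypothetical values, not a trained SVM).
support_vectors = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
dual_coefs = [-1.0, 0.4, 0.6]  # each entry is alpha_i * y_i
bias = 0.0

label_near_origin = svm_predict((0.0, 0.0), support_vectors, dual_coefs, bias)
label_far_right = svm_predict((2.0, 0.0), support_vectors, dual_coefs, bias)
```

Because every prediction loops over all support vectors, keeping fewer of them lowers the per-prediction cost directly, which is the quantity that SV-reduction methods target.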
Published literature has shown local invariant features to be powerful for image processing and pattern recognition tasks. However, in energy-aware environments, these invariant features do not scale easily because of their computational requirements. Motivated to find an efficient building recognition algorithm based on scale invariant feature transform (SIFT) keypoints, we present in this paper uSee, a supervised learning framework that exploits the symmetrical and repetitive structural patterns in buildings to identify subsets of relevant clusters formed by these keypoints. Once an image is captured by a smartphone, uSee preprocesses it using variations in gradient angle- and entropy-based measures before extracting the building signature and comparing its representative SIFT keypoints against a repository of building images. Experimental results on two different databases confirm the effectiveness of uSee in delivering, at a greatly reduced computational cost, the high matching scores for building recognition that local descriptors can achieve. With only 14.3% of image SIFT keypoints, uSee exceeded prior literature results by achieving an accuracy of 99.1% on the Zurich Building Database with no manual rotation, thus saving significantly on the computational requirements of the task at hand.