Language classification is the process of identifying the disposition of a presented text, such as assigning an email or a text document to a particular category. Classifying text can involve determining the genre of a book, categorizing a document, or, in our case, deciding whether an email is spam. The idea behind language classification is to teach the computer to be a filing clerk. Spam filters that employ language classification combine the spam probabilities of individual words with a Bayesian rule, and they read and filter your email by learning your personal email behavior (what you think is and isn't spam). Many spam filters have been built on this technology and applied effectively to English and other languages. However, they perform poorly when applied directly to Vietnamese spam, because the token segmentation used by these Bayesian filters does not suit the specific characteristics of Vietnamese. We therefore propose a Vietnamese segmentation method for token selection, and use it to build a Vietnamese spam filter based on language classification and Bayesian combination that adequately supports Vietnamese. The results are very satisfactory: thanks to this technique, our filter for Vietnamese spam is 9% more accurate than other filters that use other segmentation techniques.
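The Bayesian combination of per-token spam probabilities mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the probability table, and the neutral default of 0.5 for unseen tokens are all assumptions made for the example, and the tokens are taken as already segmented, since the abstract does not describe the proposed Vietnamese segmentation in detail.

```python
import math


def combine_spam_probabilities(token_probs):
    """Combine per-token spam probabilities with the naive Bayes rule.

    Computes prod(p) / (prod(p) + prod(1 - p)) in log space so that
    long messages do not underflow to zero.
    """
    eps = 1e-9  # clamp to keep log() finite (assumed value)
    log_spam = 0.0
    log_ham = 0.0
    for p in token_probs:
        p = min(max(p, eps), 1.0 - eps)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    diff = log_ham - log_spam
    if diff > 700.0:  # exp() would overflow; the score is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def score_message(tokens, spam_prob_table, default_prob=0.5):
    """Score a pre-segmented message.

    spam_prob_table is a hypothetical mapping from token to the learned
    probability that a message containing it is spam. Unseen tokens get
    default_prob; 0.5 (an assumption) leaves the score unchanged.
    """
    probs = [spam_prob_table.get(tok, default_prob) for tok in tokens]
    return combine_spam_probabilities(probs)


# Hypothetical table: "khuyến mãi" (promotion) is kept as one token,
# which a plain whitespace split would break into two syllables.
table = {"khuyến mãi": 0.9, "họp": 0.1}
print(score_message(["khuyến mãi", "hôm nay"], table))
```

The example also hints at why segmentation matters: a multi-syllable word such as "khuyến mãi" only matches its table entry if the tokenizer keeps it whole.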
Problem statement: Content-Based Video Retrieval (CBVR) remains an open, hard problem because of the semantic gap between low-level and high-level features, the size of the database, the content of keyframes, and the choice of features. In this study we introduce a new approach to this problem based on the Scale-Invariant Feature Transform (SIFT) feature, a new metric, and an object retrieval method. Conclusion/Recommendations: Our algorithm is built on a Content-Based Image Retrieval (CBIR) method in which the keyframe database consists of keyframes detected from the video database using our shot detection method. Experiments show that our algorithm achieves fairly high accuracy.
Abstract: This paper evaluates the efficiency of Vietnamese SMS spam detection methods on different variants of Vietnamese datasets, using both traditional machine learning models and deep learning models. The researchers experimented with five algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Random Forests (RF), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), on three different Vietnamese datasets. The findings reveal that the LSTM and CNN, supported by a transformer learning model, PhoBert, were more efficient than the traditional machine learning models. The LSTM model showed the highest accuracy, 97.77%, on the full-accent Vietnamese dataset. Similarly, the CNN model and PhoBert model showed the highest accuracy, 95.56%, on the non-diacritic Vietnamese dataset.
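The abstract does not say how the non-diacritic variant of the dataset was produced. One common way to derive such a variant from fully accented text is sketched below using only the Python standard library; the function name is hypothetical and the approach is an assumption, not the authors' procedure.

```python
import unicodedata


def strip_vietnamese_diacritics(text):
    """Return text with Vietnamese tone and vowel marks removed.

    The letters đ/Đ have no canonical Unicode decomposition, so they
    are mapped explicitly before the combining marks are stripped.
    """
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers tone marks, the
    # circumflex, the breve, and the horn used in Vietnamese.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


print(strip_vietnamese_diacritics("Tiếng Việt"))
```

Removing the marks collapses distinct words onto the same spelling, which is one plausible reason a model might behave differently on the two dataset variants.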