Document classification is a long-standing problem in information retrieval and an essential part of many applications that manage text documents and large volumes of unstructured data. Automatic document classification can be defined as the content-based assignment of documents to predefined categories, which makes it easier to fetch relevant data at the right time and to filter and route documents directly to users. To retrieve data with minimal effort and in minimal time, researchers around the world have been building content-based classifiers, and as a consequence a variety of classification frameworks has been developed. Unfortunately, because they rely on conventional algorithms, almost all of these frameworks fail to assign documents to the proper categories. This paper proposes the Soft Cosine Measure as a method for classifying text documents based on their contents. The method considers the similarity between the features of the texts rather than only their literal, surface-level match. For example, traditional systems treat 'emperor' and 'king' as two unrelated words, whereas the proposed method recognizes that both words carry the same meaning. Owing to its feature extraction capability and content-based similarity measure, the proposed system achieves a classification accuracy of up to 98.60%, higher than that of other existing systems.
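The 'emperor' and 'king' example can be illustrated with a minimal sketch of the soft cosine measure, which weights every pair of features by a similarity score instead of comparing only identical features. The four-word vocabulary, the bag-of-words vectors, and the 0.8 similarity between 'emperor' and 'king' are assumptions chosen for illustration; they are not taken from the paper, which does not state here how feature similarities are obtained.

```python
from math import sqrt


def soft_cosine(a, b, sim):
    """Soft cosine similarity between two bag-of-words vectors.

    sim[i][j] is the similarity between feature i and feature j.
    With an identity matrix this reduces to the ordinary cosine.
    """
    def bilinear(x, y):
        # Sum of sim[i][j] * x[i] * y[j] over all feature pairs.
        return sum(
            sim[i][j] * x[i] * y[j]
            for i in range(len(x))
            for j in range(len(y))
        )

    denom = sqrt(bilinear(a, a)) * sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical vocabulary: ["emperor", "king", "ruled", "banana"]
# Hypothetical feature similarities: 'emperor' and 'king' are set to 0.8.
sim = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

doc_a = [1, 0, 1, 0]  # "emperor ruled"
doc_b = [0, 1, 1, 0]  # "king ruled"

print(round(soft_cosine(doc_a, doc_b, identity), 3))  # ordinary cosine: 0.5
print(round(soft_cosine(doc_a, doc_b, sim), 3))       # soft cosine: 0.9
```

With the identity matrix, only the shared word 'ruled' contributes and the score is 0.5. With the assumed similarity matrix, the related pair 'emperor' and 'king' also contributes and the score rises to 0.9.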
Addiction to drugs and alcohol has become a significant threat to the youth of society in Bangladesh. As conscientious members of society, we must act to protect these young people from life-threatening addiction. In this paper, we present a machine-learning-based approach to forecasting the risk of becoming addicted to drugs. First, we identify significant factors for addiction by talking to doctors and drug-addicted people and by reading relevant articles and write-ups. Then we collect data from both addicted and non-addicted people. After preprocessing the data set, we apply nine well-known machine learning algorithms, namely k-nearest neighbors, logistic regression, support vector machine (SVM), naïve Bayes, classification and regression trees (CART), random forest, multilayer perceptron, adaptive boosting, and gradient boosting machine, to the processed data set and measure the performance of each classifier in terms of several prominent performance metrics. Logistic regression outperforms all other classifiers on every metric used, attaining an accuracy approaching 97.91%. In contrast, CART performs poorly, with an accuracy approaching 59.37% after principal component analysis is applied.
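As a rough illustration of how classifiers can be compared "in terms of performance metrics", the sketch below computes accuracy, precision, recall, and F1 for a binary task. The abstract names only accuracy, so the other three metrics are an assumption about commonly used choices, and the labels are made-up toy values (1 = addicted, 0 = non-addicted), not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only; not taken from the paper's data set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Running the same function on the predictions of each classifier would give comparable per-model scores, which is one plausible way to produce the kind of ranking the abstract reports.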