This paper proposes a new term frequency with a Gaussian technique (TF-G) to classify the risk of suicide from Thai clinical notes and to perform sentiment analysis based on Thai customer reviews and English tweets of travelers that use US airline services. This research compared TF-G with term weighting techniques based on Thai text classification methods from previous researches, including the bag-of-words (BoW), term frequency (TF), term frequency-inverse document frequency (TF-IDF), and term frequency-inverse corpus document frequency (TF-ICF) techniques. Suicide risk classification and sentiment analysis were performed with the decision tree (DT), naïve Bayes (NB), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) techniques. The experimental results showed that TF-G is appropriate for feature extraction to classify the risk of suicide and to analyze the sentiments of customer reviews and tweets of travelers. The TF-G technique was more accurate than BoW, TF, TF-IDF and TF-ICF for term weighting in Thai suicide risk classification, for term weighting in sentiment analysis of Thai customer reviews for Burger King, Pizza Hut, and Sizzler restaurants, and for the sentiment analysis of English tweets of travelers using US airline services.
Objectives: The purpose of this study was to identify patterns of self-harm risk factors from suicide and self-harm surveillance reports in Thailand.Methods: This study analyzed data from suicide and self-harm surveillance reports submitted to Khon Kaen Rajanagarindra Psychiatric Hospital, Thailand. The process of identifying patterns of self-harm risk factors involved: data preprocessing (namely, data preparation and cleaning, missing data management using listwise deletion and expectation-maximization techniques, subgrouping factors, determining the target factors, and data correlation for learning); classifying the risk of self-harm (severe or mild) using 10-fold cross-validation with the support vector machine, random forest, multilayer perceptron, decision tree, k-nearest neighbors, and ensemble techniques; data filtering; identifying patterns of self-harm risk factors using 10-fold cross-validation with the classification and regression trees (CART) technique; and evaluating patterns of self-harm risk factors.Results: The random forest technique was most accurate for classifying the risk of self-harm, with specificity, sensitivity, and F-score of 92.84%, 93.12%, and 91.46%, respectively. The CART technique was able to identify 53 patterns of self-harm risk, consisting of 16 severe self-harm risk patterns and 37 mild self-harm risk patterns, with an accuracy of 92.85%. In addition, we discovered that the type of hospital was a new risk factor for severe selfharm.Conclusions: The procedure presented herein could identify patterns of risk factors from self-harm and assist psychiatrists in making decisions related to self-harm among patients visiting hospitals in Thailand.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.