Although a number of researchers work on the Khmer language in Natural Language Processing, and some resources exist for word segmentation and POS tagging, high-level syntactic resources such as Treebanks and grammars are still lacking. This paper presents a semi-automatic framework for constructing a Khmer Treebank and for extracting Khmer grammar rules from a set of sentences taken from Khmer grammar books. The sentences are first manually annotated; once the Treebank is obtained, it is processed to generate grammar rules with their probabilities. In our experiments, the annotated trees and the extracted grammar rules are analyzed both quantitatively and qualitatively. Finally, the results are evaluated under three evaluation schemes, Self-Consistency, 5-Fold Cross-Validation, and Leave-One-Out Cross-Validation, using three metrics: Precision, Recall, and F1-Measure. On these metrics, Self-Consistency gives the best result at more than 92%, followed by Leave-One-Out Cross-Validation and 5-Fold Cross-Validation with averages of 88% and 75%, respectively. On the crossing-bracket measure, however, Leave-One-Out Cross-Validation has the highest average at 96%, while the other two are 85% and 89%, respectively.
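The step from annotated trees to grammar rules with probabilities can be illustrated with a minimal sketch. It assumes trees stored as bracketed strings and probabilities estimated by relative frequency (the count of a rule divided by the count of its left-hand side); the abstract does not state the estimation method, so this is an assumption rather than the authors' implementation. The function names, the category labels, and the two toy English sentences are hypothetical placeholders, not data from the Khmer Treebank.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree, e.g. "(S (NP x) (VP y))", into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def productions(tree):
    """Yield one (lhs, rhs) rule for every internal node of the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


def estimate_rule_probabilities(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Hypothetical toy treebank; a real one would hold the annotated Khmer sentences.
toy_treebank = [
    "(S (NP (N dogs)) (VP (V bark)))",
    "(S (NP (N cats)) (VP (V see) (NP (N dogs))))",
]

grammar = estimate_rule_probabilities(parse_tree(t) for t in toy_treebank)

for (lhs, rhs), prob in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.3f}]")
```

In this toy treebank the two VP expansions each occur once, so each receives probability 0.5, and the probabilities of all rules sharing a left-hand side sum to 1.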
Social media has become one of the major data sources for social studies, because users share significant moments in their daily lives as well as their feelings and perceptions about specific discussion topics. In health care, social media is widely used to study people's discourse on ailments and to derive insights into the impact of ailments on patients' quality of life. Recently, there has been increasing interest in applying machine learning algorithms to improve the prediction of ailments from users' social media data. In this study, nearly 800 million posts were retrieved from Twitter, preprocessed, and run through the time-aware ailment topic aspect model (T-ATAM) to predict diseases, symptoms, and remedies for two chronic conditions, namely sleep apnea and chronic liver diseases. The study was conducted on English tweets posted during 2018, most of which came from European countries and the United States. The data were processed with T-ATAM by region, by timestamp, and by treatment, namely continuous positive airway pressure (CPAP), to compare the distributions of the top diseases, symptoms, and remedies across regions, across timestamps, and before, during, and after CPAP was introduced. Based on approximately 331,000 tweets related to liver diseases and 1 million tweets on sleep apnea, various statistical visualizations are presented, including world maps, word clouds, and histograms. The results indicate that depression and drinking are the leading symptoms of liver diseases, while lack of nighttime sleep and overworking are considered the main factors of sleep apnea.