In Thai, tonal information is a crucial component for identifying the lexical meaning of a word. Consequently, Thai tone classification can obviously improve performance of Thai speech recognition system. In this article, we therefore reported our study of Thai tone classification. Based on our investigation, most of Thai tone classification studies relied on statistical machine learning approaches, especially the Artificial Neural Network (ANN)-based approach and the Hidden Markov Model (HMM)-based approach. Although both approaches gave reasonable performances, they had some limitations due to their mathematical models. We therefore introduced a novel approach for Thai tone classification using a Hidden Conditional Random Field (HCRF)based approach. In our study, we also investigated tone configurations involving tone features, frequency scaling and normalization techniques in order to fine-tune performances of Thai tone classification. Experiments were conducted in both isolated word scenario and continuous speech scenario. Results showed that the HCRF-based approach with the feature F_dF_aF, ERB-rate scaling and a z-score normalization technique yielded the highest performance and outperformed a baseline using the ANNbased approach, which had been reported as the best for the Thai tone classification, in both scenarios. The best performance of HCRF-based approach provided the error rate reduction of 10.58% and 12.02% for isolated word scenario and continuous speech scenario respectively when comparing with the best result of baselines.
SUMMARYKnowledge graphs (KG) play a crucial role in many modern applications. However, constructing a KG from natural language text is challenging due to the complex structure of the text. Recently, many approaches have been proposed to transform natural language text to triples to obtain KGs. Such approaches have not yet provided efficient results for mapping extracted elements of triples, especially the predicate, to their equivalent elements in a KG. Predicate mapping is essential because it can reduce the heterogeneity of the data and increase the searchability over a KG. In this article, we propose T2KG, an automatic KG creation framework for natural language text, to more effectively map natural language text to predicates. In our framework, a hybrid combination of a rule-based approach and a similarity-based approach is presented for mapping a predicate to its corresponding predicate in a KG. Based on experimental results, the hybrid approach can identify more similar predicate pairs than a baseline method in the predicate mapping task. An experiment on KG creation is also conducted to investigate the performance of the T2KG. The experimental results show that the T2KG also outperforms the baseline in KG creation. Although KG creation is conducted in open domains, in which prior knowledge is not provided, the T2KG still achieves an F1 score of approximately 50% when generating triples in the KG creation task. In addition, an empirical study on knowledge population using various text sources is conducted, and the results indicate the T2KG could be used to obtain knowledge that is not currently available from DBpedia.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.