The First Wikipedia Questions and Factoid Answers Corpus in the Thai Language

Trakultaweekoon, Kanokorn; Thaiprayoon, Santipong; Palingoon, Pornpimon; Rugchatjaroen, Anocha

doi:10.1109/isai-nlp48611.2019.9045143

Cited by 11 publications

(9 citation statements)

References 4 publications

“…As shown in Figure 1, the proposed framework works as follows. Firstly, we aggregate, clean and normalize our datasets: TyDiQA, XQuAD (Artetxe et al, 2019), Iapp Wiki QA (Viriyayudhakorn and Polpanumas, 2021), and Thai QA (Trakultaweekoon et al, 2019). Then, we translate all questions into English and backtranslate to Thai using Google Translate.…”

Section: Words In Different Frequency Group (mentioning)

confidence: 99%

Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP

2023

One of the frequent points in the mainstream narrative about large language models is that they have "emergent properties" (sometimes even dangerous enough to be considered existential risk to mankind). However, there is much disagreement about even the very definition of such properties. If they are understood as a kind of generalization beyond training data (as something that a model does without being explicitly trained for it), I argue that we have not in fact established the existence of any such properties, and at the moment we do not even have the methodology for doing so.

“…There are two Thai QA corpora used in our e[…] Wiki QA. The dataset statistics of both datasets are shown in Table 3. Thai Wiki QA [8] is a SQuAD-like dataset in the Thai language. It was used as a QA competition dataset in Thailand National Software Contest (NSC), during 2018-2019. […] dataset consists of 15,000 question-answer pairs wi[…] annotated by 15 native Thai speakers with many ki[…] The publisher of Thai Wiki QA also published 125,3[…] this dataset as an open domain QA task. In this stud[…] for generating more question answering samples.…”

Section: Datasets (mentioning)

confidence: 99%

“…The dataset statistics of both datasets are shown in Table 3. Thai Wiki QA [8] is a SQuAD-like dataset in the Thai language. It was used as a QA competition dataset in Thailand National Software Contest (NSC), during 2018-2019.…”

Section: Datasets (mentioning)

confidence: 99%

Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering

Phakmongkol; Vateekul

2021

Applied Sciences

Question Answering (QA) is a natural language processing task that enables the machine to understand a given context and answer a given question. There are several QA research trials containing high resources of the English language. However, Thai is one of the languages that have low availability of labeled corpora in QA studies. According to previous studies, while the English QA models could achieve more than 90% of F1 scores, Thai QA models could obtain only 70% in our baseline. In this study, we aim to improve the performance of Thai QA models by generating more question-answer pairs with Multilingual Text-to-Text Transfer Transformer (mT5) along with data preprocessing methods for Thai. With this method, the question-answer pairs can synthesize more than 100 thousand pairs from provided Thai Wikipedia articles. Utilizing our synthesized data, many fine-tuning strategies were investigated to achieve the highest model performance. Furthermore, we have presented that the syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and the word-level F1 for Thai QA corpora. The experiment was conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model is the winner on both datasets compared to other modern transformer models: Roberta and mT5.

“…This model is implemented using a convolutional neural network, a bidirectional long short-term memory network, and question pair matching to perform QA processing. In still another study, Kanokorn et al [13] proposed an information extraction process for both questions and answers that uses the Thai language and a related corpus. This research resulted in a web-based QA system whose answers are factoids extracted from Thai Wikipedia articles.…”

Section: Related Work (mentioning)

confidence: 99%

Integrated Question-Answering System for Natural Disaster Domains Based on Social Media Messages Posted at the Time of Disaster

Kemavuthanon; Uchida

2020

Information

Natural disasters are events that humans cannot control, and Japan has suffered from many such disasters over its long history. Many of these have caused severe damage to human lives and property. These days, numerous Japanese people have gained considerable experience preparing for disasters and are now striving to predict the effects of disasters using social network services (SNSs) to exchange information in real time. Currently, Twitter is the most popular and powerful SNS tool used for disaster response in Japan because it allows users to exchange and disseminate information quickly. However, since almost all of the Japanese-related content is also written in the Japanese language, which restricts most of its benefits to Japanese people, we feel that it is necessary to create a disaster response system that would help people who do not understand Japanese. Accordingly, this paper presents the framework of a question-answering (QA) system that was developed using a Twitter dataset containing more than nine million tweets compiled during the Osaka North Earthquake that occurred on 18 June 2018. We also studied the structure of the questions posed and developed methods for classifying them into particular categories in order to find answers from the dataset using an ontology, word similarity, keyword frequency, and natural language processing. The experimental results presented herein confirm the accuracy of the answer results generated from our proposed system.
