2021
DOI: 10.1007/s10579-021-09536-6

A large English–Thai parallel corpus from the web and machine-generated text

Cited by 28 publications (32 citation statements)
References 18 publications
“…Three common machine learning algorithms. To satisfy the conditions of the independence assumption of the plain Bayesian classifier algorithm, we can ignore the dependencies between Thai characters in the Thai subsets and assume that the existence of each Thai character is independent of the others [29,30]. We can construct the plain Bayesian classifier model as shown in Figure 1.…”
Section: Methods (mentioning)
confidence: 99%
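
The independence assumption described in the statement above can be illustrated with a minimal sketch of a character-presence (Bernoulli) naive Bayes classifier. This is not the citing paper's model: the function names `train` and `predict`, the smoothing parameter `alpha`, and the toy labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on (text, label) pairs.

    Each distinct character is a binary feature (present or absent),
    and features are treated as independent given the label.
    `alpha` is a hypothetical Laplace smoothing constant.
    """
    doc_counts = Counter()           # number of texts per label
    presence = defaultdict(Counter)  # label -> char -> texts containing it
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        chars = set(text)
        vocab |= chars
        for ch in chars:
            presence[label][ch] += 1

    total = sum(doc_counts.values())
    model = {}
    for label, n in doc_counts.items():
        log_prior = math.log(n / total)
        # Smoothed P(char present | label), strictly between 0 and 1.
        probs = {ch: (presence[label][ch] + alpha) / (n + 2 * alpha)
                 for ch in vocab}
        model[label] = (log_prior, probs)
    return model


def predict(model, text):
    """Return the label with the highest log posterior for `text`."""
    chars = set(text)
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for ch, p in probs.items():
            # Independence assumption: per-character terms simply add up.
            score += math.log(p) if ch in chars else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data with hypothetical labels, only to show the call pattern.
toy = [("กขค", "thai"), ("ขคง", "thai"), ("abc", "latin"), ("bcd", "latin")]
toy_model = train(toy)
```

Calling `predict(toy_model, "คง")` scores each label by summing one log term per character in the vocabulary, which is exactly what ignoring dependencies between characters buys: no joint statistics over character pairs are needed.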
“…Count and display the position of a single word or lexical chunk in the article or corpus, and show it with a black bar chart on a white background, which is equivalent to hot spot analysis. It intuitively displays the specific position of the text search word, so as to facilitate the researchers to identify the distribution of words or lexical chunks in the article or corpus [26].…”
Section: General Function Module (mentioning)
confidence: 99%
“…The global COVID-19 fake news also has a chance to be translated and published in Thai social. The open COVID-19 fake news datasets are also translated to Thai as source dataset using SCB-MT-EN-TH translation by VISTEC.AI [78][79][80] as an external knowledge. The source dataset is used to pre-train those transfer learning models (BERT [72], ULMFiT [73] and GPT [74]).…”
Section: English To Thai Translating (mentioning)
confidence: 99%
“…The pre-training Thai COVID-19 models were trained on the 123,762 Thai-translated single texts from the source dataset (as described in section 3.2). For the data collection, the global COVID-19 fake (and real) news in English from well-known open datasets, CoAID [25,26], ReCOVery [27,28] and FakeCovid [29,30], was collected and translated to Thai using SCB-MT-EN-TH translation by VISTEC.AI [77][78][79]. To evaluate the quality of English-to-Thai translation, BLEU [83] was used and compared to other well-known English-to-Thai machine translators: AI for Thai by NECTEC [47] and Google Translate.…”
Section: Source Dataset By Machine Translation (mentioning)
confidence: 99%
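
The BLEU comparison mentioned in the statement above can be sketched as follows. This is a simplified single-sentence, single-reference, unsmoothed variant, not the evaluation setup of the citing paper; the function names are hypothetical, and the inputs are assumed to be already tokenized (Thai text has no spaces between words, so it would need a word segmenter first).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed BLEU for one hypothesis against one reference.

    Returns 0.0 as soon as any n-gram order has no overlap, since the
    geometric mean of the precisions is then zero.
    """
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        total = sum(hyp.values())
        # Modified precision: clip each n-gram count by its reference count.
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    c, r = len(hypothesis), len(reference)
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(sum(log_precisions) / max_n)
```

Published BLEU scores are normally computed at corpus level with a standard tokenizer, so numbers from this sketch are not comparable to reported results; it only shows how clipped n-gram precision and the brevity penalty combine.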