2015
DOI: 10.13053/rcs-90-1-9

Spoken Tunisian Arabic Corpus “STAC”: Transcription and Annotation

Cited by 27 publications (9 citation statements)
References 11 publications
“…It holds 1600 sentences containing 8627 words. The second corpus is the Zribi's corpus [35]. It is collected from TV channels and radio stations.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…It recognizes different phrases and compounds. The unrecognized words in Zribi's corpus [35] are almost related to the use of a lot of MSA words. Our Tunisian morphological analyzer therefore cannot treat these words as the word “>afoDalo” (the best).…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…Levantine Arabic has also received a lot of attention, as in the creation of the Levantine Arabic Treebank (LATB) (Maamouri et al, 2006), including 27,000 words in Jordanian Arabic. Some efforts were made for Tunisian (Masmoudi et al, 2014; Zribi et al, 2015), and Algerian (Smaïli et al, 2014). For Gulf Arabic, the Gumar corpus (Khalifa et al, 2016a) consists of 1,200 documents written in Gulf Arabic dialects from different forum novels available online (https://nyuad.nyu.edu/en/research/centers-labs-and-projects/computational-approaches-to-modeling-language-lab/resources.html).…”
Section: Related Research; citation type: mentioning
confidence: 99%
“…Several works have been carried out to create parallel and annotated corpora, notably by Graja et al (2013) and Zribi et al (2015). The work of Graja et al (2013) focused on the semantic annotation of spoken TD, using a discriminative model based on conditional random fields (CRFs).…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…The work of Graja et al (2013) focused on the semantic annotation of spoken TD, using a discriminative model based on conditional random fields (CRFs). As for Zribi et al (2015) they developed two corpora, namely a written corpus segmented in sentences, whose words are segmented and annotated by lemmas, gender, number, person, voice, labels, etc., as well as an oral corpus annotated by the different types of disfluencies. Mdhaffar et al (2017) collected a corpus, called Tunisian sentiment analysis corpus (TSAC) from Facebook, which they manually annotated with positive and negative polarities.…”
Section: Related Work; citation type: mentioning
confidence: 99%