2013
DOI: 10.1007/s10579-013-9216-5

Compilation, transcription and usage of a reference speech corpus: the case of the Slovene corpus GOS

Cited by 26 publications (14 citation statements)
References: 4 publications
“…Besides Schmidt, Hedeland and Jettka (2017) and the ISO specification itself (ISO 2016), the role of TEI as a suitable basis of a standard for spoken language transcription has been discussed, among others, by Schmidt (2011) and Liégeois et al (2017). The TEI guidelines' chapter 8 on "Transcriptions of Speech" (TEI Consortium 2019) has also been used in CLARIN resources such as the GOS Corpus of Spoken Slovene (see Verdonik et al 2013) and as the basis for a CLARIN-wide format for parliamentary data.…”
Section: Related Work (mentioning)
confidence: 99%
“…The SST treebank currently amounts to 29,488 tokens (3,188 utterances), which include both lexical tokens (words) and tokens signalling other types of verbal phenomena, such as filled pauses (fillers) and unfinished words, as well as some basic markers of prosody and extralinguistic speech events. The original segmentation, tokenization and spelling principles described by Verdonik et al (2013) have also been inherited by SST. Among the two types of Gos transcriptions (pronunciation-based and normalized spelling, both in lowercase only), subsequent manual annotations in SST have been performed on top of normalized transcriptions.…”
Section: Spoken Slovenian Treebank (mentioning)
confidence: 99%
“…Segmentation: Inheriting the manual segmentation of the reference Gos corpus, sentences (utterances) in SST correspond to "semantically, syntactically and acoustically delimited units" (Verdonik et al, 2013). As such, the utterance segmentation heavily depends on subjective interpretations of what is the basic functional unit in speech, in line with the multitude of existing segmentation approaches, based on syntax, semantics, prosody, or their various combinations (Degand and Simon, 2009).…”
Section: Modifications Of Speech Transcription (mentioning)
confidence: 99%
“…Typically, spoken language annotation denotes annotation of its representation in the form of written transcription. In the Spoken Slovenian Treebank, the spelling, tokenization and segmentation principles follow the transcription guidelines of the reference Gos corpus (Verdonik et al, 2013). The syntactic trees in the treebank span over individual utterances, manually delimited in the process of reference corpus transcription.…”
Section: Segmentation Tokenization and Spelling (mentioning)
confidence: 99%