Proceedings of the Second Workshop on Universal Dependencies (UDW 2018) 2018
DOI: 10.18653/v1/w18-6005

Er ... well, it matters, right? On the role of data representations in spoken language dependency parsing

Abstract: Despite the significant improvement of data-driven dependency parsing systems in recent years, they still achieve a considerably lower performance in parsing spoken language data in comparison to written data. On the example of Spoken Slovenian Treebank, the first spoken data treebank using the UD annotation scheme, we investigate which speech-specific phenomena undermine parsing performance, through a series of training data and treebank modification experiments using two distinct state-of-the-art parsing syste…

citations
Cited by 5 publications
(4 citation statements)
references
References 19 publications
“…We are not the first to annotate spoken data. Previous work has annotated English for conversation agents (Davidson et al., 2019), Slovenian data (Dobrovoljc and Martinc, 2018), Komi-Zyrian (Partanen et al., 2018) and Turkish-German (Çetinoğlu and Çöltekin, 2019).…”
Section: Related Work
mentioning
confidence: 99%
“…Commonly mentioned problems are disfluencies and sentence segmentation (Dobrovoljc and Martinc, 2018). Two main types of solutions can be identified; adapting the existing guidelines (Çetinoğlu and Çöltekin, 2019) versus extending them (Davidson et al., 2019).…”
Section: Related Work
mentioning
confidence: 99%
“…Compared to previous studies, our work goes beyond in several respects (see Section 2). First, compared to most of the other spoken dependency treebanks that contain telephone conversations (Bechet et al., 2014), interactions between adults (Dobrovoljc and Martinc, 2018; Dobrovoljc and Nivre, 2016), or user-generated content (Davidson et al., 2019), our dataset attends to child and child-directed speech.…”
Section: Introduction
mentioning
confidence: 99%
“…Specifications for tagging relative pronouns and interrogative particles are insufficiently documented in the MULTEXT-East specifications for Serbo-Croatian, which might have resulted in them being erroneously tagged not only in this, but also in other Serbian and Croatian corpora as well (see srWaC and hrWaC). 20 See Dobrovoljc and Martinc (2018) on the impact of discourse markers on spoken language dependency parsing for Slovene.…”
mentioning
confidence: 99%