Turkish Discourse Bank: Porting a discourse annotation style to a morphologically rich language

Zeyrek, Deniz; Demirsahin, Isin; Çallı, Ayışığı Başak Sevdik

doi:10.5087/dad.2013.208

Cited by 48 publications

(19 citation statements)

References 13 publications


“…The Penn Discourse Treebank (PDTB V3, Prasad et al 2019) is the largest discourse annotated corpus of English, and the largest resource annotated explicitly for discourse relation signals such as connectives, with similar corpora having been developed for a variety of languages (e.g. Zeyrek et al 2013 for Turkish, Zhou et al 2014). However, the annotation scheme used by PDTB is ahierarchical, annotating only pairs of textual argument spans connected by a discourse relation, and disregarding relations at higher levels, such as relations between paragraphs or other groups of discourse units.…”

Section: Discourse Relation Signal Annotations (mentioning)

confidence: 99%

A Neural Approach to Discourse Relation Signal Detection

Zeldes, Liu. 2020. dad


Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result', has focused on the relative frequencies of signal words within and outside text from each discourse relation. Such approaches do not allow us to quantify the signaling strength of individual instances of a signal on a scale (e.g. more or less discourse-relevant instances of 'and'), to assess the distribution of ambiguity for signals, or to identify words that hinder discourse relation identification in context ('anti-signals' or 'distractors'). In this paper we present a data-driven approach to signal detection using a distantly supervised neural network and develop a metric, Δs (or 'delta-softmax'), to quantify signaling strength. Ranging between -1 and 1 and relying on recent advances in contextualized word embeddings, the metric represents each word's positive or negative contribution to the identifiability of a relation in specific instances in context. Based on an English corpus annotated for discourse relations using Rhetorical Structure Theory and signal type annotations anchored to specific tokens, our analysis examines the reliability of the metric, the places where it overlaps with and differs from human judgments, and the implications for identifying features that neural models may need in order to perform better on automatic discourse relation classification.


“…In building the TCL, we use three PDTB-inspired annotated corpora to compile explicit DCs, namely, Turkish Discourse Bank or TDB 1.0 (Zeyrek et al, 2013), TDB 1.1 (Zeyrek and Kurfalı, 2017), and the Turkish section of TED-MDB.…”

Section: Data Sources (mentioning)

confidence: 99%

TCL - a Lexicon of Turkish Discourse Connectives

Zeyrek, Başıbüyük. 2019. Proceedings of the First International Workshop on Designing Meaning Representations

Self Cite


It is known that discourse connectives are the most salient indicators of discourse relations. State-of-the-art parsers being developed to predict explicit discourse connectives exploit annotated discourse corpora, but a lexicon of discourse connectives is also needed to enable further research in discourse structure and support the development of language technologies that use these structures for text understanding. This paper presents a lexicon of Turkish discourse connectives built by automatic means. The lexicon has the format of the German connective lexicon, DiMLex, where for each discourse connective, information about the connective's orthographic variants, syntactic category and senses is provided along with sample relations. In this paper, we describe the data sources we used and the development steps of the lexicon.


“…There are several discourse-annotated corpora in different theoretical frameworks. The PDTB [18] style of annotation has been applied to other languages besides English, such as Turkish [33], Chinese [35], Czech [26], and applied to English and French speech data [6]. For Brazilian Portuguese, several corpora have been annotated in the RST and CST frameworks (CSTNews, CorpusTCC, Rhetalho, Summ-it) [1,14].…”

Section: Related Work (mentioning)

confidence: 99%

Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives

Mendes, Gayo. 2018. Lecture Notes in Computer Science


We describe two new resources that have been prepared for European Portuguese and how they are used for discourse parsing: the Portuguese subpart of the TED-MDB corpus, a multilingual corpus of TED Talks that has been annotated in the PDTB style, and the Lexicon of Discourse Markers for Portuguese (LDM-PT). Both lexicon and corpus are used in a preliminary experiment for discourse connective identification in texts. This includes, in many cases, the difficult task of disambiguating between connective and non-connective uses. We annotated the PT-TED-MDB corpus with POS, lemma and syntactic constituency and focus on the 10 most frequent connectives in the corpus. The best approach considers word-form+POS+syntactic annotation and leads to 85% precision.

