Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
DOI: 10.18653/v1/w19-2714
Multi-lingual and Cross-genre Discourse Unit Segmentation

Abstract: We describe a series of experiments applied to data sets from different languages and genres, annotated for coherence relations according to different theoretical frameworks. Specifically, we investigate the feasibility of a unified (theory-neutral) approach to discourse segmentation, a process that divides a text into the minimal discourse units involved in some coherence relation. We apply a Random Forest and an LSTM-based approach to all data sets, and we improve over a simple baseline assuming sim…

Cited by 7 publications (5 citation statements)
References 21 publications
“…All of the systems from DISRPT 2019 use lexical features, and the best systems (Muller et al., 2019) are recurrent neural networks. The system most similar to the current work is the (best) system from Bourgonje and Schäfer (2019), who use a random forest classifier and extract features at the token level, e.g. surface form, POS tag, position in the sentence, and succeeding punctuation mark.…”
Section: Results
confidence: 99%
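The token-level feature set named in the statement above (surface form, POS tag, position in the sentence, succeeding punctuation) can be sketched as follows. This is a minimal illustrative example, not the authors' actual code: the toy tokens, the `features` helper, and the BIO-style B/I labels are all assumptions for demonstration.

```python
# Minimal sketch (assumption, not the DISRPT system) of token-level
# discourse segmentation with a random forest: each token is labeled
# B (begins a discourse unit) or I (inside one).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Toy training data: (surface form, POS tag, position in sentence).
tokens = [("Although", "IN", 0), ("it", "PRP", 1), ("rained", "VBD", 2),
          (",", ",", 3), ("we", "PRP", 4), ("left", "VBD", 5), (".", ".", 6)]
labels = ["B", "I", "I", "I", "B", "I", "I"]

def features(i, sent):
    # Feature types named in the citation statement: surface form,
    # POS tag, position in the sentence, succeeding punctuation mark.
    surface, pos, position = sent[i]
    nxt = sent[i + 1][0] if i + 1 < len(sent) else "</s>"
    return {"surface": surface.lower(), "pos": pos,
            "position": position, "next_is_punct": nxt in {",", ".", ";"}}

vec = DictVectorizer()
X = vec.fit_transform(features(i, tokens) for i in range(len(tokens)))
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
pred = clf.predict(X)  # one B/I label per token
```

In practice such a classifier would be trained on the annotated corpora and evaluated on held-out data rather than on its own training tokens.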
“…Several teams participated in DISRPT 2019. Among the best proposals: ToNy [47] employs single-layer bi-directional LSTM models with different pre-trained embeddings, obtaining its best results with contextual embeddings. DFKI RF [48] uses a Random Forest (based on Scikit-learn [49]) whose input is a combination of dependency-tree and constituency-syntax information. In addition, they use an LSTM-based method (based on Keras [50]) with pre-trained word embeddings [51].…”
Section: Related Work
confidence: 99%
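The single-layer bi-directional LSTM architecture described above can be sketched in Keras as a per-token sequence labeler. This is a hedged sketch under stated assumptions, not the published code: the vocabulary size, embedding dimension, sequence length, and hidden size are placeholders, and pre-trained embedding weights would be loaded into the embedding layer in a real system.

```python
# Minimal sketch (assumption, not the DISRPT submissions) of a
# single-layer BiLSTM segmenter in Keras: word indices -> embeddings
# (pre-trained weights could be loaded here) -> BiLSTM -> per-token
# B/I softmax.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, emb_dim, max_len = 1000, 50, 40

model = keras.Sequential([
    layers.Embedding(vocab_size, emb_dim),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(2, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One dummy batch: (batch, max_len) token indices in,
# (batch, max_len, 2) per-token B/I probabilities out.
probs = model.predict(np.random.randint(0, vocab_size, (1, max_len)),
                      verbose=0)
```

Swapping the embedding layer's weights for contextual embeddings is the variation the statement credits with ToNy's best results.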
“…The main results for connective detection are given in Table 4. Three systems approached this task, though the DFKI system was not substantially adapted from the segmentation scenario, leading to low performance (Bourgonje and Schäfer, 2019), and did not report results on automatically parsed data. ToNy again has the highest scores for most datasets, obtaining the highest mean F-score in the plain tokenized scenario and coming second to GumDrop only on the Turkish dataset in the gold syntax scenario.…”
Section: Connective Detection
confidence: 99%