Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2095
Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining

Abstract: The process of obtaining high-quality labeled data for natural language understanding tasks is often slow, error-prone, complicated, and expensive. With the vast usage of neural networks, this issue becomes more pronounced, since these networks require a large amount of labeled data to produce satisfactory results. We propose a methodology to blend high-quality but scarce labeled data with noisy but abundant weak labeled data during the training of neural networks. Experiments in the context of topic-dependent ev…
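The blending idea described in the abstract can be illustrated as a weighted training objective: scarce strong (hand-labeled) examples keep full weight, while abundant weak (noisily labeled) examples are down-weighted in the loss. This is a minimal sketch, not the authors' exact blending scheme; the logistic-regression stand-in for a neural network, the toy data, and the weak-example weight of 0.2 are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a few strong (hand-labeled) examples and many weak (noisy) ones.
X_strong = rng.normal(size=(50, 8))
y_strong = (X_strong.sum(axis=1) > 0).astype(float)
X_weak = rng.normal(size=(2000, 8))
# Weak labels: the true rule corrupted by label noise.
y_weak = (X_weak.sum(axis=1) + rng.normal(scale=2.0, size=2000) > 0).astype(float)

# Blend: concatenate both sources, but down-weight weak examples in the loss.
X = np.vstack([X_strong, X_weak])
y = np.concatenate([y_strong, y_weak])
w = np.concatenate([np.ones(len(y_strong)), np.full(len(y_weak), 0.2)])

# Weighted logistic regression trained by gradient descent
# (a stand-in for the neural network in the paper).
theta = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (w * (p - y)) / w.sum()
    theta -= 0.5 * grad

# Evaluate on the strong (trusted) labels.
acc = float(((1.0 / (1.0 + np.exp(-X_strong @ theta)) > 0.5) == y_strong.astype(bool)).mean())
```

The per-example weight vector `w` is the simplest way to blend the two sources; alternatives include training on weak data first and fine-tuning on strong data.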

Cited by 53 publications (85 citation statements)
References 22 publications
“…Both BERT-base and BERT-large outperform the IBM baseline set by Shnarch et al. (2018) by more than 6pp in accuracy. The topic-integrating models IBM ELMo and IBM BERT do not improve much over their BiLSTM counterparts, which do not use any topic information.…”
Section: Results and Analysis
confidence: 96%
“…A macro F1-score is computed over the three label classes, and scores are averaged over all topics and over ten random seeds. For the IBM Corpus, we use the setup by Shnarch et al. (2018): training on 83 topics (4,066 sentences) and testing on 35 topics (1,719 sentences). We train with five different random seeds and report the average accuracy over all runs.…”
Section: Methods
confidence: 99%
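The evaluation protocol quoted above (macro F1 over three classes, averaged over topics and over random seeds) can be sketched as follows. The function names and toy arrays are illustrative assumptions, not code from the cited papers.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes=3):
    """Unweighted mean of per-class F1 scores (macro F1)."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def averaged_score(per_seed_per_topic):
    """Average per-topic macro-F1 within each seed, then average over seeds.

    per_seed_per_topic: list (one entry per seed) of lists (one entry per
    topic) of macro-F1 values.
    """
    return float(np.mean([np.mean(topic_scores) for topic_scores in per_seed_per_topic]))

# Toy example with three classes.
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 2, 0, 0])
score = macro_f1(y_true, y_pred)
```

Macro averaging weights each class equally regardless of its frequency, which matters when the three argument classes are imbalanced.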
“…The process of manual annotation is labor-intensive and time-consuming, whereas the use of unsupervised machine learning algorithms could address the lack of trained annotators. The need for novel algorithms and techniques is also emphasized in previous review papers [12,37], and although there is an evident trend towards unsupervised [132,133] or semi-supervised ML algorithms [7,20,129,134] with notable performance, they could be further improved, as there has not been extensive work on either the use of suitable features or the design of argument schemes. A recent literature review by Silva et al. [135] presents the trends in semi-supervised learning for tweet sentiment analysis and can also serve as a point of reference in the field of AM.…”
Section: Future Directions: Semi-supervision and Background Knowledge
confidence: 99%