Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), 2019.
DOI: 10.18653/v1/D19-5556

Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power

Abstract: Understanding the vulnerability of linguistic features extracted from noisy text is important both for developing better health text classification models and for interpreting vulnerabilities of natural language models. In this paper, we investigate how generic language characteristics, such as syntax or the lexicon, are impacted by artificial text alterations. The vulnerability of features is analysed from two perspectives: (1) the level of feature value change, and (2) the level of change of feature predictive power.
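The abstract's two perspectives can be pictured with a small sketch: alter the text artificially, then compare feature values before and after, and separately compare a classifier's performance on the two versions. Everything below (the token-dropping perturbation, the type-token-ratio feature, the sample sentence) is an illustrative assumption, not the paper's actual setup.

```python
import random

def perturb(text, rate=0.1):
    """Artificial alteration: randomly drop a fraction of tokens."""
    tokens = text.split()
    return " ".join(t for t in tokens if random.random() > rate)

def lexical_feature(text):
    """Example lexical feature: type-token ratio."""
    tokens = text.split()
    return len(set(tokens)) / max(len(tokens), 1)

texts = ["patients report mild headache and nausea after treatment"]
original = [lexical_feature(t) for t in texts]
altered = [lexical_feature(perturb(t)) for t in texts]

# (1) feature value change under alteration
value_change = [abs(a - o) for o, a in zip(original, altered)]
# (2) predictive-power change would instead compare a classifier's accuracy
#     when it is fed original vs. altered feature values.
print(value_change)
```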

Cited by 8 publications (4 citation statements)
References: 46 publications
“…As in [20], we computed lexical and syntactic complexity using [1] for L2 texts. A correlation between text level and lexical complexity is observed, as well as a relation between some texts and their syntactic complexity.…”
Section: Experimental Protocol and First Results
confidence: 99%
“…However, it is not a good fit in our case, where texts of different types and sizes are compared, because the values are inversely proportional to text size. Following [12], [13], and [14], we also use Shannon entropy as a measure of lexical diversity in the texts:…”
Section: Feature Set
confidence: 99%
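The statement above uses Shannon entropy over the token distribution as a lexical diversity measure that is less sensitive to text size than raw type counts. A minimal sketch of that computation (whitespace tokenization and the example strings are illustrative assumptions, not the cited papers' code):

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token distribution: H = -sum p(w) log2 p(w)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Example: a more lexically diverse text yields higher entropy.
text_a = "the cat sat on the mat the cat sat".split()
text_b = "a quick brown fox jumps over the lazy sleeping dog".split()
print(shannon_entropy(text_a))  # lower: several repeated tokens
print(shannon_entropy(text_b))  # higher: all tokens distinct
```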
“…Evaluation with automatic metrics: We use the corpus-based evaluator by Nekvinda and Dušek (2021) to measure commonly used metrics on MultiWOZ (Inform & Success rates, BLEU) as well as lexical diversity measures, namely the number of distinct trigrams in the outputs and bigram conditional entropy (Li et al., 2016; Novikova et al., 2019). State tracking joint accuracy is calculated with scripts adapted from TRADE.…”
Section: Response Generation
confidence: 99%
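The last statement pairs distinct-trigram counts with bigram conditional entropy as lexical diversity measures for generated responses. A rough, self-contained sketch of both quantities (the toy output string and function names are hypothetical, not taken from the cited evaluator):

```python
import math
from collections import Counter

def distinct_ngrams(tokens, n):
    """Number of unique n-grams in a token sequence."""
    return len({tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)})

def bigram_conditional_entropy(tokens):
    """H(w2 | w1) = -sum over bigrams of p(w1, w2) * log2 p(w2 | w1)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])  # counts of the conditioning word w1
    total = sum(bigrams.values())
    h = 0.0
    for (w1, _), c in bigrams.items():
        p_joint = c / total
        p_cond = c / unigrams[w1]
        h -= p_joint * math.log2(p_cond)
    return h

outputs = "book a table for two please book a table for four".split()
print(distinct_ngrams(outputs, 3))          # distinct trigrams
print(bigram_conditional_entropy(outputs))  # lower values indicate more repetitive outputs
```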