2001
DOI: 10.1006/csla.2001.0169

Normalization of non-standard words

citations
Cited by 255 publications
(183 citation statements)
references
References 16 publications
“…Though our findings are not fully comparable to those in the two previous references, we can see that our blog corpus is the one that presents the lowest deviation rate - comparable to newspaper text if we take into account that Sproat et al (2001) were looking to a wider variety of non-standard forms. In contrast, our other corpora present very high rates of deviations - which are in line with the findings of both Sproat et al (2001) and Han and Baldwin (2011) in their less formal types of texts.…”
Section: Characteristics of Spanish UGC and English UGC
Citation type: contrasting
confidence: 98%
“…In contrast, our other corpora present very high rates of deviations - which are in line with the findings of both Sproat et al (2001) and Han and Baldwin (2011) in their less formal types of texts.…”
Section: Characteristics of Spanish UGC and English UGC
Citation type: supporting
confidence: 89%
“…Next, for both data sets all the words in the sentences are labeled with parts of speech (POS) and named entities (NE). Finally, to ensure the integrity of the Twitter data, English language filtering † and non-standard word (NSW) normalization [14] is also performed.…”
Section: Preprocessing
Citation type: mentioning
confidence: 99%
“…Text normalization (Sproat et al, 2001) is an important initial phase for many natural language and speech applications. The basic task of text normalization is to convert non-standard words (NSWs) -numbers, abbreviations, dates, etc.…”
Section: Introduction
Citation type: mentioning
confidence: 99%