2013
DOI: 10.17562/pb-48-8

A POS Tagger for Social Media Texts Trained on Web Comments

Abstract: Using social media tools such as blogs and forums has become more and more popular in recent years. Hence, a huge collection of social media texts from different communities is available for accessing user opinions, e.g., for marketing studies or acceptance research. Typically, methods from Natural Language Processing are applied to social media texts to automatically recognize user opinions. A fundamental component of the linguistic pipeline in Natural Language Processing is Part-of-Speech tagging. …

Cited by 11 publications (11 citation statements)
References 15 publications
“…These new challenges include the management of metainformation included in the text (for example as tags in tweets) (Bouillot et al, 2013), the detection of typos and unconventional spelling, word shortenings (Neunerdt et al, 2013; Moreira et al, 2013) and slang and emoticons (Balahur, 2013), among others. Another challenge that should be taken into account is that while clinical records and medical literature can be mapped to terminological resources or biomedical ontologies, lay terminology used by patients to describe their treatments and their effects, in general, is not collected in any terminological resource, which would facilitate the automatic processing of this kind of texts.…”
Section: Results (mentioning)
confidence: 99%
“…As such, they can be compared to other types of non-standard language, including computer-mediated communication or learner texts. Previous studies in this area have noted the challenges of applying the existing NLP tools and tagsets, which are often trained on the basis of newspaper language, to the data that deviates from this standard (Neunerdt et al, 2013;Zinsmeister et al, 2014). This issue is addressed by development of modified taggers as well as adaptations of the tagsets, for instance to include tags that are unique to a certain type of data (cf.…”
Section: Part Of Speech Annotation (mentioning)
confidence: 99%
“…e.g. Neunerdt et al, 2013 for annotation of social media texts or Westpfahl and Schmidt, 2013 for enrichment of spoken German).…”
Section: Part Of Speech Annotation (mentioning)
confidence: 99%
“…We used several instruments in this experiment, Moses [1] as machine translators, SRILM [26] to building language and PoS models, Giza++ [27] for word alignment process, and Grammar Postagger for PoS tagging. Furthermore, we use the BLEU method [28] for scoring the translation results.…”
Section: Experiments On SMT (mentioning)
confidence: 99%
“…Mart [23] used 47 tags to build a Spanish treebank in Spanish. For developed part-of-speech tagger, Avontuur et al [24] used 25 tags for Dutch, Singha et al [25] used 97 tags for Manipuri, Neunerdt et al [26] used 54 tags for German.…”
Section: Introduction (mentioning)
confidence: 99%