Proceedings of the Workshop on Noisy User-Generated Text 2015
DOI: 10.18653/v1/w15-4322

Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations

Abstract: Due to the short and noisy nature of Twitter microposts, detecting named entities is often a cumbersome task. As part of the ACL 2015 Named Entity Recognition (NER) shared task, we present a semi-supervised system that detects 10 types of named entities. To that end, we leverage 400 million Twitter microposts to generate powerful word embeddings as input features and use a neural network to execute the classification. To further boost the performance, we employ dropout to train the network and leaky Rectified Linear Units.
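As a rough illustration of the setup named in the abstract, the sketch below feeds pre-trained Twitter word embeddings into a small feed-forward classifier trained with dropout and leaky ReLU activations. This is a minimal sketch, not the authors' exact architecture: the context-window size, hidden width, and label count are assumptions.

```python
import torch
import torch.nn as nn

EMB_DIM = 400      # dimensionality of the Twitter-trained word embeddings
WINDOW = 5         # assumed context window: the token plus two neighbours on each side
NUM_LABELS = 21    # assumption: 10 entity types in a BIO scheme plus the "O" tag

# A small feed-forward classifier with dropout and leaky ReLU activations,
# matching the ingredients named in the abstract (not the authors' exact layout).
classifier = nn.Sequential(
    nn.Linear(WINDOW * EMB_DIM, 512),
    nn.LeakyReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, NUM_LABELS),
)

# One token's features: the concatenated embeddings of its context window.
features = torch.randn(1, WINDOW * EMB_DIM)   # placeholder for real embedding lookups
logits = classifier(features)                 # per-class scores for the centre token
```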


Cited by 161 publications (143 citation statements); references 5 publications.
“…These embeddings were obtained from Godin et al. (2015). Words which are out-of-vocabulary are replaced with UNK.…”
Section: Preprocessing of Text and Textual Representation (mentioning; confidence: 99%)
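To make the UNK handling in this excerpt concrete, a minimal lookup sketch is shown below; the toy vocabulary, the 400-dimensional size, and the choice of a zero vector for UNK are illustrative assumptions, not details from the cited work.

```python
import numpy as np

EMB_DIM = 400  # dimensionality of the Godin et al. (2015) Twitter embeddings

# Toy vocabulary standing in for the pre-trained embedding table (hypothetical values).
vocab = {
    "twitter": np.random.rand(EMB_DIM),
    "entity": np.random.rand(EMB_DIM),
}
UNK_VECTOR = np.zeros(EMB_DIM)  # assumption: one fixed vector represents all OOV tokens

def lookup(token: str) -> np.ndarray:
    """Return the embedding for `token`, falling back to UNK when it is out-of-vocabulary."""
    return vocab.get(token.lower(), UNK_VECTOR)

vectors = [lookup(tok) for tok in "new pop-up shop in nyc".split()]
```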
“…(b) Extrinsic methods evaluate word vectors by measuring their performance when used for downstream NLP tasks, e.g., dependency parsing and named entity recognition (Passos et al., 2014; Godin et al., 2015).…”
Section: Embedding Evaluation (mentioning; confidence: 99%)
“…We perform a grid-search using cross-validation on our training set for parameter tuning, and report results on our test set. For each of the models, we establish a baseline with W2V features (Google News-trained Word2Vec, size 300, for the debate forums, and Twitter-trained Word2Vec, size 400 (Godin et al., 2015), for the tweets). We experiment with different embedding representations, finding that we achieve best results by averaging the word embeddings for each input when using SVM, and creating an embedding matrix (number of words by embedding size for each input) as input to an embedding layer when using LSTM.…”
Section: RQs vs. Information-Seeking Qs (mentioning; confidence: 99%)
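The two input representations contrasted in this excerpt can be sketched as follows; the fixed sequence length and the use of zero padding are assumptions made for illustration.

```python
import numpy as np

EMB_DIM = 400   # Twitter-trained Word2Vec size used for the tweets in the quoted study
MAX_LEN = 30    # assumed fixed sequence length for the LSTM input

def svm_features(token_vectors):
    """SVM input: a single vector, the average of the tweet's word embeddings."""
    return np.mean(token_vectors, axis=0)

def lstm_input(token_vectors, max_len=MAX_LEN):
    """LSTM input: a (max_len, EMB_DIM) embedding matrix, zero-padded or truncated."""
    matrix = np.zeros((max_len, EMB_DIM))
    for i, vec in enumerate(token_vectors[:max_len]):
        matrix[i] = vec
    return matrix

tweet = [np.random.rand(EMB_DIM) for _ in range(7)]  # placeholder embeddings for 7 tokens
avg = svm_features(tweet)   # shape: (400,)
mat = lstm_input(tweet)     # shape: (30, 400)
```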