Entity extraction, linking, classification, and tagging for social media

Gattani, Abhishek; Lamba, Digvijay S.; Garera, Nikesh; Tiwari, Mitul; Chai, Xiaoyong; Das, Sanjib; Subramaniam, Sri; Rajaraman, Anand; Harinarayan, Venky; Doan, AnHai

doi:10.14778/2536222.2536237

Cited by 101 publications

(71 citation statements)

References 30 publications

“…We are motivated to use convolutional networks through the work of Wu and Ma (2017), but we distinguish our approach by using deep convolution to build embeddings for character identification. Entity linking has traditionally relied heavily on knowledge databases, most notably, Wikipedia, for entities (Mihalcea and Csomai, 2007b; Ratinov et al, 2011b; Gattani et al, 2013; Francis-Landau et al, 2016). Although we do not make use of knowledge bases, our task is closely aligned to entity linking.…”

Section: Related Work (mentioning)

confidence: 99%

Robust Coreference Resolution and Entity Linking on Dialogues: Character Identification on TV Show Transcripts

Chen, Zhou, Choi

2017

Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper presents a novel approach to character identification, that is an entity linking task that maps mentions to characters in dialogues from TV show transcripts. We first augment and correct several cases of annotation errors in an existing corpus so the corpus is clearer and cleaner for statistical learning. We also introduce the agglomerative convolutional neural network that takes groups of features and learns mention and mention-pair embeddings for coreference resolution. We then propose another neural model that employs the embeddings learned and creates cluster embeddings for entity linking. Our coreference resolution model shows comparable results to other state-of-the-art systems. Our entity linking model significantly outperforms the previous work, showing the F1 score of 86.76% and the accuracy of 95.30% for character identification.

“…missing word recovery and punctuation correction. Gattani et al (2013) designed an application that extracts entities from social data, such as tweets. Their system uses a Wikipedia-based global 'real-time' knowledge base that is well suited for social data, and generates and uses contexts and social signals to improve task accuracy.…”

Section: Literature Review (mentioning)

confidence: 99%

Competitive analysis of social media data in the banking industry

Afolabi, Ezenwoke, Ayo

2017

IJIMA

Abstract: Recently, most companies interact more with their customers through the social media, particularly Facebook and Twitter. This has made large amount of textual data freely available on the internet for competitive intelligence analysis, which is helping reposition more and more companies for better profit. In order to carry out competitive intelligence, financial institutions need to take note of and analyse their competitor's social media sites. This paper, therefore, aims to help the banking industry in Nigeria understand how to perform a social media competitive analysis and transform social media data into knowledge, which will form the foundation for decision-making and internet marketing of such institutions. The study describes an in-depth case study which applies text mining to analyse unstructured text content on Facebook and Twitter sites of the five largest and leading financial institutions (banks) in Nigeria: Zenith Bank, First Bank, United Bank for Africa, Access Bank and GTBank. Analysing the social media content of these institutions will increase their competitive advantage and also lead to more profit for the banking institutions in question. The results obtained from this research showed that text mining is able to reveal uncommon and non-trivial trend for competitive advantage from social media data, and also provide specific recommendations to help banks maximise their competitive edge.

“…Generally, crucial procedures of entity linking include candidate entity generation, candidate ranking and unlinkable mention prediction [1]. Candidate entities are usually generated using a name dictionary that is built offline [21], while candidate ranking mainly exploits supervised learning methods, such as binary classification [6,22,23], learning to rank [24-26], structure learning, graph-based methods [11,27-29] and probabilistic methods [3,30]. Among these methods, binary classification is a simple and natural choice, but it suffers from the data imbalance problem.…”

Section: Pipeline Architecture For Entity Detection and Linking (mentioning)

confidence: 99%

A Two-Stage Joint Model for Domain-Specific Entity Detection and Linking Leveraging an Unlabeled Corpus

Zhang, Huang, et al.

2017

Information

Abstract: The intensive construction of domain-specific knowledge bases (DSKB) has posed an urgent demand for researches about domain-specific entity detection and linking (DSEDL). Joint models are usually adopted in DSEDL tasks, but data imbalance and high computational complexity exist in these models. Besides, traditional feature representation methods are insufficient for domain-specific tasks, due to problems such as lack of labeled data, link sparseness in DSKBs, and so on. In this paper, a two-stage joint (TSJ) model is proposed to solve the data imbalance problem by discriminatively processing entity mentions with different degrees of ambiguity. In addition, three novel methods are put forward to generate effective features by incorporating an unlabeled corpus. One crucial feature involving entity detection is the mention type, extracted by a long short-term memory (LSTM) model trained on automatically annotated data. The other two types of features mainly involve entity linking, including the inner-document topical coherence, which is measured based on entity co-occurring relationships in the corpus, and the cross-document entity coherence evaluated using similar documents. An overall 74.26% F1 value is obtained on a dataset of real-world movie comments, demonstrating the effectiveness of the proposed approach and indicating its potentiality to be used in real-world domain-specific applications.
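
The citation statement above names three entity linking steps: candidate entity generation from an offline name dictionary, candidate ranking, and unlinkable mention prediction. A minimal sketch of how those steps fit together follows. It is not the method of any paper listed here: the dictionary contents, the keyword sets, and the function names are all hypothetical, and a simple keyword-overlap score stands in for the supervised ranking methods the statement mentions.

```python
# Hypothetical name dictionary, built offline: surface form -> candidate entities.
NAME_DICT = {
    "apple": ["Apple_Inc.", "Apple_(fruit)"],
    "paris": ["Paris", "Paris_Hilton"],
}

# Hypothetical context keywords per entity, used only by the toy ranker below.
ENTITY_KEYWORDS = {
    "Apple_Inc.": {"iphone", "mac", "stock"},
    "Apple_(fruit)": {"pie", "eat", "tree"},
    "Paris": {"france", "eiffel", "city"},
    "Paris_Hilton": {"celebrity", "hotel", "heiress"},
}


def generate_candidates(mention):
    """Step 1: look the mention up in the offline name dictionary."""
    return NAME_DICT.get(mention.lower(), [])


def rank_candidates(candidates, context_tokens):
    """Step 2: score each candidate by keyword overlap with the context.

    This overlap count is an assumption standing in for a learned ranker.
    """
    context = {token.lower() for token in context_tokens}
    scored = [
        (len(ENTITY_KEYWORDS.get(candidate, set()) & context), candidate)
        for candidate in candidates
    ]
    # sorted() is stable, so ties keep their dictionary order.
    return sorted(scored, key=lambda pair: -pair[0])


def link(mention, context_tokens, min_score=1):
    """Step 3: return the best entity, or None when the mention is unlinkable."""
    ranked = rank_candidates(generate_candidates(mention), context_tokens)
    if not ranked or ranked[0][0] < min_score:
        return None  # no candidate at all, or no candidate supported by context
    return ranked[0][1]
```

Here `link("Apple", ["new", "iphone", "stock"])` resolves to the company entry, while a mention missing from the dictionary, or one whose candidates share no keywords with the context, is reported as unlinkable by returning `None`. The `min_score` threshold is the assumed knob for that last decision.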
