Tweeki: Linking Named Entities on Twitter to a Knowledge Graph

Harandizadeh, Bahareh; Singh, Sameer

doi:10.18653/v1/2020.wnut-1.29

Cited by 10 publications

(23 citation statements)

References 28 publications

“…Or some NIL entities that do not exist in other KGs could exist in Wikidata. Eleven datasets [16,23,24,27,29,33,46,56,69,80] were found for which Wikidata identifiers were available from the start. In the following the datasets are separated by their domain.…”

Section: Overview (mentioning)

confidence: 99%

“…TweekiData and TweekiGold [46] are an automatically annotated corpus and a manually annotated dataset for EL over tweets. TweekiData was created by using other existing tweet-based datasets and linking them to Wikidata data via the Tweeki EL.…”

Section: Twitter Datasets (mentioning)

confidence: 99%

“…Kensho Derived Wikimedia Dataset [56]: 1 [86]; CLEF HIPE 2020 [29]: 3 [15,65,89]; Mewsli-9 [16]: 1 [16]; TweekiData [46]: 1 [46]; TweekiGold [46]: 1 [46]. Table 9: Ambiguity of mentions (existence of a match does not correspond to a correct match), NYT2018 dataset was not available and LC-QuAD Wikimedia Dataset and the Mewsli-9 training dataset have the largest percentage of exact matches for labels. While TweekiGold is quite ambiguous, deciding on the most prominent entity appears to produce good EL results.…”

Section: Answer - RQ 1: Which Wikidata EL Datasets Exist, How Widely Used Are They, and How Are They Constructed? (mentioning)

confidence: 99%

“…Some of those approaches solve the ER and EL jointly as an end-to-end task. Besides that, there exist two rule-based approaches [46,100] and two based on graph optimization [60,69].…”

Section: Approaches (mentioning)

confidence: 99%

“…Tweeki [46] is an approach focusing on unsupervised EL over tweets. The ER is done by a pre-existing Entity Recognizer [40] which also tags the mentions.…”

Section: Rule-based Approaches (mentioning)

confidence: 99%

Survey on English Entity Linking on Wikidata: Datasets and approaches

Cedric; Lehmann; Usbeck. 2022

Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not lifted. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.

Understanding the Impact of Entity Linking on the Topology of Entity Co-occurrence Networks for Social Media Analysis

Nevin; Zhang; Dimitrov et al. 2024

Lecture Notes in Computer Science

Urdu Wikification and Its Application in Urdu News Recommendation System

et al. 2022

Wikification is the process of linking the entities found in a sample text to their individual Wikipedia or Wikidata pages. Many natural language processing applications, including question-answering systems, information retrieval, fraud detection, and recommendation systems (RS), can benefit from this information extraction technique. There has been a great deal of effort put towards entity linking (EL) for both Asian and Western languages, with several datasets and numerous proposed methodologies. Despite millions of Urdu language users globally, entity linking for the Urdu language has not been worked on, to the best of our knowledge. First, this work proposes an Urdu EL pipeline to identify named entities in text and link them to Wikidata. Second, a dataset of 550 Urdu news titles relating to their respective Wiki-ids has been prepared for the examination. Third, utilizing the proposed EL pipeline, 16738 news articles from the first-ever Urdu news RS dataset of 100 users are annotated. Fourth, a sub-knowledge graph (KG) of 8439 entities and 23080 relationship tuples is retrieved from Wikidata. The TransE algorithm is then used to create KG embeddings so that the extracted KG may be used in an Urdu news RS. The final accuracy of the Urdu news RS is 60.8%.
