Ugo Scaiella scite author profile

We address the problem of cross-referencing text fragments with Wikipedia pages in a way that resolves synonymy and polysemy accurately and efficiently. We take inspiration from a recent line of work [3,10,12,14] and extend its scenario from the annotation of long documents to the annotation of short texts, such as snippets of search-engine results, tweets, news, blogs, etc. These short and poorly composed texts pose new challenges in terms of efficiency and effectiveness of the annotation process, which we address by designing and engineering Tagme, the first system that performs an accurate and on-the-fly annotation of these short textual fragments. A large set of experiments shows that Tagme outperforms state-of-the-art algorithms when they are adapted to work on short texts, and that it is fast and competitive on long texts.

Topical clustering of search results

Scaiella

Ferragina

Marino

et al. 2012

Search results clustering (SRC) is a challenging algorithmic problem that requires grouping the results returned by one or more search engines into topically coherent clusters, and labeling the clusters with meaningful phrases describing the topics of the results included in them. In this paper we propose to solve SRC via an innovative approach that consists of modeling the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia pages identified by means of recently proposed topic annotators [9,11,16,20] applied to the search results, and the edges denote the relatedness among these topics, computed by taking into account the linkage of the Wikipedia graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (Clusty and Lingo3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.
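To make the idea of clustering a graph of topics by its spectral properties more concrete, here is a minimal sketch of a generic spectral bipartition. It is not the algorithm from the paper: the topic names, the relatedness weights, and the helper names `fiedler_vector` and `bipartition` are all invented for illustration, and the paper computes relatedness from the Wikipedia link structure rather than from hand-picked numbers. The sketch approximates the Fiedler vector (the eigenvector of the graph Laplacian with the second-smallest eigenvalue) by power iteration and splits the topics by the sign of their entries.

```python
# Hypothetical topics (Wikipedia page titles) for an ambiguous query such as "jaguar".
TOPICS = ["Jaguar (animal)", "Leopard", "Big cat",
          "Jaguar Cars", "Land Rover", "Automobile"]

# Invented relatedness weights: strong inside each group, weak across groups.
EDGES = {
    (0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.9,   # animal-related topics
    (3, 4): 0.9, (3, 5): 0.9, (4, 5): 0.9,   # car-related topics
    (2, 3): 0.1,                             # weak bridge between the groups
}


def build_weights(n, edges):
    """Build a symmetric weight matrix with a zero diagonal."""
    weights = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        weights[i][j] = w
        weights[j][i] = w
    return weights


def fiedler_vector(weights, iterations=200):
    """Approximate the Fiedler vector of L = D - W by power iteration.

    Assumes a connected graph with at least two nodes. Iterating on
    M = shift*I - L turns the smallest eigenvalues of L into the largest
    of M; removing the mean at every step discards the constant
    eigenvector, so the iteration settles on the next one.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    shift = 2.0 * max(degrees)  # upper bound on the eigenvalues of L
    # Deterministic starting vector with zero mean.
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # w = M v = shift*v - D v + W v
        w = [(shift - degrees[i]) * v[i]
             + sum(weights[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]            # deflate the constant vector
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def bipartition(topics, weights):
    """Split topics into two groups by the sign of their Fiedler entry."""
    v = fiedler_vector(weights)
    negative = [t for t, x in zip(topics, v) if x < 0]
    non_negative = [t for t, x in zip(topics, v) if x >= 0]
    return negative, non_negative


if __name__ == "__main__":
    group_a, group_b = bipartition(TOPICS, build_weights(len(TOPICS), EDGES))
    print(group_a)
    print(group_b)
```

On this toy graph the two groups correspond to the animal sense and the car sense of the query. A full system would also need to choose the number of clusters and generate cluster labels, which this sketch does not attempt.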

Classification of Short Texts by Deploying Topical Annotations

Vitale

Ferragina

Scaiella

2012

Tagme

Ferragina

Scaiella

2010

519

Fast and accurate annotation of short texts with Wikipedia pages

Ferragina

Scaiella

2010

Preprint
