Identifying salient entities in web pages

Gamon, Michael; Yano, Tae; Song, Xinying; Apacible, Johnson; Pantel, Patrick

doi:10.1145/2505515.2505602

Cited by 27 publications

(57 citation statements)

References 35 publications

“…The two other sources of semantic features are used as a point of comparison to the DSSM. One is a generative semantic model (Joint Transition Topic model, or JTT) (Gamon et al 2013). JTT is an LDA-style model (Blei et al 2003) that is trained jointly on source and target documents linked by browsing transitions.…”

Section: Results (mentioning)

confidence: 99%

“…In addition to the notion of relevance as described in Section 1, related to interestingness is also the notion of salience (also called aboutness) (Gamon et al 2013; 2014; Paranjpe 2009; Yih et al 2006). Salience is the centrality of a term to the content of a document.…”

Section: Related Work (mentioning)

confidence: 99%

“…In contrast to these approaches, we strive to predict what term a user is likely to be interested in when reading content, which may or may not be the same as the most popular content that is related to the current document. It has empirically been demonstrated in Gamon et al (2013) that popularity is in fact a rather poor predictor for interestingness. The task of contextual entity search, which is formulated as an information retrieval problem in this paper, is also related to research on entity resolution (Stefanidis et al 2013).…”

Section: Related Work (mentioning)

confidence: 99%

Modeling Interestingness with Deep Neural Networks

Gao, Pantel, Gamon, et al. 2014

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Self Cite

131

This paper presents a deep semantic similarity model (DSSM), a special type of deep neural network designed for text analysis, for recommending target documents likely to interest a user based on the source document she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a way that the distance between source documents and their corresponding interesting targets in that space is minimized. The effectiveness of the DSSM is demonstrated using two interestingness tasks: automatic highlighting and contextual entity search. The results on large-scale, real-world datasets show that the semantics of documents are important for modeling interestingness and that the DSSM leads to significant quality improvement on both tasks, outperforming not only the classic document models that do not use semantics but also state-of-the-art topic models.

“…The authors suggested the use of centering constructs to keep track of the key entities, which change with discourse. Document-level importance of entities (which include events) was explored by Gamon et al (2013). The authors use the term salience to denote entity importance and grade entities into 3 categories: most salient, less salient, and not salient.…”

Section: Related Work (mentioning)

confidence: 99%

"Making the News": Identifying Noteworthy Events in News Articles

Upadhyay, Christodoulopoulos, Roth 2016

Proceedings of the Fourth Workshop on Events

Most events described in a news article are background events: only a small number are noteworthy, and an even smaller number serve as the trigger for writing that article. Although these events are difficult to identify, they are crucial to NLP tasks such as first story detection, document summarization and event coreference, and to many applications of event analysis that depend on event counting and identifying trends. In this work, we introduce the notion of news-peg, a concept borrowed from the political science literature, in an attempt to remedy this problem. A news-peg is an event which prompted the author to write the article, and it serves as a more fine-grained measure of noteworthiness than a summary. We describe a new task of news-peg identification and release an annotated dataset for its evaluation. We formalize an operational definition of a news-peg, on which human annotators achieve high inter-annotator agreement (over 80%), and present a rule-based system for this task, which exploits syntactic features derived from established journalistic devices.

“…Information need expression: If an information need is detected, the query is constructed from the page's relevant entities. The task of extracting relevant entities is somewhat similar to extracting salient entities [1], but opposed to salience or aboutness, relevance incorporates the user's interests. The relevant terms are determined by a named entity recognizer, using an adapted ranking mechanism based on the relatedness of the candidate entities to interest topics in the user profile.…”

Section: Page Level (mentioning)

confidence: 99%

From context to query

Schlötterer 2015

Proceedings of the 30th Annual ACM Symposium on Applied Computing
