Proceedings of the Twelfth International Conference on World Wide Web (WWW '03), 2003
DOI: 10.1145/775152.775178

SemTag and Seeker

Abstract: This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest-scale semantic tagging effort to date. We descri…
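The abstract does not spell out how the tags are disambiguated. As a hedged illustration of the general idea only, the sketch below assumes each candidate sense of an ambiguous label is described by a bag of words, takes a fixed window of surrounding words as the spot's context, and picks the sense with the highest cosine similarity. The sense inventory `SENSES`, the `WINDOW` size and the function names are hypothetical; this is not a reproduction of SemTag's actual algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical sense inventory: each candidate meaning of an ambiguous label
# is described by a short bag of words (a stand-in for ontology/taxonomy text).
SENSES = {
    "jaguar": {
        "animal": "big cat predator rainforest species wild prey",
        "car": "luxury vehicle engine sedan dealer model drive",
    }
}

WINDOW = 10  # assumed number of context words taken on each side of a spot


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def spot_and_tag(text):
    """Spot known labels and tag each with its best-matching sense.

    Returns (token_index, label, sense, score) tuples. Spots whose context
    shares no words with any sense description are skipped, not guessed.
    """
    tokens = tokenize(text)
    tags = []
    for i, tok in enumerate(tokens):
        senses = SENSES.get(tok)
        if not senses:
            continue  # not an ambiguous label we know about
        context = Counter(tokens[max(0, i - WINDOW):i] + tokens[i + 1:i + 1 + WINDOW])
        scored = {
            name: cosine(context, Counter(tokenize(desc)))
            for name, desc in senses.items()
        }
        best = max(scored, key=scored.get)
        if scored[best] > 0.0:
            tags.append((i, tok, best, scored[best]))
    return tags
```

For example, `spot_and_tag("The jaguar is a big cat that hunts prey in the rainforest")` tags `jaguar` with the `animal` sense, while the same label in a sentence about a sedan and a dealer is tagged `car`. Skipping zero-similarity spots is a deliberate choice in this sketch: at web scale, leaving a spot untagged is usually cheaper than emitting a wrong annotation.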

Cited by 259 publications (15 citation statements)
References: 17 publications
“…RELATED WORK. A significant amount of prior work has considered the automated annotation of Web content using shallow natural language processing (e.g. morphological analysis, named entity recognition) and disambiguation techniques [EMS+00,DEG+03]. Semi-automatic and automatic approaches to full-text semantic annotation have proved to be relatively scalable.…”
Section: Results (mentioning)
confidence: 99%
“…Semi-automatic and automatic approaches to full-text semantic annotation have proved to be relatively scalable. Dill et al. [DEG+03] present a case study in which they apply such techniques to annotate a corpus of 264 million web pages. However, the precision of these techniques is limited by the fact that they deal with unstructured content.…”
Section: Results (mentioning)
confidence: 99%
“…Semi-automatic approaches to facilitate the annotation process exist for many of the annotation tasks, for example, human activity recognition [9], video object detection and segmentation [41], semantic tagging of large corpora [15], and animal behaviour [23]. Most available methods are interactive and require a human-in-the-loop approach.…”
Section: Related Work (mentioning)
confidence: 99%
“…To this camp the only real option is automation. The other camp points out that automation is even more error-prone than manual creation, as current efforts at automatic semantic annotation on massive scales produce only moderate results of between 80% and 90% correct, at the very best [1]. They claim that the remaining 10% will always be beyond reach because it requires significant amounts of human-level intelligence to be done correctly.…”
Section: Related Work (mentioning)
confidence: 99%