Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning From Non-Linguistic Data - 2003
DOI: 10.3115/1119212.1119214

Words and pictures in the news

Abstract: We discuss the properties of a collection of news photos and captions, collected from the Associated Press and Reuters. Captions have a vocabulary dominated by proper names. We have implemented various text clustering algorithms to organize these items by topic, as well as an iconic matcher that identifies articles that share a picture. We have found that the special structure of captions allows us to extract some names of people actually portrayed in the image quite reliably, using a simple syntactic analysis…

Cited by 15 publications (11 citation statements)
References 3 publications
“…Names: We extract a lexicon of proper names from all the captions by identifying two or more capitalized words followed by a present tense verb ([8]). Words are classified as verbs by first applying a list of morphological rules to present tense singular forms, and then comparing these to a database of known verbs (WordNet [25]).…”
Section: Dataset (mentioning)
confidence: 99%
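
The name-extraction heuristic quoted above (a run of two or more capitalized words followed by a present tense verb) can be illustrated with a minimal Python sketch. This is not the cited authors' implementation. The small `KNOWN_VERBS` set is a hypothetical stand-in for a verb database such as WordNet, the suffix rules in `base_forms` are a simplified assumption about what the "morphological rules" might look like, and the captions in the usage note are made up.

```python
import re

# Hypothetical stand-in for a database of known verbs such as WordNet.
KNOWN_VERBS = {"speak", "wave", "arrive", "watch", "carry", "meet", "hold"}


def base_forms(word):
    """Return candidate base forms for a present tense singular form.

    Assumed, simplified suffix rules: -ies -> -y, -es -> '', -s -> ''.
    """
    w = word.lower()
    candidates = []
    if w.endswith("ies") and len(w) > 3:
        candidates.append(w[:-3] + "y")  # carries -> carry
    if w.endswith("es") and len(w) > 2:
        candidates.append(w[:-2])        # watches -> watch
    if w.endswith("s") and len(w) > 1:
        candidates.append(w[:-1])        # speaks -> speak
    return candidates


def is_present_tense_verb(word):
    """True if any candidate base form appears in the verb list."""
    return any(c in KNOWN_VERBS for c in base_forms(word))


# Two or more capitalized words, then one lowercase word.
NAME_THEN_WORD = re.compile(r"\b((?:[A-Z][a-z]+\s+){2,})([a-z]+)\b")


def extract_names(caption):
    """Collect capitalized runs that are directly followed by a known verb."""
    names = []
    for match in NAME_THEN_WORD.finditer(caption):
        name, following = match.group(1), match.group(2)
        if is_present_tense_verb(following):
            names.append(" ".join(name.split()))
    return names
```

For example, `extract_names("Jane Doe speaks during a news conference in London Tuesday.")` yields `["Jane Doe"]`, while a caption such as `"The Eiffel Tower in Paris."` yields no names because the word after the capitalized run is not a verb. A rule this simple also keeps any capitalized title that precedes a name, so a real system would likely need extra filtering.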
“…In the internet-domain, [9] and [10] collected over a half-million news images and associated captions and used clustering techniques to create a 28,000 image database of faces and name profiles, the only one of its kind to date.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since then, a plethora of techniques from different fields have been studied. These include probabilistic frameworks [15], artificial neural networks [29], and others including those that seek to combine other modalities such as text into the search process [30]. In addition, algorithms from the text retrieval community like query-point-refinement [10] have been adapted to the problem in conjunction with pattern recognition techniques such as discriminant analysis [27], [31].…”
Section: A Relevance Feedback (mentioning)
confidence: 99%