2010
DOI: 10.1007/978-3-642-14770-8_38

Portable Extraction of Partially Structured Facts from the Web

Abstract: A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.) and it ported easily and cheaply between two rather different languages (English…

Cited by 2 publications (3 citation statements)
References: 7 publications

“…Whilst impressive in its own right, this approach is not directly relevant for our main goal: (1) we need to capture more diverse and detailed kinds of information, but we do not require a formal knowledge representation as output, and (2) we want an approach that can be adapted easily to multiple languages and domains, which is not a feature of most information extraction approaches. Instead, we use a technique described and evaluated by Salway et al (2010) to extract a list of semi-structured key statements about a given topic from the web, ranked by a keyness metric; the statements comprise the things typically written about the topic. To port the technique to different domains and languages, it is only necessary to specify a sentence template, e.g. "…”
Section: Automatic Extraction Of Key Statements
Citation type: mentioning
confidence: 99%
“…the fact that the same information about a landmark is available in many forms on the web. This method is described in detail in [16]. For a given landmark, we return a list of facts in the form (Landmark, Cue, Text-Fragment), ranked according to a score which is intended to promote interesting and true facts.…”
Section: Fact Extraction and Title Augmentation
Citation type: mentioning
confidence: 99%
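
The (Landmark, Cue, Text-Fragment) form quoted above can be illustrated with a small sketch. It is not the cited method: the function name `extract_facts`, the cue phrases, the sample sentences, and the scoring are all assumptions made for illustration. In particular, the score here is a plain repetition count, standing in for the paper's ranking score, on the assumption that a fact repeated across many pages is more likely to be worth keeping.

```python
import re
from collections import Counter


def extract_facts(landmark, cues, sentences):
    """Return (landmark, cue, fragment, count) tuples, most repeated first.

    Hypothetical sketch: each cue is matched with the template
    "<landmark> <cue> <fragment>", and the fragment runs up to the next
    sentence-ending punctuation mark.
    """
    counts = Counter()
    for sentence in sentences:
        for cue in cues:
            pattern = (
                re.escape(landmark) + r"\s+" + re.escape(cue) + r"\s+([^.;!?]+)"
            )
            match = re.search(pattern, sentence, flags=re.IGNORECASE)
            if match:
                # Normalise case and whitespace so repeated facts collapse.
                fragment = " ".join(match.group(1).lower().split())
                counts[(cue, fragment)] += 1
    # Assumed score: number of sentences repeating the same (cue, fragment).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(landmark, cue, fragment, n) for (cue, fragment), n in ranked]


if __name__ == "__main__":
    sample = [
        "The Eiffel Tower was built in 1889.",
        "Guides note that the Eiffel Tower was built in 1889.",
        "The Eiffel Tower is located in Paris.",
    ]
    for fact in extract_facts("Eiffel Tower", ["was built in", "is located in"], sample):
        print(fact)
```

Under these assumptions, porting to another language would mean swapping only the cue list, which matches the portability claim in the citing statements.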
“…This itself will provide more effective sources of information for landmark identification and for the selection of accurate and interesting tags. While the system described here is only implemented for English, the methods used are all language independent and it could easily be ported to other languages by means of new stopword lists, localised toponyms lists and collections of word patterns in the fact extraction stage [16].…”
Section: Conclusion and Further Work
Citation type: mentioning
confidence: 99%