Proceedings of the Fifth ACM International Conference on Web Search and Data Mining 2012
DOI: 10.1145/2124295.2124328

Selecting actions for resource-bounded information extraction using reinforcement learning

Abstract: Given a database with missing or uncertain content, our goal is to correct and fill the database by extracting specific information from a large corpus such as the Web, and to do so under resource limitations. We formulate the information gathering task as a series of choices among alternative, resource-consuming actions and use reinforcement learning to select the best action at each time step. We use a temporal difference Q-learning method to train the function that selects these actions, and compare it to an …
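
The idea in the abstract, choosing among resource-consuming actions with temporal difference Q-learning, can be illustrated with a minimal tabular sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the three action names, the toy three-stage environment, the reward of 1.0 for filling a field, and the hyperparameters are all hypothetical. The abstract speaks of training a function that selects actions, whereas this sketch substitutes a plain lookup table. The budget here only caps how many actions an episode may take; a fuller treatment would likely make the remaining budget part of the state.

```python
import random
from collections import defaultdict

# Hypothetical action set, loosely following query / download / extract steps;
# these names are illustrative and not taken from the paper.
ACTIONS = ["query", "download", "extract"]
CORRECT = {0: "query", 1: "download", 2: "extract"}  # toy stage -> useful action
DONE = 3  # toy terminal stage: the missing field has been filled


def step(stage, action):
    """Toy environment: the useful action advances one stage, others waste a step."""
    if action == CORRECT[stage]:
        next_stage = stage + 1
        reward = 1.0 if next_stage == DONE else 0.0
        return next_stage, reward
    return stage, 0.0


def td_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular temporal-difference Q-learning update and return the new value."""
    if next_state == DONE:
        best_next = 0.0  # nothing left to gain after the field is filled
    else:
        best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def select_action(q, state, epsilon, rng):
    """Epsilon-greedy choice among the alternative actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=2000, budget=6, epsilon=0.3, seed=0):
    """Run budget-limited episodes; each action consumes one unit of budget."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _episode in range(episodes):
        stage = 0
        for _spent in range(budget):
            action = select_action(q, stage, epsilon, rng)
            next_stage, reward = step(stage, action)
            td_update(q, stage, action, reward, next_stage)
            stage = next_stage
            if stage == DONE:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for each non-terminal toy stage."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1, 2)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Calling `greedy_policy(train())` returns the action the learned table prefers at each toy stage. Because wasted actions use up budget without earning reward, the table ends up valuing the action that makes progress over the alternatives.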

Citations: cited by 19 publications (22 citation statements)
References: 10 publications

“…Such a setup would be more adaptive with respect to the number of queries asked and could thus be potentially more effective at avoiding to ask too many queries (cf. [9]). …”
Section: Discussion (mentioning)
confidence: 99%
“…While this is a relatively new approach, there are some related works. The most similar is perhaps Kanani and McCallum's [9] work on using reinforcement learning to learn an optimal policy for efficiently filling in missing values in a KB (they focus on filling in the email address, job title, and department affiliation of 100 professors at UMass Amherst). The actions available are to perform one of 20 possible types of query (e.g., name, name + "CV", name + "Amherst"), to download one of the n resulting Web pages, or to extract one of the three relations from the page.…”
Section: Related Work (mentioning)
confidence: 99%
“…Similar systems optimize the use of information extraction programs to add missing data values to an existing database [Kanani and McCallum 2012]. These techniques generally improve execution time or storage capacity by processing only those "promising" documents in the collection that contain information about the database relations, instead of the whole collection.…”
Section: Related Work (mentioning)
confidence: 99%
“…Researchers have noticed the overheads and costs of curating and organizing large datasets [Dong et al 2013;Kanani and McCallum 2012;Jain et al 2008a]. For example, some researchers have recently considered the problem of selecting datasets for fusion such that the marginal cost of acquiring and processing a new dataset does not exceed its marginal gain, where cost and gain are measured using the same metric, such as U.S. dollars [Dong et al 2013].…”
Section: Costs Of Concept Extractions (mentioning)
confidence: 99%
“…Researchers have proposed several techniques to reduce the execution time of SQL queries over existing databases whose information comes from concept and relation extraction programs [13,15]. Similar systems optimize the use of information extraction programs to add missing data values to an existing database [16]. These techniques generally improve execution time or storage capacity by processing only the "promising" documents in the collection that contain the information about the database relations, instead of the whole collection.…”
Section: Related Work (mentioning)
confidence: 99%