Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries 2004
DOI: 10.1145/996350.996405

MetaExtract

Abstract: We have developed MetaExtract, a system to automatically assign Dublin Core + GEM metadata using extraction techniques from our natural language processing research. MetaExtract comprises three distinct processes: the eQuery and HTML-based Extraction modules and a Keyword Generator module. We conducted a Web-based survey to have users evaluate each metadata element's quality. Only two of the elements, Title and Keyword, showed a significant quality difference between automatically and manually assigned metadata, with the manual quality slightly higher. The re…

Citations: cited by 30 publications (1 citation statement)
References: 2 publications
“…Natural language processing is part of current metadata extraction tools (Méndez et al, 2021) and has proven successful in a wide variety of domains such as e‐commerce (Paik et al, 2001), educational resources (Yilmazel et al, 2004), bio‐medical documents (Caufield et al, 2018; Valdez et al, 2016), legislative texts (Sleimi et al, 2018; Spinosa et al, 2009), software repositories (Tsay et al, 2020), and even archeological digital archives (Felicetti et al, 2018). These works show how to effectively integrate information extraction modules into processing pipelines to catalog large document libraries.…”
Section: Introduction (citation type: mentioning)
confidence: 99%