2015
DOI: 10.1609/aimag.v36i3.2601

CiteSeerX: AI in a Digital Library Search Engine

citations
Cited by 79 publications
(49 citation statements)
references
References 36 publications
“…GROBID 16 (GeneRation Of BIbliographical Data, version 0.4.1) is a complex metadata extraction tool for header metadata and bibliographical extractions. GROBID uses pdftoxml 17 for content and layout extraction and conditional random fields for learning [11,12]. As the results in Table 3 show, our method performs well on title extraction, getting almost the same accuracies as GROBID, which obtained the best overall results in the experiments of [10].…”
mentioning
confidence: 86%
“…Another possibility is to use a digital library, from where the documents and metadata can be obtained in a more straightforward manner. One such digital library is CiteSeerX [5,17], which offers an OAI collection for metadata harvesting. also offer a huge amount of bibliographic data.…”
Section: A Hybrid Approach For Metadata Extraction
mentioning
confidence: 99%
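The OAI harvesting mentioned in the statement above can be sketched as follows. This is a minimal illustration of the generic OAI-PMH `ListRecords` flow, not CiteSeerX's own code: the base URL `https://example.org/oai`, the helper names `list_records_url` and `parse_records`, and the sample response are all assumptions made for illustration. The sketch parses a canned response offline rather than contacting a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard XML namespaces for OAI-PMH 2.0 and Dublin Core elements.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent alongside it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (records, next_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=OAI_NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=OAI_NS),
            "creators": [c.text or "" for c in rec.iterfind(".//dc:creator", OAI_NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    # An empty or missing token means there are no further pages.
    return records, (token or None)


# Made-up sample response, used so the sketch runs without network access.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    records, token = parse_records(SAMPLE)
    print(records)
    if token:
        print(list_records_url("https://example.org/oai", resumption_token=token))
```

A real harvester would fetch each URL, parse the page, and repeat with the returned token until none is left; the fetch step is omitted here so that nothing depends on a particular server.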
“…He has built several open source tools for metadata extraction using machine learning methods from PDFs and text for many unique entities such as figures, tables, equations, etc. and incorporated them into scholarly search engines such as CiteSeerX (Wu et al, 2015a) using an open source ingestion system (Wu et al, 2015b). Recent work has been on linking data and metadata in different databases such as PubMed and the Web of Science.…”
Section: Status Of Mvp On Day
mentioning
confidence: 99%
“…CiteSeerX has proven to be a rich source of scholarly information beyond publications as exemplified through various derived data-sets, ranging from citation graphs to publication acknowledgments [16], meant to aid academic content management and analysis research [1]. Furthermore, CiteSeerX's open-source nature allows easy access to its implementations of tools that span focused web crawling to record linkage [35] to meta-data extraction to leveraging user-provided meta-data corrections [31]. A key aspect of CiteSeerX's future lies in not only serving as an engine for continuously building an ever-improving collection of scholarly knowledge at web-scale, but also as a set of publicly-available tools to aid those interested in building digital library and search engine systems of their own.…”
Section: Overview: Architecture
mentioning
confidence: 99%