2014
DOI: 10.1007/978-3-319-06028-6_26

CiteSeerX: A Scholarly Big Dataset

Abstract: The CiteSeerX digital library stores and indexes research articles in Computer Science and related fields. Although its main purpose is to make it easier for researchers to search for scientific information, CiteSeerX has been proven as a powerful resource in many data mining, machine learning and information retrieval applications that use rich metadata, e.g., titles, abstracts, authors, venues, references lists, etc. The metadata extraction in CiteSeerX is done using automated techniques. Althoug…

Cited by 44 publications (30 citation statements)
References 21 publications
“…We used the implementation of topic models from Mallet (http://mallet.cs.umass.edu/). To train the topic model, we used a subset of about 45,000 paper abstracts extracted from the CiteSeerX scholarly big dataset introduced by Caragea et al. (2014b). For all models, the score of a phrase is obtained by summing the score of the constituent words in the phrase.…”
Section: Results (mentioning)
confidence: 99%
“…Fang et al.'s method [10] focuses on the header detection of different tables available in PDF documents collected from the dataset of CiteSeer [11]. Some techniques focused on table exploring by using table layout characteristics; however, table structure mattered a lot.…”
Section: Table and Header Detection (mentioning)
confidence: 99%
“…Another possibility is to use a digital library, from where the documents and metadata can be obtained in a more straightforward manner. One such digital library is CiteSeerX [5,17], which offers an OAI collection for metadata harvesting. also offer a huge amount of bibliographic data.…”
Section: A Hybrid Approach For Metadata Extraction (mentioning)
confidence: 99%