Construction of the Literature Graph in Semantic Scholar

Ammar, Waleed; Groeneveld, Dirk; Bhagavatula, Chandra; Beltagy, Iz; Crawford, Miles; Downey, Doug; Dunkelberger, Jason; Elgohary, Ahmed; Feldman, Sergey; Ha, Vu; Kinney, Rodney; Kohlmeier, Sebastian; Lo, Kyle; Murray, Tyler; Ooi, Hsu-Han; Peters, Matthew E.; Power, Joanna; Skjonsberg, Sam; Wang, Lucy Lu; Wilhelm, Chris; Yuan, Zheng; Zuylen, Madeleine van; Etzioni, Oren

doi:10.18653/v1/n18-3011

Cited by 285 publications (228 citation statements)

References 24 publications

Supporting

Mentioning: 227

Contrasting

“…However, first attempts towards a more semantic representation of article content exist: Ammar et al [1] interlink the Semantic Scholar Corpus with DBpedia [25] and Unified Medical Language System (UMLS) [6] using entity linking techniques. Yaman et al [43] connect SciGraph with DBpedia person entities.…”

Section: Applications For Domain-independent Scientific Information E… (mentioning)

confidence: 99%

“…For academic search engines, Xiong et al [42] have shown that exploiting knowledge bases like Freebase can improve search results. However, the introduction of new scientific concepts occurs at a faster pace than knowledge base curation, resulting in a large gap in knowledge base coverage of scientific entities [1], e.g. the task geolocation estimation of photos from the Computer Vision field is neither present in Wikipedia nor in more specialised knowledge bases like Computer Science Ontology (CSO) [35] or "Papers with code" [32].…”

Section: Introduction (mentioning)

confidence: 99%

Domain-Independent Extraction of Scientific Concepts from Research Articles

Brack, D’Souza, Hoppe et al. 2020

Lecture Notes in Computer Science

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.

“…Selection: Figure 2 displays our systematic approach to select articles relevant to this survey, based on [31]. First, we collect data from DBLP [35] and Semantic Scholar [2]. We filter them by venue, retaining only articles from the 10 key conferences and journals in distributed systems listed in the caption of Table 1, including SC.…”

Section: Article Selection and Labeling (mentioning)

confidence: 99%

The Workflow Trace Archive: Open-Access Data From Public and Private Computing Infrastructures

Versluis, Mathá, Talluri et al. 2020

IEEE Trans. Parallel Distrib. Syst.

Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows, which are common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow-traces raises important issues: (1) the use of realistic traces is infrequent, and (2) the use of realistic, open-access traces even more so. Alleviating these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes >48 million workflows captured from >10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.

“…We further consider a task of generating abstracts for scientific papers (Ammar et al, 2018), where the input contains a paper title and scientific entities mentioned in the abstract. We use the AGENDA data processed by Koncel-Kedziorski et al (2019), where entities and their relations in the abstracts are extracted by SciIE (Luan et al, 2018).…”

Section: Task III: Paper Abstract Generation (mentioning)

confidence: 99%

Sentence-Level Content Planning and Style Specification for Neural Text Generation

Hua, Wang 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Building effective text generation systems requires three critical components: content selection, text planning, and surface realization, and traditionally they are tackled as separate problems. Recent all-in-one style neural generation models have made impressive progress, yet they often produce outputs that are incoherent and unfaithful to the input. To address these issues, we present an end-to-end trained two-step generation model, where a sentence-level content planner first decides on the keyphrases to cover as well as a desired language style, followed by a surface realization decoder that generates relevant and coherent text. For experiments, we consider three tasks from domains with diverse topics and varying language styles: persuasive argument construction from Reddit, paragraph generation for normal and simple versions of Wikipedia, and abstract generation for scientific articles. Automatic evaluation shows that our system can significantly outperform competitive comparisons. Human judges further rate our system generated text as more fluent and correct, compared to the generations by its variants that do not consider language style.
