2021
DOI: 10.48550/arxiv.2112.00590
Preprint

Building astroBERT, a language model for Astronomy & Astrophysics

Abstract: The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but they do not yet allow researchers to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural lan…

Cited by 3 publications (5 citation statements)
References 3 publications
“…One approach to generating scientific KGs is extracting entities from scientific documents [14,16,33]. There are multiple named-entity recognition (NER) algorithms to extract scientific entities, such as astroBERT [13], and SciBERT [4]. These scientific entities can serve as the nodes of the KGs.…”
Section: Scientific Information Processing
Citation type: mentioning (confidence: 99%)
“…is not specifically designed to recognize terms related to the NASA SMD. In the future, we aim to create NER algorithms similar to AstroBERT [13], and SciBERT [4]. In line with these two NER algorithms, we want to fine-tune models from the BERT family [9].…”
Section: Generating Nodes
Citation type: mentioning (confidence: 99%)
“…ADS already has a long tradition of openness, and with the expansion into SciX we will redouble our efforts to share our efforts with the larger research community. In 2021 we created and released a custom language model built on the astronomical literature called astroBERT (Grezes et al 2021). We have contributed data sets for the 2022 and 2023 data challenges at the first and second Workshops on Information Extraction from Scholarly Papers 3 .…”
Section: Future ADS: 2021-2030
Citation type: mentioning (confidence: 99%)
“…Employing Google's Bidirectional Encoder Representations from Transformers (BERT; Devlin et al 2018) deep neural network architecture, Grezes et al (2021) have developed a domain-specific model for astronomy, termed astroBERT, through the training on a corpus comprising 395,499 astronomical research papers. Subsequently, this model was used for the development of the NER tool in ADS, which includes identifying specific organizations, projects, terms, etc., in the literature.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)