2022
DOI: 10.1371/journal.pone.0275872

Data driven identification of international cutting edge science and technologies using SpaCy

Abstract: Difficulties in collecting, processing, and identifying massive data have slowed research on cutting-edge science and technology hotspots. Promoting these technologies will not be successful without an effective data-driven method to identify cutting-edge technologies. This paper proposes a data-driven model for identifying global cutting-edge science technologies based on SpaCy. In this model, we collected data released by 17 well-known American technology media websites from July 2019 to July 2020 using web …

Cited by 5 publications (3 citation statements)
References 63 publications

“…SMS data cannot be adequately analysed if latent sentence relationships are not fully captured (Gormley et al, 2015). Components of NNs identified as important for short-textual modelling, for example sentencizers and dependency parsers, are implemented within spaCy (Hu et al, 2022). A RoBERTa-derived model is adapted for spaCy transformer implementation and is an extension of SBERT concepts.…”
Section: Figure (mentioning)
confidence: 99%
“…Inherent edge approximations during topological inference generation must be implemented to avoid exceeding O(n³T) runtime. Implementations based on comparing specific topological sentence components (Hu et al, 2022) effectively limit exponential processing overheads. Honnibal and Johnson (2014) argue that lightweight transformer implementation is achieved by choosing appropriate within-model parameters.…”
Section: Figure (mentioning)
confidence: 99%
“…In the third step, spaCy, a word segmentation tool, was used for segmentation processing on the text. It is also currently the fastest and best method for deep learning from text and can be written in the programming language Python (Honnibal and Johnson, 2015; Hu et al, 2022). The executed commands included removing stop words and stemming to reduce the total number of unique words in the dictionary.…”
Section: Step : Data Cleaning and Abstract Segmentation (mentioning)
confidence: 99%
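
The cleaning step described in the last statement (segmentation, stop-word removal, and stemming to shrink the vocabulary) can be illustrated with a minimal sketch. This is not the citing authors' code and it does not call spaCy: it is a dependency-free stand-in that uses a regular-expression tokenizer, a tiny hypothetical stop-word list, and crude suffix stripping in place of a real stemmer. All names (`STOP_WORDS`, `SUFFIXES`, `tokenize`, `crude_stem`, `clean`) are assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use far larger ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Crude suffix stripping, standing in for a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


text = "The models are modeling the modeled texts and the text of the model"
raw_vocab = set(tokenize(text))    # unique words before cleaning
clean_vocab = set(clean(text))     # unique words after cleaning
print(len(raw_vocab), len(clean_vocab))  # prints "10 2"
```

On this toy sentence the vocabulary drops from ten unique words to two (`model` and `text`), which is the effect the statement attributes to stop-word removal and stemming. A production pipeline would swap the stand-ins for a proper tokenizer and stemmer or lemmatizer.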