Data driven identification of international cutting edge science and technologies using SpaCy

Hu, Chunqi; Gong, Huaping; He, Yang

doi:10.1371/journal.pone.0275872

Cited by 5 publications

(3 citation statements)

References 63 publications

“…SMS data cannot be adequately analysed if latent sentence relationships are not fully captured (Gormley et al, 2015). Components of NNs identified as important for short-textual modelling, for example sentencizers and dependency parsers, are implemented within spaCy (Hu et al, 2022). A RoBERTa-derived model is adapted for spaCy transformer implementation and is an extension of SBERT concepts.…”

Section: Figure (mentioning)

confidence: 99%

“…Inherent edge approximations during topological inference generation must be implemented to avoid exceeding O(n^3 T) runtime. Implementations based on comparing specific topological sentence components (Hu et al, 2022) effectively limit exponential processing overheads. Honnibal and Johnson (2014) argue that lightweight transformer implementation is achieved by choosing appropriate within-model parameters.…”

Section: Figure (mentioning)

confidence: 99%

Establishing an Optimal Online Phishing Detection Method: Evaluating Topological NLP Transformers on Text Message Data

Milner, Baron

2023

JDSIS

This research establishes an optimal classification model for online SMS spam detection by utilizing topological sentence transformer methodologies. The study is a response to the increasingly sophisticated and disruptive activities of malicious actors. We present a viable lightweight integration of pre-trained NLP repository models with sklearn functionality. The study design mirrors the spaCy pipeline component architecture in a downstream sklearn pipeline implementation and introduces a user-extensible spam SMS solution. We leverage large-text data models from HuggingFace (roberta-base) via spaCy and apply linguistic NLP transformer methods to short-sentence NLP datasets. We compare the F1-scores of models and iteratively retest models using a standard sklearn pipeline architecture. Applying spaCy transformer modelling achieves an optimal F1-score of 0.938, a result comparable to existing research output from contemporary BERT/SBERT/‘black box’ predictive models. This research introduces a lightweight, user-interpretable, standardized, predictive SMS-spam detection model that utilizes semantically similar paraphrase/sentence transformer methodologies and generates optimal F1-scores for an SMS dataset. Significant F1-scores are also generated for a Twitter evaluation set, indicating potential real-world suitability.
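The abstract above compares models by F1-score. As a minimal sketch of how that metric is computed for a binary spam/ham task, the snippet below derives precision, recall, and F1 from label lists. The function name, the "spam"/"ham" label strings, and the toy labels are assumptions for illustration only; they are not taken from the cited study, which reports using sklearn.

```python
def f1_score(y_true, y_pred, positive="spam"):
    """Return the F1-score of the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for this example.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
score = f1_score(y_true, y_pred)
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is 2/3.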

“…In the third step, spaCy, a word segmentation tool, was used for segmentation processing on the text. It is also currently the fastest and best method for deep learning from text and can be written in the programming language Python (Honnibal and Johnson, 2015; Hu et al, 2022). The executed commands included removing stop words and stemming to reduce the total number of unique words in the dictionary.…”

Section: Step : Data Cleaning and Abstract Segmentation (mentioning)

confidence: 99%
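The citation statement above describes removing stop words and stemming to shrink the vocabulary. The following is a rough, standard-library-only sketch of that idea, not the cited authors' code. The stop-word set, the suffix list, and the sample sentences are all hypothetical, and the suffix stripper is deliberately crude compared with a real stemmer or a spaCy pipeline.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "for"}
# Crude suffix list; a real stemmer (e.g. Porter) applies far more careful rules.
SUFFIXES = ("ing", "ies", "es", "ed", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


# Invented sample texts standing in for paper abstracts.
abstracts = [
    "Smart cities are adopting sensors and smart sensing platforms.",
    "The adoption of a sensor platform in the smart city.",
]
raw_vocab = {t for a in abstracts for t in re.findall(r"[a-z]+", a.lower())}
clean_vocab = {t for a in abstracts for t in clean(a)}
```

Comparing `len(raw_vocab)` with `len(clean_vocab)` shows the reduction in unique words: "sensors" and "sensor" collapse into one entry, while the crude rules still leave "cit" and "city" as separate entries, which is the kind of error a proper stemmer or lemmatizer is meant to avoid.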

Frontiers of policy and governance research in a smart city and artificial intelligence: an advanced review based on natural language processing

Dong, Liu

2023

Front. Sustain. Cities

This study presents an advanced review of policy and governance research in the context of smart cities and artificial intelligence (AI). With cities playing a crucial role in achieving the United Nations Sustainable Development Goals, it is vital to understand the opportunities and challenges that arise from the applications of smart technologies and AI in promoting urban sustainability. Using the Latent Dirichlet Allocation (LDA) method based on a three-layer Bayesian algorithm model, we conducted a systematic review of approximately 3700 papers from Scopus. Our analysis revealed prominent topics such as “service transformation,” “community participation,” and “sustainable development goals.” We also identified emerging concerns, including “open user data,” “ethics and risk management,” and “data privacy management.” These findings provide valuable insights into the current progress and frontiers of policy and governance research in the field, informing future research directions and decision-making processes.

Complexed hyaluronic acid-based nanoparticles in cancer therapy and diagnosis: Research trends by natural language processing

Umar, Limpikirati, Rivai et al.

2025

Heliyon
