2021
DOI: 10.1016/j.simpa.2021.100058

Spark NLP: Natural Language Understanding at Scale

Abstract: Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used b…

Cited by 50 publications
(28 citation statements)
References 14 publications
“…Finally, we formatted the training and testing data to conform to the conference on natural language learning (CoNLL) format. We then used a pre-trained deep learning model, provided by the Python "sparknlp" package [41], to produce ELMo embeddings for each sentence's tokens. These word embeddings were used as features in the deep learning NER model that was generated using the "sparknlp" package.…”
Section: Methods (mentioning)
confidence: 99%
“…As deep learning models have successfully in NLP, there is a need to implement pre-trained models and scale large data with distributed use cases. John Snow Labs 2 developed Spark NLP as a library built on top of Apache Spark and Apache MLib that provides an NLP pipeline and pre-trained models [17]. The library offers the ability to train, customize and save models so they can be run on clusters, other machines, or stored.…”
Section: Related Work (mentioning)
confidence: 99%
“…Therefore, we use the existing pre-trained models [8], [16]. To demonstrate the efficiency of this method, we conducted extensive experiments to study our proposed approach. We use Spark NLP built on top of Apache Spark as a library that can scale the entire classification process in a distributed environment [17]. We compared the performance of the base method model with the classifier pipelines from Spark NLP.…”
Section: Introduction (mentioning)
confidence: 99%
“…of the ordered laboratory tests, and 4) patient demographics (age/race/sex/ethnicity). We parsed each note into sections and used the SparkNLP library [35] named entity recognizer (NER) for extracting medical conditions from the clinical notes (see Supplementary section on "Data Sources" for implementation details). The extractions were used to determine the presence or absence of baseline risk factors for each patient at the time of admission, including: Coronary Artery Disease (CAD), diabetes, family history, hyperlipidemia, hypertension, existing medication, obesity, and smoking.…”
Section: Data Sources (mentioning)
confidence: 99%