2022
DOI: 10.2196/preprints.40843
Development and Validation of Deep Learning Transformer Models for Building a Comprehensive and Real-time Trauma Monitoring (Preprint)

Abstract:
BACKGROUND: To study the feasibility of setting up a national trauma observatory in France.
OBJECTIVE: We compared the performance of several automatic language processing methods on a multi-class classification task of unstructured clinical notes.
METHODS: A total of 69,110 free-text clinical no…

Cited by 1 publication (2 citation statements) | References 35 publications
“…For French, two models based on RoBERTa [25] are trained and optimized: CamemBERT [27], trained on the French part of OSCAR [33], and FlauBERT [20], trained on 24 corpora of various styles collected from the internet. These state-of-the-art French pretrained language models have been fine-tuned on various data for text classification tasks, such as tweets classification [19] or clinical notes classification [4]. Finally, studies on automatic speech transcriptions [14] and digitized texts with optical character recognition (OCR) [18] analyzed the impact of noisy inputs on contextualized word embeddings.…”
Section: Related Work
confidence: 99%
“…Most documents are 1-10 sentences and 5-91 tokens long. The dataset is, therefore, composed of short documents compared to the reference datasets 4 . Moreover, depending on the type of exercise, the word and sentence lengths are highly variable.…”
Section: Dataset Characteristics
confidence: 99%
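Length statistics like those quoted above (documents of 1-10 sentences and 5-91 tokens) can be reproduced with a short script. The sketch below is illustrative only: it assumes whitespace tokenization and punctuation-based sentence splitting, and the two example notes are invented, not drawn from the dataset described in the paper.

```python
# Minimal sketch: per-document sentence and token counts for a small corpus.
# Assumptions (not from the paper): tokens are whitespace-separated; sentences
# are delimited by '.', '!', or '?'. Example notes below are invented.
import re

def doc_stats(doc: str) -> tuple[int, int]:
    """Return (sentence_count, token_count) for one document."""
    sentences = [s for s in re.split(r"[.!?]+", doc) if s.strip()]
    tokens = doc.split()
    return len(sentences), len(tokens)

corpus = [
    "Patient fell from a ladder. Right wrist fracture suspected.",
    "Minor head trauma, no loss of consciousness.",
]

stats = [doc_stats(d) for d in corpus]
sent_counts = [s for s, _ in stats]
tok_counts = [t for _, t in stats]
print(f"sentences: min={min(sent_counts)} max={max(sent_counts)}")
print(f"tokens:    min={min(tok_counts)} max={max(tok_counts)}")
```

Reporting the min/max (or a histogram) of these counts is enough to characterize a corpus as "short documents" relative to standard benchmark datasets.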