2022
DOI: 10.1148/ryai.220007
Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets

Cited by 20 publications (9 citation statements)
References 18 publications
“…However, performance of these models is often lower than what would be required clinically without additional feature engineering 13,15 or fine-tuning on thousands of manually-derived labels 14,16 specific to the task. This likely reflects the fact that medical report text has specific structure and meaning while comprising only a small proportion of the general language used to train these models.…”
Section: Many Published Methods For Extraction Of Multiple Values Use... (mentioning)
confidence: 99%
“…Machine learning has been used on clinical reports but not specifically for extraction of concepts from echocardiogram reports. Many have used various implementations of BERT (Bidirectional Encoder Representations from Transformers), an early large language model (LLM), to extract radiographic clinical findings 13 , mentions of devices 14 , study characteristics 15 , and result keywords 16 from radiology or pathology reports.…”
Section: Introduction (mentioning)
confidence: 99%
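As a concrete illustration of the workflow these citations describe (applying a fine-tuned BERT model to classify sentences from radiology or pathology reports), below is a minimal sketch using the Hugging Face transformers library. The checkpoint path and the example sentences are placeholders, not artifacts from any of the cited studies.

```python
# Minimal sketch (not the authors' code): running a fine-tuned BERT classifier
# over report sentences with the Hugging Face transformers pipeline API.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="path/to/finetuned-report-bert",  # hypothetical fine-tuned checkpoint
)

sentences = [
    "There is a small right pleural effusion.",
    "A dual-chamber pacemaker is in place.",
]
for sentence in sentences:
    result = classifier(sentence)[0]
    print(sentence, "->", result["label"], round(result["score"], 3))
```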
“…A recent study has found that further pre‐training BERT on relevant, task‐specific content improves performance on classification tasks (Tejani et al., 2022). Therefore, we selected two further pre‐trained BERT models, Conversational BERT (Burtsev et al., 2018) and Bio‐clinical BERT (CliBERT) (Alsentzer et al., 2019).…”
Section: Methods (mentioning)
confidence: 99%
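A minimal sketch of the "further pre-training" step this citation refers to: continuing masked-language-model training of a clinical BERT checkpoint on in-domain report text before task-specific fine-tuning. The checkpoint identifier emilyalsentzer/Bio_ClinicalBERT is the publicly released Bio-clinical BERT (Alsentzer et al., 2019); the data file name and hyperparameters are illustrative assumptions, not the cited study's settings.

```python
# Sketch of continued (further) masked-language-model pre-training on domain text.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Assumes one report (or report section) per line in a plain-text file.
dataset = load_dataset("text", data_files={"train": "reports.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# The collator randomly masks 15% of tokens; the model learns to reconstruct them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="clinbert-further-pretrained", num_train_epochs=1)

Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
```

The resulting checkpoint can then be loaded for sequence classification and fine-tuned on the labeled annotation task.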
“…The aim of these embedders is not to perform well on a single given property prediction task, but rather solely to produce a rich initial embedding of a molecule in latent space [ 88 ] and fully capture a lossless representation of the molecule. To this end, models are commonly trained in an unsupervised manner [ 89 ]. Unsupervised training involves unlabeled data; molecules without any specific property or endpoint being considered.…”
Section: Learned Representations (mentioning)
confidence: 99%
“…Unsupervised training involves unlabeled data; molecules without any specific property or endpoint being considered. These models then have much more data available to them and if trained carefully, are more likely to converge on good parameters, enabling them to produce powerful, generalizable embeddings of new input molecules [ 89 ].…”
Section: Learned Representations (mentioning)
confidence: 99%
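The excerpt concerns molecular embedders, but the underlying idea (an encoder pre-trained without labels, reused purely to map new inputs to fixed-size vectors) can be sketched with any pre-trained transformer. The model choice and mean-pooling strategy below are illustrative assumptions, not the cited works' implementations.

```python
# Sketch: using an unsupervised pre-trained encoder as a general-purpose embedder.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # any pre-trained encoder would do here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

texts = ["mild cardiomegaly without pulmonary edema", "no acute osseous abnormality"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state   # (batch, seq_len, hidden)

# Mean-pool over real tokens only, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768]) for bert-base-uncased
```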