2021
DOI: 10.1101/2021.01.08.425887
Preprint

Auto-CORPus: A Natural Language Processing Tool for Standardising and Reusing Biomedical Literature

Abstract: Motivation: The availability of improved natural language processing (NLP) algorithms and models enables researchers to analyse larger corpora using open source tools. Text mining of biomedical literature is one area for which NLP has been used in recent years with large untapped potential. However, in order to generate corpora that can be analyzed using machine learning NLP algorithms, these need to be standardized. Summarizing data from literature to be stored into databases typically requires manual curation, …

Cited by 3 publications (4 citation statements). References: 23 publications.
“…The rest were studies that belong to the 'general applicability' category, meaning they were tools or models not designed for specific health categories or diseases. They have general utilities for particular scenarios that might be applicable to a wide range of clinical use cases [50–52, 54, 57, 59–61, 63, 74, 77, 81, 85, 104–106, 109–114, 119–122, 124, 125, 131, 136, 137].…”
Section: Literature Review Results on Publications (mentioning; confidence: 99%)
“…This category studies how ontologies and lexicons could be combined with other NLP methods to represent knowledge that can support clinicians. Studies include [33, 41, 50–52, 55, 63, 74, 75, 81, 85, 86, 104–108, 110, 114, 119, 131, 132].…”
Section: Literature Review Results on Publications (mentioning; confidence: 99%)
“…The broad variability in table structure makes them difficult to mine automatically. We have developed the Automated pipeline for Consistent Outputs from Research Publications (Auto-CORPus) text processing tool that converts publication full-text and tables to standardised machine-interpretable formats that can be analysed by text mining algorithms (37). In a collaboration with ELIXIR researchers, we are building Auto-CORPus into a text mining workflow to extract GWAS associations from the scientific literature at scale.…”
Section: Discussion (mentioning; confidence: 99%)