Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023
DOI: 10.1145/3539618.3591897

BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis

David Kartchner,
Irfan Al-Hussaini,
Haydn Turner
et al.

Abstract: This work presents a new, original document classification dataset, BioSift, to expedite the initial selection and labeling of studies for drug repurposing. The dataset consists of 10,000 human-annotated abstracts from scientific articles in PubMed. Each abstract is labeled with up to eight attributes necessary to perform meta-analysis using the popular patient-intervention-comparator-outcome (PICO) method: has human subjects, is clinical trial/cohort, has population size, has target disease, has study…
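To make the filtering idea concrete, the following is a minimal sketch, assuming each abstract carries one boolean label per attribute and that "filtering" means keeping only abstracts whose required attributes are all positive. The record type `AbstractLabels`, its field names, the `doc_id` identifier, and the helper functions are hypothetical illustrations, not BioSift's actual schema or file format. Only the four attributes fully named in the truncated abstract are modeled; the remaining ones are not filled in.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AbstractLabels:
    """Hypothetical per-abstract label record (illustrative names, not BioSift's schema)."""
    doc_id: str
    has_human_subjects: bool
    is_clinical_trial_or_cohort: bool
    has_population_size: bool
    has_target_disease: bool


# Names of the boolean attribute fields, i.e. every field except the identifier.
ATTRIBUTES = tuple(f.name for f in fields(AbstractLabels) if f.name != "doc_id")


def passes_filter(labels: AbstractLabels, required=ATTRIBUTES) -> bool:
    """Keep an abstract only if every required attribute is labeled True."""
    return all(getattr(labels, name) for name in required)


def filter_abstracts(records, required=ATTRIBUTES):
    """Return the identifiers of abstracts that satisfy all required attributes."""
    return [r.doc_id for r in records if passes_filter(r, required)]


if __name__ == "__main__":
    records = [
        AbstractLabels("A1", True, True, True, True),
        AbstractLabels("A2", True, False, True, True),
        AbstractLabels("A3", False, True, True, True),
    ]
    print(filter_abstracts(records))  # ['A1']
    print(filter_abstracts(records, required=("has_target_disease",)))  # ['A1', 'A2', 'A3']
```

Passing a smaller `required` tuple loosens the screen, which mirrors how a reviewer might first filter on a single criterion (such as target disease) before applying the full set.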

Cited by 1 publication (2 citation statements); references 36 publications.
“… Use of transfer learning: transfer learning, which applies knowledge gained from a larger distribution or dataset to a smaller one, could be added as a module to the proposed general framework for specific research use cases [80]. However, transfer learning would not be equally well suited to all rare diseases, particularly heterogeneous ones, because their sample distributions may not be well represented by the larger aggregate or average model distribution. Use of large language models: large language models like ChatGPT may enable the aggregation and extraction of multiple published rare disease datasets in order to increase the available sample sizes for standard collected features [81]. While large language models excel at producing tabular data from unstructured data, most are currently not specifically suited to generating predictions from small-sample-size tabular data.…”
Section: Results (mentioning), confidence: 99%
“…Use of large language models: large language models like ChatGPT may enable the aggregation and extraction of multiple published rare disease datasets in order to increase the available sample sizes for standard collected features [81]. While large language models excel at producing tabular data from unstructured data, most are currently not specifically suited to generating predictions from small-sample-size tabular data.…”
Section: Results (mentioning), confidence: 99%