2018
DOI: 10.2478/dim-2018-0004

Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database

Abstract: Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and employ machine learning algorithms. At present, each research group tackles each problem from scratch and in isolation from other projects, which causes redundancy and a great waste of effort. Here, we propose and…
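The generic recipe the abstract describes (positive and negative training examples, features extracted from article metadata, a machine learning algorithm) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the platform the paper proposes: the toy articles, the choice of title words and MeSH-style terms as the only metadata features, and the use of logistic regression trained by stochastic gradient descent are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def extract_features(article):
    """Turn article metadata into sparse token counts.

    Assumption: title words and MeSH-style indexing terms are the only
    metadata used; a real system would draw on many more fields.
    """
    tokens = [f"title:{word.lower()}" for word in article["title"].split()]
    tokens += [f"mesh:{term}" for term in article.get("mesh", [])]
    return Counter(tokens)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(model, feats):
    weights, bias = model
    # Counter returns 0 for unseen features, so new tokens contribute nothing.
    return bias + sum(weights[f] * v for f, v in feats.items())

def train_logistic(feature_sets, labels, epochs=200, lr=0.5):
    """Logistic regression fitted by plain stochastic gradient descent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, y in zip(feature_sets, labels):
            err = y - sigmoid(score((weights, bias), feats))
            bias += lr * err
            for f, v in feats.items():
                weights[f] += lr * err * v
    return weights, bias

def predict_proba(model, article):
    return sigmoid(score(model, extract_features(article)))

# Hypothetical toy task: 1 = positive training example, 0 = negative one.
train = [
    ({"title": "Randomized controlled trial of aspirin", "mesh": ["Aspirin", "Humans"]}, 1),
    ({"title": "Double blind randomized trial of statins", "mesh": ["Statins", "Humans"]}, 1),
    ({"title": "Case report of a rare tumor", "mesh": ["Neoplasms"]}, 0),
    ({"title": "Review of tumor imaging methods", "mesh": ["Diagnostic Imaging"]}, 0),
]
model = train_logistic([extract_features(a) for a, _ in train], [y for _, y in train])

# Score two unseen (hypothetical) articles.
p_pos = predict_proba(model, {"title": "Randomized controlled trial of aspirin in children",
                              "mesh": ["Aspirin", "Humans"]})
p_neg = predict_proba(model, {"title": "Case report of a rare tumor in a child",
                              "mesh": ["Neoplasms"]})
```

Each new problem in this style differs mainly in how the positive and negative sets are defined and which metadata fields become features; the training and scoring steps stay the same, which is the redundancy the abstract points to.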


Cited by 1 publication (1 citation statement)
References 27 publications
“…To score a PubMed article in our scheme, we represent the article as a vector of feature scores, and calculate the similarity (i.e., inverse distance) between the article vector and each of the PT cluster vectors. The closer an article is to a given PT cluster, the more likely it is to belong to that PT [31]. The article citation-based feature scores are appended to the PT similarity scores and these combined feature vectors are fed into a support vector machine learning algorithm to compute the probability of an article for each of the 50 PTs.…”
Section: Representing Each PT as a Single Vector
Citation type: mentioning (confidence: 99%)
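The scoring step quoted above (one vector per PT cluster, an inverse-distance similarity between the article and each cluster, then citation-based scores appended) can be sketched as follows. This is a hedged illustration, not the citing authors' code: building each cluster vector as the mean of its members, the exact similarity form 1 / (1 + Euclidean distance), and all names and toy numbers are assumptions.

```python
import math

def centroid(vectors):
    """Collapse a PT's member vectors into one cluster vector by averaging (assumed)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def inverse_distance_similarity(a, b):
    """Similarity as 1 / (1 + Euclidean distance); the +1 (an assumption) avoids division by zero."""
    return 1.0 / (1.0 + math.dist(a, b))

def combined_features(article_vec, pt_centroids, citation_scores):
    """One similarity score per PT, followed by the article's citation-based scores."""
    sims = [inverse_distance_similarity(article_vec, c) for c in pt_centroids.values()]
    return sims + list(citation_scores)

# Hypothetical toy data: two PTs whose members have three feature scores each.
pt_members = {
    "PT_A": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "PT_B": [[0.0, 1.0, 1.0], [0.0, 0.8, 1.0]],
}
pt_centroids = {pt: centroid(vs) for pt, vs in pt_members.items()}

article = [0.9, 0.1, 0.0]     # hypothetical feature scores for one article
citation_scores = [3.0, 0.5]  # hypothetical citation-based feature scores

# One entry per PT (closer cluster -> larger value), then the citation scores.
x = combined_features(article, pt_centroids, citation_scores)
```

Here the article coincides with the PT_A centroid, so its first similarity is 1.0 and its PT_B similarity is smaller. In the quoted scheme, vectors like `x` are then fed to a support vector machine to obtain a probability per PT; that step is not shown, though a library classifier such as scikit-learn's `SVC(probability=True)` with `predict_proba` is one way such per-class probabilities could be produced.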