2016
DOI: 10.1186/s13326-016-0073-1
|View full text |Cite
|
Sign up to set email alerts
|

Large scale biomedical texts classification: a kNN and an ESA-based approaches

Abstract: BackgroundWith the large and increasing volume of textual data, automated methods for identifying significant topics to classify textual documents have received a growing interest. While many efforts have been made in this direction, it still remains a real challenge. Moreover, the issue is even more complex as full texts are not always freely available. Then, using only partial information to annotate these documents is promising but remains a very ambitious issue.MethodsWe propose two classification methods:… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
11
0

Year Published

2017
2017
2023
2023

Publication Types

Select...
4
3

Relationship

0
7

Authors

Journals

citations
Cited by 22 publications
(11 citation statements)
references
References 26 publications
0
11
0
Order By: Relevance
“…They reported that the proposed methodology indicated reasonable classification performance. Drame et al [28] proposed k -nearest neighbours (kNN) and an explicit semantic analysis based approach for large-scale biomedical document classification on a subset of MEDLINE documents. They stated that the kNN-based method with the RF learning algorithm achieved good performances compared with the current state-of-the-art methods.…”
Section: Related Workmentioning
confidence: 99%
“…They reported that the proposed methodology indicated reasonable classification performance. Drame et al [28] proposed k -nearest neighbours (kNN) and an explicit semantic analysis based approach for large-scale biomedical document classification on a subset of MEDLINE documents. They stated that the kNN-based method with the RF learning algorithm achieved good performances compared with the current state-of-the-art methods.…”
Section: Related Workmentioning
confidence: 99%
“…Recall value is more relevant in this case since it shows the proportion of the correct annotations that an approach was able to discover. The choice of concepts for annotating documents is quite subjective and attaining high recall values remain a challenge [8].…”
Section: Resultsmentioning
confidence: 99%
“…One variant of KNN ranks candidate concepts by combining the relevance scores of documents for which they form annotations [6,7]. Another variant passes the features of candidate concepts to a machine classifier which determines which concepts to put forward for annotation [8]. Some features that are used by a classifier include the proportion of retrieved documents that were annotated with the concept and if the concept appears in the title or content of a document.…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…Furthermore, WikiRelate [7] was the first work which compute the measures of semantic relatedness using Wikipedia, this approach applied the familiar technique used in semantic relatedness based on wordnet and modified it to be used in Wikipedia, such as path-length measure [8], but in general the results are similar. However, Gabrilovich and Markovitch (2007) [5] propose a new approach with Explicit Semantic Analysis (ESA) that achieve highly accurate results, this method has been extensively studied in many applications [9]. ESA use Wikipedia as a semantic interpreter and builds a weighted inverted vector that maps each term into a list of Wikipedia articles in which it appears, and computes the similarity between vectors generated from two terms or texts.…”
Section: Introductionmentioning
confidence: 99%