2018
DOI: 10.1016/j.artmed.2018.04.007

Leveraging Wikipedia knowledge to classify multilingual biomedical documents

Cited by 12 publications (11 citation statements)
References 19 publications
“…• It can also be clearly seen that the performance metrics are higher when enriching the tweets representations with the features extracted from the text of the tweets by using Wikipedia Miner, achieving F1-score improvements up to 13% for the Random Forest algorithm and up to 22.23% for the CART algorithm. This is a clear evidence that the knowledge contained in Wikipedia provides very relevant information to the classifier, thus improving its performance, which is in line with what was stated in previous studies [14]- [16]. • Finally, after the analysis of the results presented, we concluded that the best option for this particular case is the CART algorithm, since it shows performance values similar to Random Forests with significantly lower training and classification times.…”
Section: Classifier Results and Analysis
Citation type: supporting
confidence: 89%
“…Wikipedia Miner is a general purpose semantic annotator based on natural language processing, machine learning techniques, and the use of Wikipedia as background knowledge. This approach has been successfully applied in previous studies for the classification of, among others, biomedical documents [14], documents of legal nature [15], and news [16]. The main characteristics of Wikipedia Miner are: 1) It identifies concepts that appear in documents, thus avoiding the generation of irrelevant features; 2) it performs word sense disambiguation, thus tackling synonymy and polysemy problems; 3) it links the extracted concepts from documents to Wikipedia entries; and 4) it assigns a weight to each extracted concept according to its relevance in the text.…”
Section: A Document (Tweet) Representation
Citation type: mentioning
confidence: 99%
“…The Wikimedia Foundation maintains a large set of multilingual data in the form of articles across multiple Wikipedia projects. Garcia et al, previously used the interlanguage links of Wikipedia to apply concept mapping in their efforts to develop a classifier for multilingual biomedical documents [18].…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Wikipedia and Wikipedia Miner have been used in many fields such as automatic topic indexing [14], document clustering [15], document summarization [16], the classification of multilingual biomedical documents [17], converting concept-based representations of documents from one language to another [18], identifying the prerequisite relationships among learning objects [19], classifying news articles [20], evaluating and classifying Open Educational Resources (OERs) and OpenCourseware (OCW) based on quality criteria [21], and for group recommendation by combining topic identification and social networks [22].…”
Section: Literature Review
Citation type: mentioning
confidence: 99%