2018
DOI: 10.1007/s00500-018-3101-5

Wikipedia-based hybrid document representation for textual news classification

Cited by 11 publications (7 citation statements)
References 39 publications
“…• It can also be clearly seen that the performance metrics are higher when enriching the tweets representations with the features extracted from the text of the tweets by using Wikipedia Miner, achieving F1-score improvements up to 13% for the Random Forest algorithm and up to 22.23% for the CART algorithm. This is a clear evidence that the knowledge contained in Wikipedia provides very relevant information to the classifier, thus improving its performance, which is in line with what was stated in previous studies [14]-[16].
• Finally, after the analysis of the results presented, we concluded that the best option for this particular case is the CART algorithm, since it shows performance values similar to Random Forests with significantly lower training and classification times.…”
Section: Classifier Results and Analysis
Citation type: supporting
confidence: 89%
“…Wikipedia Miner is a general purpose semantic annotator based on natural language processing, machine learning techniques, and the use of Wikipedia as background knowledge. This approach has been successfully applied in previous studies for the classification of, among others, biomedical documents [14], documents of legal nature [15], and news [16]. The main characteristics of Wikipedia Miner are: 1) It identifies concepts that appear in documents, thus avoiding the generation of irrelevant features; 2) it performs word sense disambiguation, thus tackling synonymy and polysemy problems; 3) it links the extracted concepts from documents to Wikipedia entries; and 4) it assigns a weight to each extracted concept according to its relevance in the text.…”
Section: A Document (Tweet) Representation
Citation type: mentioning
confidence: 99%
“…Kroha and Baeza-Yates [67] processed a corpus of news published from 1999 to 2002 in order to explore the impact of several factors (term frequency, grammatical structure, and context) on the actual news classification. Furthermore, Mouriño-Garcia, Pérez-Rodríguez, Anido-Rifón, and Vilares-Ferro [68] proposed a hybrid approach that enriches the traditional BoW representation with background knowledge for the semantic analysis of texts. The results indicated that their concepts-based approach adds information that improves classification performance for news items.…”
Section: A Brief Review Of the Related Literature
Citation type: mentioning
confidence: 99%
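The hybrid representation mentioned in the statement above (a bag of words enriched with background-knowledge concepts) can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_CONCEPTS` lookup table, the concept identifiers, and the weights are all hypothetical stand-ins for what a semantic annotator such as Wikipedia Miner would produce, and simple substring matching replaces real word sense disambiguation.

```python
from collections import Counter
import re

# Hypothetical stand-in for a semantic annotator:
# surface phrase -> (concept identifier, relevance weight).
# A real annotator would disambiguate and weight concepts itself.
TOY_CONCEPTS = {
    "central bank": ("wiki:Central_bank", 0.9),
    "interest rates": ("wiki:Interest_rate", 0.8),
    "interest rate": ("wiki:Interest_rate", 0.8),
}


def bag_of_words(text):
    """Count lowercase alphabetic tokens, prefixed with 'w:'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(f"w:{token}" for token in tokens)


def bag_of_concepts(text):
    """Map matched phrases to concept features, prefixed with 'c:'.

    Synonymous phrases share one concept feature; the highest weight wins.
    """
    lowered = text.lower()
    features = {}
    for phrase, (concept, weight) in TOY_CONCEPTS.items():
        if phrase in lowered:
            key = f"c:{concept}"
            features[key] = max(weight, features.get(key, 0.0))
    return features


def hybrid_representation(text):
    """Merge word counts and concept weights into one feature dictionary."""
    features = dict(bag_of_words(text))
    features.update(bag_of_concepts(text))
    return features


if __name__ == "__main__":
    doc = "The central bank raised interest rates."
    for name, value in sorted(hybrid_representation(doc).items()):
        print(name, value)
```

The `w:` and `c:` prefixes keep the two feature spaces separate, so a classifier can use word and concept features side by side without name collisions.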
“…In addition, online hot news prediction has important application value; firstly, it can enable the government to grasp the trend of public opinion in a timely manner, which is convenient for the government to manage public opinion and grasp and handle sudden public events; secondly, it can help news websites manage the release locations of different news, Put hot news in the area that users pay more attention to, thereby increasing the influence of news websites; at the same time, it promotes the public to pay attention to the current hot news in a timely manner, and triggers thinking about daily life from the news, thereby improving the quality of life. For example, when hot news related to telecommunication fraud occupies the homepage of major fine-textured websites, it can increase people's attention to and beware of telecommunication fraud, and help people learn the relevant knowledge of preventing telecommunication fraud [9][10][11][12]. Network news has become the main source of network waves and public opinion, and it is of great theoretical and applied value to accurately predict hot news and attract public attention and discussion [13][14][15].…”
Section: Introduction
Citation type: mentioning
confidence: 99%