Using of n-grams from morphological tags for fake news classification

Kapusta, Jozef; Drlík, Martin; Munk, Michal

doi:10.7717/peerj-cs.624

Cited by 14 publications

(8 citation statements)

References 27 publications

“…The sophisticated architecture with syntactic features proposed by Gupta et al [2021] presented an increase of up to 3 % in the performance. Kapusta et al [2021] noticed an accuracy increase by 3 to 4 %, where the average accuracies for syntactic, readability, and combined features are 84.12%, 77.67%, and 84.52%, respectively. Nguyen et al [2019] identified that n-grams in combination with dependency sub-trees as features have a positive impact on the performance of the classifier.…”

Section: What Are the Effects Of Incorporating Syntactic Information ... (mentioning)

confidence: 94%

The Use of Syntactic Information in fake news detection: A Systematic Review

Fagundes, Roman, Digiampietri, 2024, Reviews

Fake news has been a critical problem for society, to the extent that its damaging effects can already be seen in several areas, such as democracy and health. However, as fake news grow in number, manual fact-checking becomes impractical for identifying them, which makes automatic detection a compelling alternative. In this sense, this study gathers multiple solutions for the problem of automatically detecting fake news, through the usage of both lexical and syntactic information. This study consists of a systematic review on fake news detection through linguistic patterns, focusing on the use of syntax to aid in the task. Solving complex problems by capturing linguistic patterns is mostly explored in the Natural Language Processing (NLP) area. In general, the use of shallow syntax representations, such as Parts of speech, only marginally increases the performance of classifiers in this task. However, relying on deeper syntactic representations, such as context-free grammars or syntactic dependency trees, present more promising results.

“…Zhou et al [2020] saw no improvement with shallow syntax, but deep syntax-level features (CFGs) and features at lexicon-level (BOWs) outperform the others. Kapusta et al [2021] concluded that morphological analysis can be applied to fake news classification.…”

Section: What Are the Effects Of Incorporating Syntactic Information ... (mentioning)

confidence: 99%

The Use of Syntactic Information in fake news detection: A Systematic Review

Fagundes, Roman, Digiampietri, 2024, Reviews

“…The occurrence frequency of all grams is counted and filtered according to the preset threshold to form a list of key grams. N-gram language model shows good performance in many text mining tasks (30)(31)(32). For example, Giannakopoulos and Karkaletsis (30) expressed the text as an n-gram model using a sliding window with a length of n by connecting the adjacent n-grams with the edges representing their co-occurrence frequency in a given text window, they captured the word order in the text and detected some similarities in the text morphology.…”

Section: Comparison With Prior Work (mentioning)

confidence: 99%

Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

Liu, Zhang, et al., 2022, Front. Oncol.

Background: Medical imaging is critical in clinical practice, and high value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC). Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. The NLP methods including word segmentation, stop word removal, and n-gram language model establishment were applied for each dataset. Then, a word-bag model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and so on. We compared the accuracy between priori choosing pertinent word strings and our machine language methodologies. Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching 95.91% in the CT with/without contrast dataset using XGBoost. The logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00% respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy based on string pattern search method model was lower than that of machine learning. Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.
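
The key-gram step mentioned in the citation statement above (count every n-gram, then keep those that reach a preset threshold) can be illustrated with a minimal Python sketch. This is not the cited authors' implementation: the function names `extract_ngrams` and `key_ngrams`, the parameter `min_count`, and the toy token lists are all assumptions made for illustration, and the input is assumed to be already segmented into tokens.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Slide a window of length n over the tokens and return each n-gram as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def key_ngrams(documents, n, min_count):
    """Count n-grams over all documents and keep those seen at least min_count times.

    `min_count` stands in for the "preset threshold"; its value here is hypothetical.
    """
    counts = Counter()
    for tokens in documents:
        counts.update(extract_ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy, pre-tokenized documents (illustrative only, not data from the study).
docs = [
    ["no", "abnormality", "in", "liver"],
    ["no", "abnormality", "detected"],
    ["fatty", "liver", "suggested"],
]

print(key_ngrams(docs, n=2, min_count=2))
```

The same counting works for n-grams of part-of-speech tags if each document is a list of tags rather than words.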

“…This accuracy was lower than that obtained using word n‐grams (96.8%) and higher than that obtained using character n‐grams (87.1%). Zafarani et al (2019) and Kapusta et al (2021) used parts of POS tag n‐grams as morphological characteristics of words to detect fake news. POS tag n‐grams can also be used to predict author personality (Litvinova et al, 2015).…”

Section: Basic Feature Metrics (mentioning)

confidence: 99%

A review on authorship attribution in text mining

Zheng, Jin, 2022, WIREs Computational Stats

The issue of authorship attribution has long been considered and continues to be a popular topic. Because of advances in digital computers, this field has experienced rapid developments in the last decade. In this article, a survey of recent advances in authorship attribution in text mining is presented. This survey focuses on authorship attribution methods that are statistically or computationally supported as opposed to traditional literary approaches. The main aspects covered include the changes in research topics over time, basic feature metrics, machine learning techniques, and the advantages and disadvantages of each approach. Moreover, the corpus size, number of candidates, data imbalance, and result description, all of which pose challenges in authorship attribution, are discussed to inform future work. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Text Mining
