A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization

Sharef, Nurfadhlina Mohd; Martin, Trevor; Kasmiran, Khairul Azhar; Sulaiman, Md. Nasir; Murad, Masrah Azrifah Azmi

doi:10.1007/s00500-014-1358-x

Cited by 4 publications (2 citation statements)

References 71 publications (77 reference statements)

“…For demonstration purposes, this overview will consider the domain-based classification at the user level. LR (Al-Tahrawi, 2015; Yen et al, 2011), decision tree (Sharef et al, 2015) and SVM (Altınel et al, 2015; Dong et al, 2016) in particular have been used for text categorisations. Also these approaches are more narrow and computationally simpler than recently developed machine learning approaches, such as the deep learning or deep networks approaches.…”

Section: Machine Learning Module For Classification (mentioning)

confidence: 99%

Twitter mining for ontology-based domain discovery incorporating machine learning

Abu-Salih, Wongthongtham, Chan (2018). JKM

Purpose: This paper aims to obtain the domain of the textual content generated by users of online social network (OSN) platforms. Understanding a user’s domain(s) of interest is a significant step towards addressing their domain-based trustworthiness through an accurate understanding of their content in their OSNs.

Design/methodology/approach: This study uses a Twitter mining approach for domain-based classification of users and their textual content. The proposed approach incorporates machine learning modules and comprises two analysis phases. The first phase is a time-aware semantic analysis of users’ historical content incorporating five commonly used machine learning classifiers; it classifies users into two main categories: politics-related and non-politics-related. In the second phase, the likelihood predictions obtained in the first phase are used to predict the domain of users’ future tweets.

Findings: Experiments have been conducted to validate the mechanism proposed in the study framework, further supported by the excellent performance of the harnessed evaluation metrics. The experiments conducted verify the applicability of the framework to an effective domain-based classification for Twitter users and their content, as evident in the outstanding results of several performance evaluation metrics.

Research limitations/implications: This study is limited to an on/off domain classification for content of OSNs. Hence, we have selected a politics domain because of Twitter’s popularity as an opulent source of political deliberations. Such data abundance facilitates data aggregation and improves the results of the data analysis. Furthermore, the currently implemented machine learning approaches assume that uncertainty and incompleteness do not affect the accuracy of the Twitter classification. In fact, data uncertainty and incompleteness may exist. In the future, the authors will formulate the data uncertainty and incompleteness into fuzzy numbers which can be used to address imprecise, uncertain and vague data.

Practical implications: This study proposes a practical framework comprising significant implications for a variety of business-related applications, such as the voice of customer/voice of market, recommendation systems, the discovery of domain-based influencers and opinion mining through tracking and simulation. In particular, the factual grasp of the domains of interest extracted at the user level or post level enhances the customer-to-business engagement. This contributes to an accurate analysis of customer reviews and opinions to improve brand loyalty, customer service, etc.

Originality/value: This paper fills a gap in the existing literature by presenting a consolidated framework for Twitter mining that aims to uncover the deficiency of the current state-of-the-art approaches to topic distillation and domain discovery. The overall approach is promising in the fortification of Twitter mining towards a better understanding of users’ domains of interest.

“…In cross-lingual text detection and text checking, sentence similarity calculation is the core criterion that determines the accuracy of cross-lingual text detection and checking [4]; in topic tracking and detection, cross-language sentence similarity can help determine where a topic first appeared on the Internet [5]-[8]. Therefore, cross-lingual sentence similarity is an important study, and its calculation efficiency and accuracy can affect the operation efficiency of many related systems.…”

Section: Introduction (mentioning)

confidence: 99%

A Cross-Lingual Sentence Similarity Calculation Method With Multifeature Fusion

et al. 2022

Cross-language sentence similarity computation is among the focuses of research in natural language processing (NLP). At present, some researchers have introduced fine-grained word and character features to help models understand sentence meanings, but they do not consider coarse-grained prior knowledge at the sentence level. Even if two cross-linguistic sentence pairs have the same meaning, the sentence representations extracted by the baseline approach may have language-specific biases. Considering the above problems, in this paper, we construct a Chinese-Uyghur cross-lingual sentence similarity dataset and propose a method to compute cross-lingual sentence similarity by fusing multiple features. The method is based on the cross-lingual pretraining model XLM-RoBERTa and assists the model in similarity calculation by introducing two coarse-grained prior knowledge features, i.e., sentence sentiment and length features. At the same time, to eliminate possible language-specific biases in the vectors, we whitened the sentence vectors of different languages to ensure that they were all represented under the standard orthogonal basis. Considering that the combination of different vectors has different effects on the final performance of the model, we introduce different vector features for comparison experiments based on the basic feature splicing method. The results show that the absolute value feature of the difference between two vectors can reflect the similarity of two sentences well. The final F1 value of our method reaches 98.97%, which is 19.81% higher than that of the baseline.

Automated compliance checking in the context of Industry 4.0: from a systematic review to an empirical fuzzy multi-criteria approach

et al. 2021
