Noor Fazilla Abd Yusof scite author profile

Noor Fazilla Abd Yusof

5 Publications

10 Citation Statements Received

52 Citation Statements Given


Affiliations

Technical University of Malaysia Malacca, University of Aberdeen

Publications


Split Over-Training for Unsupervised Purchase Intention Identification

Yusof

2020

IJATCSE


Recognizing user-expressed intentions in social media can be useful for many applications such as business intelligence, as intentions are intimately linked to potential actions or behaviors. This paper focuses on a binary classification problem: whether a text expresses purchase intention (PI) or not (non-PI). In contrast to existing research, which relies on a labeled intention corpus or linguistic knowledge, we propose an unsupervised method called split over-training for the PI identification task. Experiments on PI identification from tweets showed that our approach was effective and promising. The best classification accuracy of 84.6% and PI F-measure of 70.4% were achieved, only 7.7% and 4.9% lower, respectively, than those of fully supervised models. This means our unsupervised method may provide reasonable preprocessing for intention corpus labeling or intention knowledge acquisition.


Sentiment Analysis in Social Media

Yusof, Lin, He

2018


Cross-domain sentiment analysis model on Indonesian YouTube comment

Aribowo, Basiron, Yusof, et al.

2021

Int. J. Adv. Intell. Informatics


This study examines cross-domain sentiment analysis (CDSA) in the Indonesian language using tree-based ensemble machine learning. CDSA helps support the labeling of cross-domain sentiment and reduces dependence on experts; however, the mechanism for opinions made unstructured by stop words, language expressions, and Indonesian slang words has not yet been identified. The study aimed to obtain the best CDSA model for opinions in Indonesian, which are commonly full of stop words and slang words in the Indonesian dialect. It set out to observe the benefits of stop word cleaning and slang word conversion for CDSA in Indonesian, and to find out which machine learning method suits this model. The study started by crawling five datasets of YouTube comments from five different domains. The datasets were copied into two groups: one without stop word cleaning and slang word conversion, and one with both applied. A CDSA model was built for each dataset group and tested using two types of tree-based ensemble machine learning, Random Forest (RF) and Extra Tree (ET) classifiers, with three non-ensemble methods, Naïve Bayes (NB), SVM, and Decision Tree (DT), as the comparison. The results suggest that CDSA accuracy in Indonesian increases when stop words are removed and slang words are converted. The best classifier model was built using tree-based ensemble machine learning, particularly ET, which achieved the highest accuracy in this study at 91.19%. This model is expected to serve as an alternative CDSA technique for the Indonesian language.
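The two dataset groups described in this abstract (with and without stop word cleaning and slang word conversion) can be illustrated with a minimal Python sketch. The abstract does not give the authors' implementation, so everything here is an assumption: the function names `preprocess` and `make_dataset_groups` are hypothetical, and the tiny word lists stand in for the full Indonesian stop word list and slang dictionary a real study would need.

```python
# Hypothetical, tiny word lists for illustration only (assumption, not from
# the paper); a real pipeline would load full Indonesian resources.
STOP_WORDS = {"yang", "dan", "di", "ini", "itu"}
SLANG_TO_STANDARD = {"yg": "yang", "gak": "tidak", "udah": "sudah"}


def preprocess(comment: str) -> list[str]:
    """Lowercase, convert slang to standard forms, then drop stop words."""
    tokens = comment.lower().split()
    # Convert slang first so that slang spellings of stop words
    # (e.g. "yg" -> "yang") are also removed by the next step.
    normalized = [SLANG_TO_STANDARD.get(token, token) for token in tokens]
    return [token for token in normalized if token not in STOP_WORDS]


def make_dataset_groups(comments: list[str]) -> tuple[list[list[str]], list[list[str]]]:
    """Return (raw group, cleaned group) built from the same comments."""
    raw_group = [comment.lower().split() for comment in comments]
    cleaned_group = [preprocess(comment) for comment in comments]
    return raw_group, cleaned_group


if __name__ == "__main__":
    raw, cleaned = make_dataset_groups(["Video yg ini gak bagus"])
    print(raw[0])
    print(cleaned[0])
```

This sketch splits on whitespace only and ignores punctuation; each group would then be vectorized and passed to the classifiers being compared.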


Performance of Content-Based Features to Detect Depression Tendencies in Different Text Lengths

Zulkarnain, Yusof, Ahmad, et al.

2022


Semi-supervised learning for sentiment classification with ensemble multi-classifier approach

Aribowo, Basiron, Yusof

2022

Int. J. Adv. Intell. Informatics


Supervised sentiment analysis ideally uses a fully labeled data set for modeling. However, this ideal condition requires considerable effort in the label annotation process. Semi-supervised learning (SSL) has emerged as a promising method to avoid time-consuming and expensive data labeling without reducing model performance. However, research on SSL is still limited and its performance needs to be improved. Thus, this study aims to create a new SSL model for sentiment analysis. The Ensemble Classifier SSL model for sentiment classification is introduced. The research went through pre-processing, vectorization, and feature extraction using TF-IDF and n-grams. Tokenization separated unigrams, bigrams, and trigrams, and Support Vector Machine (SVM) or Random Forest (RF) was used in model generation. Then, the outputs of these models were combined using a stacking ensemble approach. Accuracy and F1-score were used for the evaluation. The IMDB and US Airlines datasets were used to test the new SSL models. The conclusion is that sentiment annotation accuracy is highly dependent on the suitability of the dataset to the machine learning algorithm. On the IMDB dataset, which consists of two classes, it is better to use SVM. On the US Airlines dataset, which consists of three classes, SVM is better at improving model performance against the baseline, but RF is better at achieving the baseline performance even though it fails to maintain the model performance.
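The unigram, bigram, and trigram separation mentioned in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `ngrams` and `ngram_feature_sets` are hypothetical, raw counts stand in for the TF-IDF weighting, and the SVM or Random Forest models and the stacking step are left out entirely.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_feature_sets(text: str) -> dict[int, Counter]:
    """Build separate count features for unigrams, bigrams, and trigrams.

    Each entry could feed its own base classifier before the outputs are
    combined; raw counts are used here in place of TF-IDF weights.
    """
    tokens = text.lower().split()
    return {n: Counter(ngrams(tokens, n)) for n in (1, 2, 3)}


if __name__ == "__main__":
    features = ngram_feature_sets("The flight was late")
    for n, counts in features.items():
        print(n, dict(counts))
```

Keeping one feature set per n-gram order mirrors the idea of training one model per order, so that a later ensemble step has distinct base-model outputs to combine.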


