2019
DOI: 10.1520/jte20180497
Link-Based Clustering Algorithm for Clustering Web Documents

Abstract: Clustering web documents involves feeding a large number of words into clustering algorithms such as K-Means, Cosine Similarity, Latent Dirichlet Allocation, and so on. This causes the clustering process to consume much time as the number of words in each document increases. In many web documents, web links are available along with the contents; these web link texts may contain a tremendous amount of information for clustering. In our work, we show that using the web link text alone gives b…

Cited by 7 publications (4 citation statements). References 23 publications.
“…Data resampling can cause important instances to be lost forever and often leads to oversampling. A work by [22] focuses on gaining the advantages of both the data level and the ensemble of classifiers: they apply a few pre-processing steps to the training phase of each classifier and compare the results on eight datasets.…”
Section: Related Work
confidence: 99%
“…As many methods work by altering the original dataset [24], a research work proposed by [22] aims to build a balanced dataset from an imbalanced one and performs an ensemble to consolidate the results. This process prevents important data from being lost during classification.…”
Section: Related Work
confidence: 99%
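The idea the two statements above attribute to [22] — build balanced subsets from an imbalanced dataset and train an ensemble so no majority-class instance is discarded for good — can be sketched as follows. The rotation scheme, function name, and toy data are illustrative assumptions, not the cited method:

```python
import random
from collections import Counter

def balanced_subsets(X, y, n_subsets, seed=0):
    """Undersample the majority class (label 0) into several balanced
    subsets, rotating through its instances so every one appears in some
    subset (hypothetical sketch of a data-level + ensemble approach)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    rng.shuffle(majority)
    k = len(minority)
    subsets = []
    for s in range(n_subsets):
        # slide a window over the shuffled majority indices: no
        # instance is lost forever, each subset stays class-balanced
        maj_slice = [majority[(s * k + j) % len(majority)] for j in range(k)]
        idx = minority + maj_slice
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

# toy imbalanced data: 2 minority vs. 8 majority instances
X = list(range(10))
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
subs = balanced_subsets(X, y, n_subsets=4)
print([Counter(labels) for _, labels in subs])  # each subset is 2-vs-2
```

One base classifier would then be trained per subset and their votes combined, which is the ensemble consolidation step the statement describes.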
“…Good features need to be identified to separate the classes. As the number of features increases, the complexity of the classifier also increases; this creates a need for better feature selection methods [34].…”
Section: Overall Drawbacks in Existing Feature Selection
confidence: 99%
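A minimal filter-style illustration of the point above — ranking features by how well they separate the two classes and keeping only the top few — might look like this. The scoring criterion (absolute difference of class-conditional means) and the toy data are assumptions for illustration, not the method of [34]:

```python
def select_top_features(X, y, k):
    """Rank features by the absolute difference of their class-conditional
    means and return the indices of the k highest-scoring features
    (a simple, hypothetical filter criterion)."""
    n_features = len(X[0])
    scores = []
    for f in range(n_features):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), f))
    return [f for _, f in sorted(scores, reverse=True)[:k]]

# feature 0 separates the classes; feature 1 is nearly constant
X = [[1.0, 5.0, 0.2], [0.9, 5.1, 0.1], [0.1, 4.9, 0.9], [0.2, 5.0, 0.8]]
y = [1, 1, 0, 0]
print(select_top_features(X, y, k=1))  # → [0]
```

Dropping low-scoring features like feature 1 is exactly the complexity reduction the statement calls for.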
“…Let us say there is a vehicle theft at a place X; that means X has less security monitoring for crime Y, so the same area or the surrounding areas are likely to become vulnerable points. Link-based algorithms such as [21] will be helpful in creating a graph; a subsequent kNN algorithm can then easily predict the crime spots. Figure 4 shows a visualization of crime in San Francisco; the visualization shows which areas are vulnerable and lack security monitoring.…”
Section: Vulnerability Analysis
confidence: 99%
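The kNN step of the statement above — flagging a location as vulnerable by majority vote among its nearest known incidents — can be sketched as follows. The coordinates, labels, and function name are hypothetical; the graph-construction step from [21] is not shown:

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Label a query location by majority vote among its k nearest
    known incidents (a minimal sketch; real crime data would use
    geographic coordinates and distances)."""
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(points[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy map: thefts cluster near (0, 0), safe reports near (5, 5)
points = [(0, 0), (0.5, 0.2), (0.1, 0.6), (5, 5), (5.2, 4.9), (4.8, 5.1)]
labels = ["vulnerable", "vulnerable", "vulnerable", "safe", "safe", "safe"]
print(knn_predict(points, labels, (0.3, 0.3)))  # → "vulnerable"
```

A query near the theft cluster inherits the "vulnerable" label, matching the intuition that areas surrounding a known theft at X are themselves likely vulnerable.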