2019
DOI: 10.12928/telkomnika.v17i6.11711

An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark

Abstract: Ongoing big data from social network sites such as Twitter or Facebook has been an attractive focus of investigation for researchers in recent decades because of its up-to-dateness, accessibility and popularity; however, there may be a trade-off in accuracy. Moreover, clustering of Twitter data has caught the attention of researchers. An algorithm that can cluster data in less computational time, especially streaming data, is therefore needed. The presented adapti…
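
The abstract stresses clustering streaming data cheaply. To illustrate what one-pass clustering of a stream means, here is a minimal sketch of sequential (MacQueen-style) k-means. It is a generic, swapped-in technique and is not the paper's adaptive algorithm, whose details are cut off above. The names `OnlineKMeans` and `update`, and the choice to let the caller supply the seed centroids, are hypothetical. Each point is seen once and costs O(k·d) work, which is the property that matters for streams.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineKMeans:
    """Sequential k-means: every point is processed once, in arrival order."""

    def __init__(self, seeds: List[Point]) -> None:
        # Hypothetical seeding: the caller supplies k initial centroids.
        self.centroids = [list(s) for s in seeds]
        # Each seed counts as one observation, so a centroid is always the
        # running mean of its seed and every point assigned to it so far.
        self.counts = [1] * len(seeds)

    def update(self, point: Point) -> int:
        """Assign `point` to its nearest centroid, move that centroid, return its index."""
        idx = min(range(len(self.centroids)),
                  key=lambda i: squared_distance(self.centroids[i], point))
        self.counts[idx] += 1
        step = 1.0 / self.counts[idx]  # incremental-mean learning rate
        centroid = self.centroids[idx]
        for d, value in enumerate(point):
            centroid[d] += step * (value - centroid[d])
        return idx


if __name__ == "__main__":
    model = OnlineKMeans([(0.0, 0.0), (10.0, 10.0)])
    stream = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0), (9.8, 10.2)]
    print([model.update(p) for p in stream])  # [0, 0, 1, 1]
    print(model.centroids)
```

In a Spark setting, the same per-point update would be applied to each micro-batch. That wiring is left out here so the sketch stays self-contained.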

Cited by 26 publications (22 citation statements)
References 34 publications

“…It involves the use of FS algorithms to filter out irrelevant and redundant data features from the original dataset to prevent over-fitting [6,13] and improve the classification accuracy of the model. Feature selection also reduces the classification models' complexity in time and space domains [14][15][16][17][18]. The main idea of this paper is to employ the TLBO-based algorithm for features subset selection in BC diagnosis.…”
Section: Telkomnika Telecommun Comput El Control · mentioning
confidence: 99%
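
The statement above describes feature selection as discarding irrelevant and redundant features. A simple way to make those two notions concrete is a filter-based selector. It drops near-constant columns (irrelevant) and any column that is highly correlated with one already kept (redundant). This sketch is a plainly different, much simpler technique than the TLBO-based wrapper the citing paper uses. The function name `select_features` and the thresholds `corr_threshold` and `min_variance` are hypothetical.

```python
import math
from typing import List, Sequence


def variance(col: Sequence[float]) -> float:
    """Population variance of one feature column."""
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; both columns must have non-zero variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def select_features(rows: List[List[float]],
                    corr_threshold: float = 0.95,
                    min_variance: float = 1e-12) -> List[int]:
    """Return indices of features kept by a simple variance + correlation filter."""
    columns = list(zip(*rows))
    kept: List[int] = []
    for j, col in enumerate(columns):
        if variance(col) <= min_variance:
            continue  # (near-)constant feature: treated as irrelevant
        if any(abs(pearson(col, columns[k])) >= corr_threshold for k in kept):
            continue  # highly correlated with a kept feature: treated as redundant
        kept.append(j)
    return kept


if __name__ == "__main__":
    rows = [[1.0, 5.0, 2.0, 3.0],
            [2.0, 5.0, 4.0, 1.0],
            [3.0, 5.0, 6.0, 2.0],
            [4.0, 5.0, 8.0, 4.0]]
    # Column 1 is constant; column 2 is exactly 2 x column 0.
    print(select_features(rows))  # [0, 3]
```

A metaheuristic wrapper such as TLBO would instead search over feature subsets and score each one with a classifier. That costs more but can capture feature interactions that a pairwise filter misses.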
“…Hence, a hybrid adaptive approach called Hoeffding Naive Bayes Tree (HNBT), which performs better than its component prediction methods for both complex and simple concepts, has been proposed. The concept of this method is based on performing a naive Bayes prediction on each training instance and then comparing its prediction performance with that of the majority class [19][20][21][22][23][24][25]. The number of times naive Bayes correctly predicts the true class is counted and compared with the corresponding count for the majority class.…”
Section: Hoeffding Tree (HT) · mentioning
confidence: 99%
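
The adaptive leaf rule described above can be sketched as follows. Only the leaf logic is shown; Hoeffding-bound splitting is omitted. Each leaf keeps two running tallies, one for how often the majority class was right and one for how often naive Bayes was right. Both predictors are scored on each training instance before the leaf absorbs it. At prediction time the leaf uses whichever predictor has the better record. The following are assumptions: categorical features, Laplace smoothing, ties going to the majority class, and the class and method names (`AdaptiveNBLeaf`, `learn`, `predict`).

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Set, Tuple


class AdaptiveNBLeaf:
    """Hoeffding-tree leaf that picks majority class or naive Bayes by track record."""

    def __init__(self) -> None:
        self.class_counts: Dict[Hashable, int] = defaultdict(int)
        # (feature index, feature value, class label) -> count
        self.value_counts: Dict[Tuple[int, Hashable, Hashable], int] = defaultdict(int)
        self.seen_values: Dict[int, Set[Hashable]] = defaultdict(set)
        self.mc_correct = 0  # times the majority class matched the true label
        self.nb_correct = 0  # times naive Bayes matched the true label

    def majority_class(self) -> Hashable:
        """Most frequent label; on a tie, the label seen first wins."""
        return max(self.class_counts, key=self.class_counts.get)

    def naive_bayes(self, x: Sequence[Hashable]) -> Hashable:
        """Most probable label under Laplace-smoothed naive Bayes (log space)."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)
            for i, v in enumerate(x):
                n_values = len(self.seen_values.get(i, set()) | {v})
                count = self.value_counts.get((i, v, label), 0)
                score += math.log((count + 1) / (n_label + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, x: Sequence[Hashable], y: Hashable) -> None:
        """Score both predictors on (x, y) first, then absorb the instance."""
        if self.class_counts:  # nothing to predict from before the first instance
            self.mc_correct += int(self.majority_class() == y)
            self.nb_correct += int(self.naive_bayes(x) == y)
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.value_counts[(i, v, y)] += 1
            self.seen_values[i].add(v)

    def predict(self, x: Sequence[Hashable]) -> Hashable:
        """Use naive Bayes only if it has been strictly more accurate so far."""
        if self.nb_correct > self.mc_correct:
            return self.naive_bayes(x)
        return self.majority_class()


if __name__ == "__main__":
    leaf = AdaptiveNBLeaf()
    for x, y in [(("a",), "yes"), (("b",), "no")] * 10:
        leaf.learn(x, y)
    print(leaf.predict(("b",)))  # no
```

In this toy stream the classes are balanced, so the majority class is right only about half the time. Naive Bayes learns the feature-to-label mapping, so the leaf switches to it.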
“…Since conventional computing techniques could not provide the expected results and efficiency for managing big data, different distributed frameworks such as Hadoop [4], Spark [5], and Storm [6] have been introduced to meet the requirements of handling big data.…”
Section: Introduction · mentioning
confidence: 99%