The world of information technology is being flooded with data at an unprecedented rate: roughly 2.5 quintillion bytes are generated every day. This large, continuously growing stream is referred to as big data. This research applies sampling, a technique that selects a representative subset of the data points, manipulates and analyzes that subset to identify patterns and trends in the larger dataset, and finally builds models from it. Because sampling uses only a small proportion of the original data for analysis and model training, it is comparatively fast while preserving data integrity and achieving accurate results. Two deep neural networks, AlexNet and DenseNet, were used in this research to evaluate two sampling techniques: sampling with replacement and reservoir sampling. The dataset was divided into three classes: acceptable, flagged as easy, and flagged as hard. The base models were trained on the whole dataset, whereas the sampled models were trained on 50% of the original dataset, giving four combinations of model and sampling technique. The F-measure of the base AlexNet model was 0.807, and that of the base DenseNet model was 0.808. Combination 1, AlexNet with sampling with replacement, achieved an average F-measure of 0.8852; combination 2, DenseNet with sampling with replacement, achieved 0.8017; combination 3, AlexNet with reservoir sampling, achieved 0.8545; and combination 4, DenseNet with reservoir sampling, achieved 0.8111. Overall, we conclude that models trained on a sampled dataset gave comparable or better results than the base models trained on the whole dataset.
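For readers unfamiliar with the two sampling schemes, the sketch below illustrates how a 50% subset of a dataset's indices could be drawn with each technique. It is a minimal Python illustration, not the authors' implementation; the function names and the use of plain index lists are assumptions made for clarity.

```python
import random


def sample_with_replacement(items, k, rng=random):
    """Draw k items uniformly at random, allowing duplicates (bootstrap-style)."""
    return [rng.choice(items) for _ in range(k)]


def reservoir_sample(stream, k, rng=random):
    """Algorithm R: keep a uniform sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    indices = list(range(1000))      # hypothetical dataset indices
    half = len(indices) // 2         # the 50% sample size used in this study
    print(len(sample_with_replacement(indices, half)))  # 500, duplicates possible
    print(len(reservoir_sample(iter(indices), half)))   # 500 distinct indices
```

The key practical difference is that sampling with replacement may select the same example more than once, whereas reservoir sampling yields distinct examples and works over a single pass even when the dataset size is not known in advance.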