Crowdsourcing provides a means of gathering data from the public in order to infer the ground truth label of an unfamiliar entity. Such data cannot be used for decision making in their raw form; further processing is needed to infer the ground truth from the crowdsourced labels. This paper presents a detailed comparative analysis of the ground truth inference ability of three clustering algorithms on crowdsourced datasets under different experimental scenarios (initializing centroids and extracting class labels). The algorithms are the self-organizing map (SOM), k-means, and expectation maximization clustering. The three algorithms were evaluated on five datasets: Adult2, weather sentiments, emotion, valence5, and employee review. Four experimental scenarios for inferring the ground truth label from the curated dataset were analysed. The first scenario uses the clustering algorithm alone, relying on the inner workings of the algorithm to predict the ground truth. The second scenario uses an extract-class-label mechanism, in which the ground truth label is inferred by performing a further analysis on the clusters produced by the algorithm. In the third scenario, the centroids of the clustering algorithm are pre-initialized by setting the maximum value in each class from the curated data as a centroid, where "centroid" may mean something different depending on the particular algorithm (for the SOM, the weights of its units). The fourth scenario combines the second and third. Experimental results show that the SOM performs best across all the datasets when the weights of its units are pre-initialized. The SOM performed best on the weather sentiments dataset, recording 92.49% accuracy and a ROC AUC score of 0.88. It also recorded the best overall average accuracy, 50.2%, and the best overall average ROC AUC score, 0.59365, across all the datasets.
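To make the fourth scenario concrete, the following is a minimal sketch using k-means only, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: each item is represented by its per-class crowd vote counts; "the maximum value in each class" is read as the item with the most votes for that class; and the "further analysis" on clusters is read as mapping each cluster to the class with the most total votes among its members. All function names and the toy data are hypothetical.

```python
from collections import Counter

def vote_counts(labels_per_item, n_classes):
    """Turn each item's list of crowd labels into a per-class vote-count vector."""
    rows = []
    for labels in labels_per_item:
        counts = Counter(labels)
        rows.append([float(counts.get(c, 0)) for c in range(n_classes)])
    return rows

def init_centroids_from_max(rows, n_classes):
    """Assumed reading of scenario 3: for class c, start from the row with most votes for c."""
    return [list(max(rows, key=lambda r: r[c])) for c in range(n_classes)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, centroids, n_iter=20):
    """Plain Lloyd iterations from the given centroids; returns a cluster index per row."""
    centroids = [list(c) for c in centroids]
    assign = [0] * len(rows)
    for _ in range(n_iter):
        assign = [min(range(len(centroids)), key=lambda k: sq_dist(r, centroids[k]))
                  for r in rows]
        for k in range(len(centroids)):
            members = [r for r, a in zip(rows, assign) if a == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def extract_class_labels(rows, assign):
    """Assumed reading of scenario 2: label each cluster by its members' total votes."""
    mapping = {}
    for k in set(assign):
        members = [r for r, a in zip(rows, assign) if a == k]
        totals = [sum(col) for col in zip(*members)]
        mapping[k] = totals.index(max(totals))
    return [mapping[a] for a in assign]

# Hypothetical toy data: six items, three binary crowd labels each.
crowd = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
rows = vote_counts(crowd, 2)
assign = kmeans(rows, init_centroids_from_max(rows, 2))
inferred = extract_class_labels(rows, assign)
print(inferred)
```

Dropping the call to `extract_class_labels` and reading the cluster index directly as the label would correspond to the third scenario; replacing `init_centroids_from_max` with an arbitrary starting choice would correspond to the second.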