The creation of gold standard datasets is costly. Optimally, more than one judgment per document is obtained to ensure high annotation quality. In this context, we explore how much annotations from experts differ from each other, how different sets of annotations influence the ranking of systems, and whether these annotations can be obtained through crowdsourcing. The study is applied to the annotation of images with multiple concepts. A subset of the images used in the latest ImageCLEF Photo Annotation competition was manually annotated by expert annotators and by non-experts on Mechanical Turk. Inter-annotator agreement is computed at the image and concept level using majority vote, accuracy and kappa statistics. Further, the Kendall τ and Kolmogorov-Smirnov correlation tests are used to compare the rankings of systems under different ground truths and different evaluation measures in a benchmark scenario. Results show that while the agreement between experts and non-experts varies depending on the measure used, its influence on the ranked lists of the systems is rather small. In sum, the majority vote applied to fuse several opinions into one annotation set is able to filter noisy judgments of non-experts to some extent, and the resulting annotation set is of comparable quality to the expert annotations.
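As a concrete illustration of the agreement computations mentioned above, the following minimal sketch fuses several judgments into one annotation set by majority vote and reports pairwise Cohen's kappa for a single concept. The toy judgments and all variable names are illustrative assumptions, not data from the study.

```python
# Minimal sketch: majority vote and pairwise kappa for one concept.
# The binary judgments below (1 = concept present) are made up.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical judgments, indexed as annotations[annotator][image].
annotations = [
    [1, 0, 1, 1, 0, 1],  # annotator A
    [1, 0, 1, 0, 0, 1],  # annotator B
    [1, 1, 1, 1, 0, 1],  # annotator C
]

# Majority vote fuses several opinions into one annotation set,
# filtering individual (noisy) judgments to some extent.
majority = [int(sum(votes) >= len(annotations) / 2)
            for votes in zip(*annotations)]

# Pairwise Cohen's kappa as a chance-corrected agreement statistic.
for (i, a), (j, b) in combinations(enumerate(annotations), 2):
    print(f"kappa({i},{j}) = {cohen_kappa_score(a, b):.3f}")

print("majority vote:", majority)
```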
Supervised learning requires adequately labeled training data. In this paper, we present an approach for the automatic detection of outliers in image training sets using a one-class support vector machine (SVM). The image sets were downloaded from photo communities based solely on their tags. We conducted four experiments to investigate whether the one-class SVM can automatically differentiate between target and outlier images. As a test setup, we chose four image categories, namely Snow & Skiing, Family & Friends, Architecture & Buildings, and Beach. Our experiments show a significant tendency, across all tests, to remove the outliers and retain the target images. This offers a promising way to gather large data sets from the Web without the need for manual review of the images.
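A minimal sketch of the outlier-filtering idea follows, assuming the images have already been converted into fixed-length feature vectors by some feature extractor; the synthetic data and all parameter choices are our own assumptions, not the paper's setup.

```python
# Minimal sketch: filter outliers from a tag-based image set
# with a one-class SVM. Features here are synthetic stand-ins.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical features: a compact "target" cluster plus scattered outliers.
target = rng.normal(loc=0.0, scale=1.0, size=(200, 64))
outliers = rng.uniform(low=-6.0, high=6.0, size=(20, 64))
X = np.vstack([target, outliers])

# nu upper-bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X)
keep = ocsvm.predict(X) == 1  # predict returns +1 for inliers, -1 for outliers

print(f"kept {keep.sum()} of {len(X)} images")
```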
In this paper, we explore different ways of formulating new evaluation measures for multi-label image classification when the vocabulary of the collection adopts the hierarchical structure of an ontology. We apply several semantic relatedness measures based on web search engines, WordNet, Wikipedia and Flickr to the ontology-based score (OS) proposed in [22]. The final objective is to assess the benefit of integrating semantic distances into the OS measure. Hence, we have evaluated them in a real-case scenario: the results (73 runs) provided by 19 research teams during their participation in the ImageCLEF 2009 Photo Annotation Task. Two experiments were conducted with a view to understanding which aspects of annotation behaviour each measure captures most effectively. First, we compare the system rankings produced by the different evaluation measures, computing the Kendall τ and Kolmogorov-Smirnov correlation between the rankings of each pair of measures. Second, we investigate how robustly the different measures react to noise artificially introduced into the ground truth. We conclude that the distributional measures based on image information sources show promising behaviour in terms of ranking and stability.
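To make the ranking-correlation step concrete, the sketch below compares how two evaluation measures score the same set of systems, using Kendall τ and a two-sample Kolmogorov-Smirnov test; the per-system scores are invented for illustration and do not come from the 73 evaluated runs.

```python
# Minimal sketch: compare system rankings induced by two evaluation
# measures. The scores below are illustrative, not real run results.
from scipy.stats import kendalltau, ks_2samp

# Hypothetical per-system scores under two evaluation measures.
scores_measure_a = [0.61, 0.58, 0.55, 0.49, 0.47, 0.40]
scores_measure_b = [0.70, 0.66, 0.59, 0.62, 0.50, 0.44]

# Kendall tau measures how similarly the two measures rank the systems.
tau, p_value = kendalltau(scores_measure_a, scores_measure_b)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.3f})")

# Two-sample Kolmogorov-Smirnov test on the score distributions.
ks_stat, ks_p = ks_2samp(scores_measure_a, scores_measure_b)
print(f"KS statistic = {ks_stat:.3f} (p = {ks_p:.3f})")
```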
Mood and emotion information are frequently used as search terms and navigation properties in multimedia archives, retrieval systems and multimedia players. Most of these applications engage end-users or experts to tag multimedia objects with mood annotations. Within the scientific community, different approaches for content-based music, photo or multimodal mood classification can be found, with a wide range of mood definitions and models and completely different test suites. The purpose of this paper is to review common mood models in order to assess their flexibility, to present a generic multimodal mood classification framework which uses various audio-visual features and multiple classifiers, and to present a novel music and photo mood classification reference set for evaluation. The classification framework is the basis for different applications, e.g., automatic media tagging or music slideshow players. The novel reference set can be used to compare algorithms from different research groups. Finally, the results of the introduced framework are presented and discussed, and conclusions for future steps are drawn.
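One plausible reading of "multiple classifiers" over audio-visual features is late fusion of per-modality classifiers by soft voting, sketched below. The features, labels, and choice of classifiers are purely our assumptions; the paper's actual framework may combine modalities differently.

```python
# Minimal sketch: late fusion of per-modality mood classifiers via
# soft voting. Synthetic features and labels stand in for real data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
audio_features = rng.normal(size=(n, 20))   # e.g., timbre/rhythm descriptors
visual_features = rng.normal(size=(n, 30))  # e.g., color/texture descriptors
moods = rng.integers(0, 4, size=n)          # four hypothetical mood classes

audio_clf = LogisticRegression(max_iter=1000).fit(audio_features, moods)
visual_clf = RandomForestClassifier(n_estimators=100).fit(visual_features, moods)

# Soft voting: average the class-probability estimates of both modalities.
proba = (audio_clf.predict_proba(audio_features)
         + visual_clf.predict_proba(visual_features)) / 2
predicted = proba.argmax(axis=1)
print("fused training accuracy (toy data):", (predicted == moods).mean())
```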