With the increasing amount of images stored worldwide, automatic image annotation has become a very active and relevant research area. However, it still lacks a benchmark specifically designed for this task, and in particular for region-level annotation. In this report we introduce the segmented and annotated IAPR-TC12 benchmark, an extended resource for the evaluation of automatic image annotation (AIA) methods. We present a methodology for the manual segmentation and annotation of the images in this collection, whose goal is to obtain reliable ground-truth data for benchmarking AIA and related tasks. For annotation, an ad hoc vocabulary is defined and organized hierarchically; this hierarchy proved very useful for obtaining objective and structured annotations. Based on the same hierarchy, we also propose a soft measure for evaluating annotation performance. Statistics on the segmentation and annotation processes give evidence of the reliability of the proposed approach. Visual attributes and spatial relations are also extracted from the regions of segmented images; the latter feature will promote research on the use of (spatial) contextual information for AIA and image retrieval. The extended collection is publicly available and can be used to evaluate a variety of tasks besides image annotation. It can also serve to study the use of automatic annotations for multimedia image retrieval, which is a distinctive feature of the collection: although several image annotation benchmarks exist, there is currently no collection that can be used to effectively evaluate annotation methods in the task they are designed for (i.e., image retrieval). We outline several applications and raise important questions that might be answered with the annotated collection, motivating research in image segmentation, annotation, and retrieval, as well as in machine learning.
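The abstract does not spell out how the hierarchy-based soft measure is defined. As an illustration only, hierarchy-aware scores are often built from the deepest ancestor shared by the predicted and ground-truth labels; the toy vocabulary and the Wu-Palmer-style formula below are assumptions for exposition, not the measure defined for the IAPR-TC12 benchmark.

```python
# Illustrative only: a Wu-Palmer-style soft score over a toy label hierarchy.
# Both the hierarchy and the scoring formula are assumptions, not the
# benchmark's actual soft evaluation measure.

PARENT = {                     # child -> parent (toy annotation vocabulary)
    "entity": None,
    "landscape": "entity",
    "vegetation": "landscape",
    "tree": "vegetation",
    "bush": "vegetation",
    "water": "landscape",
    "river": "water",
}

def ancestors(label):
    """Return the path from a label up to the root, label first."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path

def soft_score(predicted, truth):
    """1.0 for an exact match; otherwise a value in (0, 1) that grows with
    the depth of the deepest ancestor shared by the two labels."""
    pred_path, truth_path = ancestors(predicted), ancestors(truth)
    common = next(a for a in pred_path if a in truth_path)  # deepest shared ancestor
    depth = lambda lbl: len(ancestors(lbl))                  # root has depth 1
    return 2.0 * depth(common) / (depth(predicted) + depth(truth))

if __name__ == "__main__":
    print(soft_score("tree", "tree"))   # 1.0: exact match
    print(soft_score("tree", "bush"))   # partial credit: siblings under "vegetation"
    print(soft_score("tree", "river"))  # lower: only share "landscape"
```

Under such a score, annotating a "tree" region as "bush" is penalized less than annotating it as "river", which is the behavior a hierarchy-based evaluation is meant to capture.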
Real-world applications often involve imbalanced datasets, in which a majority class contains normal data and a minority class contains abnormal or important data. In this work, we provide an overview of the class imbalance problem: we review its consequences, possible causes, and existing strategies for coping with the difficulties it introduces. As a contribution toward solving this problem, we propose a new rule induction algorithm named Rule Extraction for MEdical Diagnosis (REMED), a symbolic one-class learning approach. We evaluate the proposed method on several medical diagnosis datasets, taking into account quantitative metrics, comprehensibility, and reliability. We compared REMED against C4.5 and RIPPER combined with over-sampling and cost-sensitive strategies. This empirical analysis showed REMED to be quantitatively competitive with C4.5 and RIPPER in terms of the area under the Receiver Operating Characteristic curve (AUC) and the geometric mean, while surpassing them in comprehensibility and reliability. Our experiments show that REMED generates rule systems with a larger degree of abstraction and patterns closer to the well-known abnormal values associated with each medical dataset considered.
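For reference, the two quantitative metrics named above are standard for imbalanced classification and can be computed as sketched below; the labels and scores are toy values, not data from the REMED experiments.

```python
# Illustrative computation of the two metrics named in the abstract:
# AUC (area under the ROC curve) and the geometric mean of sensitivity
# and specificity. The labels/scores are toy values, not REMED results.
from math import sqrt
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]        # imbalanced: few positives
y_score = [0.1, 0.2, 0.15, 0.3, 0.4, 0.35, 0.2, 0.6, 0.7, 0.55]
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]

auc = roc_auc_score(y_true, y_score)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                     # true positive rate
specificity = tn / (tn + fp)                     # true negative rate
g_mean = sqrt(sensitivity * specificity)

print(f"AUC = {auc:.3f}, g-mean = {g_mean:.3f}")
```

The geometric mean is often preferred over plain accuracy here because it stays low whenever either class is poorly recognized, so a classifier cannot score well by ignoring the minority class.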