Recent outstanding results of supervised object detection in competitions and challenges are often associated with specific metrics and datasets. The evaluation of such methods in different contexts has increased the demand for annotated datasets. Annotation tools represent the location and size of objects in distinct formats, leading to a lack of consensus on the representation. Such a scenario often complicates the comparison of object detection methods. This work alleviates this problem along the following lines: (i) it provides an overview of the most relevant evaluation methods used in object detection competitions, highlighting their peculiarities, differences, and advantages; (ii) it examines the most widely used annotation formats, showing how different implementations may influence the assessment results; and (iii) it provides a novel open-source toolkit supporting different annotation formats and 15 performance metrics, making it easy for researchers to evaluate the performance of their detection algorithms on most known datasets. In addition, this work proposes a new metric, also included in the toolkit, for evaluating object detection in videos based on the spatio-temporal overlap between the ground-truth and detected bounding boxes.
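The metrics discussed above all build on the intersection-over-union (IoU) between ground-truth and detected boxes. The following is a minimal sketch of spatial IoU and one plausible per-frame averaging rule for a spatio-temporal extension; the function names and the exact temporal-averaging convention are illustrative assumptions, not the toolkit's actual API or the paper's precise definition.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def spatio_temporal_iou(track_a, track_b):
    """One way to score overlap between two tracks (dicts frame -> box):
    sum the per-frame IoU over frames where both boxes exist, then divide
    by the number of frames where either track exists."""
    frames = set(track_a) | set(track_b)
    total = sum(iou(track_a[f], track_b[f])
                for f in frames if f in track_a and f in track_b)
    return total / len(frames) if frames else 0.0
```

A frame missing from either track contributes zero overlap but still counts in the denominator, so a detection that appears or disappears at the wrong time is penalized.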
This paper tackles multichannel separation of convolutive mixtures of audio sources using complex-valued nonnegative matrix factorization (CNMF). We extend models proposed in previous works and show that advanced single-channel NMF techniques, such as deconvolutive NMF, can be tailored to the multichannel factorization scheme. Additionally, we propose a regularized cost function that enables the user to control the distribution of the estimated parameters without significantly increasing the underlying computational cost. We also develop an optimization framework compatible with previous related works. Our simulations show that the proposed deconvolutive model offers advantages over plain NMF, and that the regularization is able to steer the parameters toward a solution with desirable properties.
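For readers unfamiliar with the factorization scheme that the complex-valued multichannel and deconvolutive extensions build on, here is a minimal sketch of plain NMF with multiplicative updates under a Euclidean cost. This is an illustrative baseline only, not the authors' algorithm; the variable names are assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (freq x time) as V ~ W @ H using
    the classic multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, T)) + eps   # temporal activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The updates preserve nonnegativity because each step multiplies the current estimate by a nonnegative ratio; the deconvolutive variant mentioned above generalizes W to a set of time-shifted bases.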
This paper addresses the separation of audio sources from convolutive mixtures captured by a microphone array. We approach the problem using complex-valued non-negative matrix factorization (CNMF), and extend previous works by tailoring advanced (single-channel) NMF models, such as the deconvolutive NMF, to the multichannel factorization setup. Further, a sparsity-promoting scheme is proposed so that the estimated parameters better fit the time-frequency properties inherent in some audio sources. The proposed parameter-estimation framework is compatible with previous related works and can be thought of as a step toward a more general method. We evaluate the resulting separation accuracy in a simulated acoustic scenario, and the tests confirm that the proposed algorithm provides superior separation quality when compared to a state-of-the-art benchmark. Finally, an analysis of the effects of the introduced regularization term shows that the solution is indeed steered toward a sparser representation.
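A common way to realize the kind of sparsity-promoting regularization described above is to add an L1 penalty on the activations, which in the multiplicative-update setting simply shifts the denominator of the activation update by the penalty weight. The sketch below shows this for a real-valued NMF; it is an assumption-laden illustration of the general idea, not the paper's complex-valued multichannel formulation. Normalizing the basis columns each iteration is a standard trick (it keeps the penalty weight meaningful by preventing the scale from migrating into W).

```python
import numpy as np

def sparse_nmf(V, rank, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """NMF with an L1 penalty lam * sum(H) on the activations.
    The penalty appears as `lam` in the denominator of the H update."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)   # L1 term shifts denominator
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        W /= np.maximum(W.sum(axis=0, keepdims=True), eps)  # fix the scale of W
    return W, H
```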
Pixel-level ground-truth masks for object detection databases are extremely useful in the context of machine learning, especially for convolutional neural network applications. However, manually labeling such data demands a great deal of effort and time, particularly in videos, where the labeling must be performed frame by frame. Therefore, most databases contain only bounding-box annotations, which are much faster to produce. In this work we propose a semi-automated approach to transform bounding-box annotations into silhouette annotations with reduced processing time. We compute features of a Siamese network in the region inside a bounding box and obtain the probability of each pixel belonging to the foreground, which is then refined by a post-processing step. We apply our methodology to the VDAO dataset, creating a new annotation that contains the silhouettes of the objects. We estimate that our method reduces the annotation time by 90% on average, while providing an accurate silhouette for each object.
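The abstract does not specify the post-processing step, but a generic refinement of a per-pixel foreground-probability map might look like the sketch below: threshold the probabilities, then keep only the largest connected component inside the bounding box. The threshold and the keep-largest-component rule are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from collections import deque

def refine_silhouette(prob_map, threshold=0.5):
    """Threshold a foreground-probability map and keep only the largest
    4-connected component as the object silhouette (a hypothetical
    post-processing step, for illustration)."""
    mask = prob_map >= threshold
    labels = np.zeros(mask.shape, dtype=int)
    sizes, cur = {}, 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                cur += 1
                labels[i, j] = cur
                q, size = deque([(i, j)]), 0
                while q:                      # flood-fill one component
                    y, x = q.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = cur
                            q.append((ny, nx))
                sizes[cur] = size
    if not sizes:
        return np.zeros_like(mask)
    largest = max(sizes, key=sizes.get)
    return labels == largest
```

Keeping a single component reflects the assumption that each bounding box encloses one object, so stray high-probability speckles are discarded.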