Fast Object Localization via Sensitivity Analysis

Ebrahimpour, Mohammad K.; Noelle, David C.

doi:10.1007/978-3-030-33723-0_17

Cited by 3 publications

(3 citation statements)

References 14 publications


“…Attention-Based Object Detection. Attention-based object detection methods depend on a set of training images with associated class labels but without any object location bounding box annotations [16], [21]. The lack of a need for ground-truth bounding boxes is a substantial benefit of this approach, since manually obtaining such information is costly.…”

Section: Related Work (mentioning)

confidence: 99%

“…Image classification is only one of the core problems of computer vision, however. Beyond object recognition [2]-[4], there are applications for such capabilities as semantic segmentation [5]-[7], image captioning [8]-[11], and object detection [12]-[16]. The last of these involves locating and classifying all of the relevant objects in an image.…”

Section: Introduction (mentioning)

confidence: 99%


WW-Nets: Dual Neural Networks for Object Detection

Ebrahimpour, Falandays, Spevack et al. 2020

Preprint

Self Cite


We propose a new deep convolutional neural network framework that uses object location knowledge implicit in network connection weights to guide selective attention in object detection tasks. Our approach is called What-Where Nets (WW-Nets), and it is inspired by the structure of human visual pathways. In the brain, vision incorporates two separate streams, one in the temporal lobe and the other in the parietal lobe, called the ventral stream and the dorsal stream, respectively. The ventral pathway from primary visual cortex is dominated by "what" information, while the dorsal pathway is dominated by "where" information. Inspired by this structure, we have proposed an object detection framework involving the integration of a "What Network" and a "Where Network". The aim of the What Network is to provide selective attention to the relevant parts of the input image. The Where Network uses this information to locate and classify objects of interest. In this paper, we compare this approach to state-of-the-art algorithms on the PASCAL VOC 2007 and 2012 and COCO object detection challenge datasets. We also compare our approach to human "ground-truth" attention. We report the results of an eye-tracking experiment on human subjects using images from PASCAL VOC 2007, and we demonstrate interesting relationships between human overt attention and information processing in our WW-Nets. Finally, we provide evidence that our proposed method performs favorably in comparison to other object detection approaches, often by a large margin. The code and the eye-tracking ground-truth dataset can be found at: https://github.com/mkebrahimpour.


“…In particular, convolutional neural networks (CNNs) have proven effective in learning to classify large sets of categories when given very large numbers of training examples [12,13,14,15]. One of the advantages of deep CNNs in sound classification is their ability to learn useful features in an end-to-end manner by mapping raw data, such as raw waveform audio, onto class labels.…”

Section: Introduction (mentioning)

confidence: 99%

InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations

Ebrahimpour, Schneider, Noelle et al. 2020

Preprint

Self Cite


Acoustic analyses of infant vocalizations are valuable for research on speech development as well as applications in sound classification. Previous studies have focused on measures of acoustic features based on theories of speech processing, such as spectral and cepstrum-based analyses. More recently, end-to-end models of deep learning have been developed to take raw speech signals (acoustic waveforms) as inputs and convolutional neural network layers to learn representations of speech sounds based on classification tasks. We applied a recent end-to-end model of sound classification to analyze a large-scale database of labeled infant and adult vocalizations recorded in natural settings outside the lab with no control over recording conditions. The model learned basic classifications like infant versus adult vocalizations, infant speech-related versus nonspeech vocalizations, and canonical versus non-canonical babbling. The model was trained on recordings of infants ranging from 3 to 18 months of age, and classification accuracy changed with age as speech became more distinct and babbling became more speech-like. Further work is needed to validate and explore the model and dataset, but our results show how deep learning can be used to measure and investigate speech acquisition and development, with potential applications in speech pathology and infant monitoring.
