Learned features versus engineered features for semantic video indexing

Budnik, Mateusz; Gomez, Efrain Leonardo Gutierrez; Safadi, Bahjat; Quénot, Georges

doi:10.1109/cbmi.2015.7153637

Cited by 8 publications

(3 citation statements)

References 30 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Even though learned features (CNN) could obtain better results than ours when large datasets are available, engineered features can be still useful when few training samples are available as proved in [57]. Moreover, engineered features like ours are still useful because they can improve the performance when combined with CNNs features, as suggested in [58].…”

Section: Comparison With Other Action Recognition Techniquesmentioning

confidence: 95%

“…In recent years, most of authors combine their CNNs techniques with IDTs features to improve the results. As suggested in [58], engineered features are still useful because they can improve the performance when combined with CNNs features. In the future we will test if our features are complementary with CNNs features to improve the state-of-the-art results.…”

Section: Further Workmentioning

confidence: 99%

See 1 more Smart Citation

Human action recognition by means of subtensor projections and dense trajectories

Carmona¹,

Climent²

2018

Pattern Recognition

View full text Add to dashboard Cite

In last years, most human action recognition works have used dense trajectories features, to achieve state-of-the-art results. Histograms of Oriented Gradients (HOG), Histogram of Optical Flow (HOF) and Motion Boundary Histograms (MBH) features are extracted from regions and being tracked across the frames. The goal of this paper is to improve the performance obtained by means of Improved Dense Trajectories (IDTs), adding new features based on temporal templates. We construct these templates considering a video sequence as a third-order tensor and computing three different projections. We use several functions for projecting the fibers from the video sequences, and combined them by means of sum pooling. As a first contribution of our work, we present in detail the method based on tensor projections. First, we have assessed the results obtained using only template based action recognition. Next, in order to achieve state-of-art recognition rates, we have fused our features with those of IDTs.This is the second contribution of the article. Experiments on four different public datasets have shown that this technique improves IDTs performance and that the results outperform the ones obtained by most of the state-of-the-art techniques for action recognition.

show abstract

Section: Comparison With Other Action Recognition Techniquesmentioning

confidence: 95%

Section: Further Workmentioning

confidence: 99%

Human action recognition by means of subtensor projections and dense trajectories

Carmona¹,

Climent²

2018

Pattern Recognition

View full text Add to dashboard Cite

show abstract

“…For instance, in computer vision novel feature learning techniques are applied directly on the raw pixel representations of images avoiding the signal parameterization or any other prior preprocessing [15]. Budnik et al [3] report an extensive comparison of the performance of CNN based features with traditional engineered ones, as well as with combinations of them, in the framework of the TRECVid semantic indexing task.…”

Section: Introductionmentioning

confidence: 99%

Automatic Speech Feature Learning for Continuous Prediction of Customer Satisfaction in Contact Center Phone Calls

Segura

Balcells

Umbert

et al. 2016

Advances in Speech and Language Technologies for Iberian Languages

View full text Add to dashboard Cite

Speech related processing tasks have been commonly tackled using engineered features, also known as hand-crafted descriptors. These features have usually been optimized along years by the research community that constantly seeks for the most meaningful, robust, and compact audio representations for the specific domain or task. In the last years, a great interest has arisen to develop architectures that are able to learn by themselves such features, thus bypassing the required engineering effort. In this work we explore the possibility to use Convolutional Neural Networks (CNN) directly on raw audio signals to automatically learn meaningful features. Additionally, we study how well do the learned features generalize for a different task. First, a CNN-based continuous conflict detector is trained on audios extracted from televised political debates in French. Then, while keeping previous learned features, we adapt the last layers of the network for targeting another concept by using completely unrelated data. Concretely, we predict self-reported customer satisfaction from call center conversations in Spanish. Reported results show that our proposed approach, using raw audio, obtains similar results than those of a CNN using classical Mel-scale filter banks. In addition, the learning transfer from the conflict detection task into satisfaction prediction shows a successful generalization of the learned features by the deep architecture.

show abstract

Towards a Custom Designed Mechanism for Indexing and Retrieving Video Transcripts

Turcu

Heras

Palanca

et al. 2019

Lecture Notes in Computer Science

View full text Add to dashboard Cite

Learned features versus engineered features for semantic video indexing

Cited by 8 publications

References 30 publications

Human action recognition by means of subtensor projections and dense trajectories

Human action recognition by means of subtensor projections and dense trajectories

Automatic Speech Feature Learning for Continuous Prediction of Customer Satisfaction in Contact Center Phone Calls

Towards a Custom Designed Mechanism for Indexing and Retrieving Video Transcripts

Contact Info

Product

Resources

About