Extracting useful features from a scene is an essential step in any computer vision and multimedia data analysis task. Though progress has been made in past decades, it is still quite difficult for computers to comprehensively and accurately recognize an object or pinpoint the more complicated semantics of an image or a video. Thus, feature extraction is expected to remain an active research area in advancing computer vision and multimedia data analysis for the foreseeable future.

The approaches to feature extraction can be divided into two categories: model-centric and data-driven. The model-centric approach relies on human heuristics to develop a computer model (or algorithm) to extract features from an image. (We use imagery data as our example throughout this chapter.) Some widely used models are the Gabor filter, wavelets, and SIFT [42]. These models were engineered by scientists and then validated via empirical studies. A major shortcoming of the model-centric approach is that circumstances not considered during a model's design, such as different lighting conditions and unexpected environmental factors, can render the engineered features less effective. In contrast to the model-centric approach, which dictates representations independent of data, the data-driven approach learns representations from data [10]. Examples of data-driven algorithms are the multilayer perceptron (MLP) and the convolutional neural network (CNN), which belong to the general category of neural networks and deep learning [27,29].

Both model-centric and data-driven approaches employ a model (algorithm or machine). The two approaches differ in two related aspects:

• Can data affect model parameters? With the model-centric approach, training data does not affect the model. With a data-driven model such as an MLP or a CNN, the internal parameters are learned from the structure discovered in large data sets [38].

• Can data affect representations? Whereas more data can help a data-driven approach improve its representations, more data cannot change the features extracted by a model-centric approach. For example, in a CNN the features of an image can be affected by the other images (because the parameters modified through backpropagation are affected by all training images). In a model-centric pipeline such as SIFT, by contrast, the feature set of an image is invariant to the other images.

The greater the quantity and diversity of data, the better the representations that a data-driven pipeline can learn. In other words, if a learning algorithm has seen enough training instances of an object under various conditions, e.g., in different postures and under partial occlusion, then the features learned from the training data will be more comprehensive.

The focus of this chapter is on how neural networks, specifically convolutional neural networks (CNNs), achieve effective representation learning. The neural network, a neuroscience-motivated model, was based on Hubel and Wiesel's research on cats' visual corte...
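
The contrast between model-centric and data-driven feature extraction described above can be illustrated with a minimal sketch. It makes several assumptions that are not part of the chapter: a fixed Sobel kernel stands in for an engineered model (the text names the Gabor filter, wavelets, and SIFT), and PCA stands in for a learned model (the text names MLPs and CNNs trained by backpropagation). The function names, image sizes, and random data are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hand-designed kernel: fixed by the designer, never touched by data.
# (Assumption: Sobel is a stand-in for engineered models such as Gabor or SIFT.)
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])


def handcrafted_features(image):
    """Model-centric: correlate the image with a fixed kernel (no padding)."""
    kh, kw = SOBEL_X.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * SOBEL_X)
    return out.ravel()


def fit_pca(train_images, k=2):
    """Data-driven: learn a mean and top-k principal directions from data.

    (Assumption: PCA is a simple stand-in for a learned model such as a CNN.)
    """
    X = np.stack([im.ravel() for im in train_images])
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def learned_features(image, model):
    """Project an image onto the directions learned from the training set."""
    mean, components = model
    return components @ (image.ravel() - mean)


rng = np.random.default_rng(0)
query = rng.random((6, 6))
train_a = [rng.random((6, 6)) for _ in range(20)]
train_b = train_a + [rng.random((6, 6)) for _ in range(20)]  # same set, more data

# The learned representation of the same query depends on the training set.
feat_a = learned_features(query, fit_pca(train_a))
feat_b = learned_features(query, fit_pca(train_b))

# The engineered representation depends only on the query image itself.
fixed = handcrafted_features(query)

print("learned features changed with more data:", not np.allclose(feat_a, feat_b))
print("handcrafted features unchanged:", np.array_equal(fixed, handcrafted_features(query)))
```

In this sketch, `handcrafted_features` never receives the training set, so no amount of additional data can alter its output, whereas the output of `learned_features` for the same query image shifts whenever the training set changes.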