Research on content-based image retrieval (CBIR) has been under development for decades, and numerous methods have competed to extract the most discriminative features for improved representation of image content. Recently, deep learning methods have gained attention in computer vision, including CBIR. In this paper, we present a comparative investigation of different features, both low-level and high-level, for CBIR. We compare the performance of CBIR systems using different deep features against state-of-the-art low-level features such as SIFT, SURF, HOG, LBP, and LTP, using different dictionaries and coefficient-learning techniques. We also conduct comparisons with a set of primitive and popular features that have been used in this field, including colour histograms and Gabor features, and investigate the discriminative power of deep features using several similarity measures under different validation approaches. Furthermore, we examine the effects of dimensionality reduction of deep features on the performance of CBIR systems using principal component analysis, the discrete wavelet transform, and the discrete cosine transform. The experimental results demonstrate unprecedentedly high mean average precisions of 95% and 93% when using the VGG-16 FC7 deep features on the Corel-1000 and Coil-20 datasets, respectively.
Introduction

Given a set of images S and an input image i, the goal of a content-based image retrieval (CBIR) system is to search S and return the images most related/similar to i, based on their contents. This emergent field responds to an urgent need to search for an image based on its content, rather than typing text to describe the image content to be searched for. That is, CBIR systems allow users to conduct a query by image (QBI), and the system's task is to identify the images that are relevant to that query image. Prior to CBIR, the traditional means of searching for images was typing text describing the image content, known as query by text (QBT). However, QBT requires predefined image information, such as metadata, which necessitates human intervention to annotate images in order to describe their contents. This is unfeasible, particularly with the emergence of big data; for example, Flickr creates approximately 3.6 TB of image data, while Google deals with approximately 20,000 TB of data daily [1], most of which comprises images and videos. Applications of CBIR are numerous and span many areas, including, but not limited to, medical image analysis [2], image mining [3][4][5], surveillance [6], biometrics [7], security [8][9][10], and remote sensing [11].

The key to the success of a CBIR system lies in extracting features from an image to define its content. These features are stored to describe each image; this is implemented automatically by the system, using specific algorithms developed for the extraction process. Similarly, a query is processed by extracting the same features from the query image to determine the most similar images in the feature dataset, ...
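The query-by-image loop described above can be sketched minimally as follows: extract the same feature from every database image and from the query, then rank the database by a similarity (here, distance) measure. This is an illustrative sketch only, using a per-channel colour histogram (one of the primitive features discussed in this paper) and Euclidean distance; the function names and the toy random-image data are not from the paper.

```python
import numpy as np

def colour_histogram(image, bins=8):
    # Per-channel colour histogram, concatenated and L1-normalised,
    # giving a fixed-length feature vector regardless of image size.
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(image.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def retrieve(query, database, k=3):
    # Rank database images by Euclidean distance between feature
    # vectors and return the indices of the k most similar images.
    q = colour_histogram(query)
    dists = [np.linalg.norm(q - colour_histogram(img)) for img in database]
    return np.argsort(dists)[:k]

# Toy example: random RGB arrays stand in for a real image dataset.
rng = np.random.default_rng(0)
db = [rng.integers(0, 256, size=(64, 64, 3)) for _ in range(10)]
query = db[4].copy()          # query identical to database image 4
print(retrieve(query, db))    # image 4 ranks first (distance zero)
```

In a real CBIR system the handcrafted histogram would be replaced by richer descriptors (e.g. SIFT, HOG, or VGG-16 FC7 activations), and the database features would be precomputed and stored rather than recomputed per query.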