Deep Robust Unsupervised Multi-Modal Network

Yang, Yang; Wu, Yifeng; Zhan, De-Chuan; Liu, Zhibin; Jiang, Yuan

doi:10.1609/aaai.v33i01.33015652

Cited by 6 publications

(4 citation statements)

References 19 publications

(19 reference statements)

“…(1) Dense features: Since ETA is a time prediction problem, the time-related features are particularly important, so we calculated the sum and maximum value of link-time and crosstime [9]. In addition, the state of road conditions is a key piece of information.…”

Section: Feature Engineering (mentioning)

confidence: 99%

Complementary Fusion of Deep Network and Tree Model for ETA Prediction

Huang, Zhang, Bao, et al. 2021

Proceedings of the 29th International Conference on Advances in Geographic Information Systems

Estimated time of arrival (ETA) is a very important factor in the transportation system. It has attracted increasing attention and has been widely used as a basic service in navigation systems and intelligent transportation systems. In this paper, we propose a novel solution to the ETA estimation problem, which is an ensemble of tree models and neural networks. We demonstrated the accuracy and robustness of the solution on the A/B list and finally won first place in the SIGSPATIAL 2021 GISCUP competition.

“…This involves comprehending the given image through computer vision techniques and generating corresponding descriptions using natural language processing. Initially, researchers explored the encoder-decoder architecture (Yang et al 2019a; Zhang et al 2019; Yang et al 2019c) with CNNs (Albawi, Mohammed, and Al-Zawi 2017) as image encoders and LSTM (Greff et al 2017) as text decoders (Vinyals et al 2015). To consider local and global features simultaneously, (Huang et al 2019) used image regions to decode image segmentations sequentially to words, adding attention mechanisms to focus on specific image regions during decoding.…”

Section: Introduction (mentioning)

confidence: 99%

Noise-Aware Image Captioning with Progressively Exploring Mismatched Words

Fu, Song, Zhou, et al. 2024

AAAI

Image captioning aims to automatically generate captions for images by learning a cross-modal generator from vision to language. The large amount of image-text pairs required for training is usually sourced from the internet due to the manual cost, which introduces noise with mismatched relevance that affects the learning process. Unlike traditional noisy label learning, the key challenge in processing noisy image-text pairs is to finely identify the mismatched words to make the most use of trustworthy information in the text, rather than coarsely weighting entire examples. To tackle this challenge, we propose a Noise-aware Image Captioning method (NIC) to adaptively mitigate the erroneous guidance from noise by progressively exploring mismatched words. Specifically, NIC first identifies mismatched words by quantifying word-label reliability from two aspects: 1) inter-modal representativeness, which measures the significance of the current word by assessing cross-modal correlation via prediction certainty; 2) intra-modal informativeness, which amplifies the effect of current prediction by combining the quality of subsequent word generation. During optimization, NIC constructs the pseudo-word-labels considering the reliability of the original word-labels and model convergence to periodically coordinate mismatched words. As a result, NIC can effectively exploit both clean and noisy image-text pairs to learn a more robust mapping function. Extensive experiments conducted on the MS-COCO and Conceptual Caption datasets validate the effectiveness of our method in various noisy scenarios.

“…Later on, Feng and Zhou [24] show that random forests can do auto-encoder, implying that the informative rules of decision trees may accomplish representation learning. Deep forest is extended to numerous tasks and is successfully applied in metric learning [25], multi-label learning [26], semi-supervised learning [27], financial fraud detection [28,29], etc. Deep forests, on the other hand, require a significant amount of memory and time due to the storing of multi-layer forest modules to do layer-by-layer prediction on the test set.…”

Section: Introduction (mentioning)

confidence: 99%

A Region‐Based Analysis for the Feature Concatenation in Deep Forests

Lyu, Chen, Zhou 2022

Chinese Journal of Electronics

Deep forest is a tree‐based deep model made up of non‐differentiable modules that are trained without backpropagation. Despite the fact that deep forests have achieved considerable success in a variety of tasks, feature concatenation, as the ingredient for forest representation learning, still lacks theoretical understanding. In this paper, we aim to understand the influence of feature concatenation on predictive performance. To enable such theoretical studies, we present the first mathematical formula of feature concatenation based on the two‐stage structure, which regards the splits along new features and raw features as a region selector and a region classifier respectively. Furthermore, we prove a region‐based generalization bound for feature concatenation, which reveals the trade‐off between Rademacher complexities of the two‐stage structure and the fraction of instances that are correctly classified in the selected region. As a consequence, we show that compared with the prediction‐based feature concatenation (PFC), the advantage of interaction‐based feature concatenation (IFC) is that it obtains more abundant regions through distributed representation and alleviates the overfitting risk in local regions. Experiments confirm the correctness of our theoretical results.
