Proceedings of The Web Conference 2020
DOI: 10.1145/3366423.3380004

A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction

Cited by 29 publications (15 citation statements)
References 12 publications
“…They achieve state-of-the-art results for solving particular machine learning problems. It is practically impossible to analyze all of them, but a significant number, at one step or another, use the classical concatenation of multimodal vectors [51]–[53] without deeply examining the unique dependencies between them. Nevertheless, other models propose smarter modality aggregation, such as the Contrastive Multimodal Fusion method [54], showing that there is growing interest in the ML community in nontrivial multimodal fusion.…”
Section: B. Brain-inspired Multimodal Framework
Mentioning confidence: 99%
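The "classical concatenation" fusion this statement contrasts with can be summarized in a few lines. The sketch below is illustrative only, assuming PyTorch and arbitrary per-modality embedding sizes; it is not code from the cited works [51]–[54].

```python
# Minimal sketch (illustrative, not from the cited works) of concatenation-based
# multimodal fusion: per-modality embeddings are simply stacked into one vector
# before a prediction head. Dimensions and layer sizes are assumptions.
import torch
import torch.nn as nn

class ConcatFusionRegressor(nn.Module):
    def __init__(self, visual_dim=2048, acoustic_dim=128, textual_dim=300):
        super().__init__()
        # A single head on the concatenated vector; cross-modal dependencies
        # are only modeled implicitly by the MLP.
        self.head = nn.Sequential(
            nn.Linear(visual_dim + acoustic_dim + textual_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # scalar popularity score
        )

    def forward(self, visual, acoustic, textual):
        fused = torch.cat([visual, acoustic, textual], dim=-1)
        return self.head(fused)
```

Everything downstream of `torch.cat` has to learn inter-modality structure on its own, which is the limitation the quoted statement attributes to this style of fusion.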
“…Researchers from the multimedia and data mining communities have produced a considerable body of work on predicting the popularity of online videos. Most of these works aim to predict the popularity of videos from online video or social media platforms such as YouTube [45,52,59], Vine [10,37,48,61], Facebook [4,5], and Kuaishou [65]. Multi-modal data, including popularity evolution patterns as well as social, visual, acoustic, textual, and geographic features, are exploited for prediction.…”
Section: Related Work
Mentioning confidence: 99%
“…Chen et al [10] propose to predict the popularity of micro-videos from Vine using a transductive model in which social, visual, acoustic, and textual features are taken as the input. Following this work, methods such as the variational encoder-decoder [61] and the feature-discrimination transductive model [48] have been explored to better capture popularity-related information from the same features. Visual cues are combined with early evolution patterns in [52] to train a support vector regressor (SVR) for popularity prediction.…”
Section: Related Work
Mentioning confidence: 99%
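The SVR-based approach attributed to [52] amounts to fitting a kernel regressor on visual features concatenated with early popularity-evolution signals. The sketch below is a hedged illustration with synthetic data and assumed feature shapes, not the cited paper's actual pipeline.

```python
# Illustrative sketch (assumed feature names, dimensions, and synthetic data):
# combine visual features with early-evolution features and fit an SVR.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_videos = 500
visual_feats = rng.normal(size=(n_videos, 64))        # e.g. pooled CNN features
early_views = rng.poisson(100, size=(n_videos, 24))   # views in the first 24 hours
X = np.hstack([visual_feats, np.log1p(early_views)])
# Synthetic target: log-scaled final popularity.
y = np.log1p(early_views.sum(axis=1) * rng.uniform(1.0, 3.0, n_videos))

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X, y)
print(model.predict(X[:5]))
```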
“…For example, TLRMVR [8] proposes a novel low-rank multi-view embedding learning method to predict the popularity of micro-videos. MMVED [9] combines multiple features (image frames, acoustic, and textual information) and accounts for the randomness of the mapping from data to popularity. Although these approaches achieve efficient prediction, they usually target the static, complete file and need to parse entire image frames, which is impractical for live streaming.…”
Section: Related Work
Mentioning confidence: 99%
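The core idea the statement attributes to MMVED [9], modeling the data-to-popularity mapping as stochastic, can be sketched with a small variational encoder-decoder. The code below is an illustrative assumption (PyTorch, arbitrary layer sizes, scalar popularity output), not the authors' exact architecture or training objective.

```python
# Hedged sketch of a variational encoder-decoder for popularity prediction:
# fused multimodal features are encoded into a Gaussian latent variable,
# a sample of which is decoded into a popularity estimate.
import torch
import torch.nn as nn

class VariationalPopularityPredictor(nn.Module):
    def __init__(self, in_dim=512, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, fused_features):
        h = self.encoder(fused_features)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample a latent code while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        popularity = self.decoder(z)
        # KL term regularizes the latent posterior toward a standard normal prior.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return popularity, kl
```

The latent sampling is what captures "randomness" in the mapping: the same input can yield a distribution of popularity estimates rather than a single deterministic value.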
“…These approaches usually fail to address the rationality of MEC application providers or have different objective functions, leading to improper results in cost optimization for live streaming. Some works optimize resource allocation by predicting the popularity of content [8][9][10], given that a few popular videos usually account for most of the bandwidth consumption [11]. However, because these models similarly focus on predicting static content, it is still hard to meet the real-time requirements of live streaming.…”
Section: Introduction
Mentioning confidence: 99%