2022
DOI: 10.1049/ipr2.12496
A deep grouping fusion neural network for multimedia content understanding

Abstract: How Deep Neural Networks (DNNs) can best understand multimedia content remains an open problem, mainly due to two factors. First, conventional DNNs cannot effectively learn representations of images with sparse visual information, such as the images that illustrate knowledge concepts in textbooks. Second, existing DNNs cannot effectively capture the fine-grained interactions between images and their text descriptions. To address these issues, we propose a deep Cross-Media Grouping …

Cited by 5 publications (7 citation statements) · References 47 publications
“…(b) Food category prediction model. We embed Edamam training data containing menu items concatenated with their ingredients using a pre-trained MPNet model [49]. We then cluster the training data using HDBSCAN, which we treat as a ground truth food category.…”
Section: Methods
confidence: 99%
“…We first utilize a range of data to create specialized language models for extracting features [48]. Next, we use a state-of-the-art sentence embedding model, MPNet [49] to embed menu item names with their ingredients to create ingredient-contextualized food clusters. These clusters act as food category pseudo-labels to train a model that maps menu item names (without their ingredients) to the learned pseudo-labels.…”
Section: Methods
confidence: 99%
“…Additionally, as mentioned earlier, we generated 3 more training sets featuring the same tweets but with new labels obtained through 3 distinct prompting techniques utilizing GPT-4. Subsequently, we fine-tuned eight transformer-based LLMs, namely Bert [31], Albert [32], Deberta [33], BerTweet [34], MPNet [35], and three Robertabased models pre-trained on i) a general Twitter dataset (TRob) [36], ii) a Twitter sentiment dataset (TRobSen) [23], and iii) a Twitter stance dataset (TRobStan) [23]. These models were separately fine-tuned using our four training datasets.…”
Section: Stance Classification
confidence: 99%
“…Neural retrieval models are then developed for semantic retrieval [10]. Among them, dense retrieval methods [11] based on largescale pre-trained language models [12], [13], e.g., DPR [14] and SBERT [15], map both query and document into a continuous vector space (dense vector space) where semantically similar words, phrases and sentences are closer to each other, and thus also called semantic retrieval methods. Similarly, two types of code modeling methods exist: Information Retrieval (IR)-based and Machine Learning (ML)-based methods.…”
Section: B Text and Code Representation For Computational Notebooks
confidence: 99%
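The dense-retrieval idea in the last statement — mapping both query and documents into one continuous vector space and ranking by similarity — can be sketched with toy embeddings. This is a sketch of the mechanics only: a real system would obtain the vectors from a model like DPR or SBERT, whereas the stand-in encoder below just produces a deterministic unit vector per text (so identical texts land on identical vectors, but semantic similarity is not modeled).

```python
import zlib

import numpy as np


def toy_encode(text, dim=16):
    # Stand-in encoder: deterministic pseudo-embedding, L2-normalized.
    # A real dense retriever maps semantically similar texts to nearby
    # vectors; this stub only demonstrates the retrieval machinery.
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)


# A toy "index" of documents (hypothetical examples).
docs = ["pandas dataframe merge", "plot a histogram", "train a classifier"]
doc_matrix = np.stack([toy_encode(d) for d in docs])


def retrieve(query, k=1):
    # On unit vectors, cosine similarity reduces to a dot product,
    # so scoring the whole index is one matrix-vector multiply.
    scores = doc_matrix @ toy_encode(query)
    return [docs[i] for i in np.argsort(-scores)[:k]]
```

Because the stand-in encoder is deterministic, a query identical to an indexed document scores a perfect 1.0 and is retrieved first; with a real encoder, paraphrases of a document would likewise land nearby in the vector space.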