A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint

Ullah, Ubaid; Lee, Jeong-Sik; An, Chang-Hyeon; Lee, Hyeonjin; Park, Su-Yeong; Baek, Rock‐Hyun; Choi, Hyun‐Chul

doi:10.3390/s22186816

Cited by 6 publications

(4 citation statements)

References 451 publications

Section: Text Evaluation Methods (mentioning)

confidence: 99%

“…Evaluation methods and metrics are needed to determine the validity of auto-generated captions [63,67]. Popular evaluation metrics are shown in Table 3, but more extensive reviews currently exist in the literature [63,87]. The MS COCO Dataset Challenge uses BLEU, ROUGE, METEOR, CIDEr, and SPICE to evaluate performance, so these have become the status quo for evaluating the similarity between texts [74].…”
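The n-gram overlap idea behind BLEU, named in the statement above, can be illustrated with a minimal sketch. It computes only the clipped (modified) n-gram precision for a single reference, assuming plain whitespace tokenization; it omits BLEU's brevity penalty and multi-order averaging, and the function names are hypothetical rather than taken from the cited papers or any library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Assumption: lowercased whitespace tokenization, single reference.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference, so repeating a matching word does not help.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


if __name__ == "__main__":
    reference = "a dog runs on the beach"
    candidate = "a dog is running on a beach"
    print(modified_precision(candidate, reference, n=1))
    print(modified_precision(candidate, reference, n=2))
```

Clipping is the design choice that matters here: without it, a degenerate caption that repeats one matching word would score perfectly.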

Section: Image Retrieval and Visual GAI (mentioning)

confidence: 99%

“…As GAI focuses on using AI to generate a new creation, visual GAI focuses on the translation between text and visualization [63]. The flow of translation can occur in either direction, either by taking text and transforming it into an image or by taking an image and deriving a description or caption [63][64][65][66][67]. Previous similar studies include [67,68]; however, we differentiate ourselves by utilizing different image-to-text and text-to-image generators, text prompts, and evaluation metrics.…”

Uncertainty in Visual Generative AI

Combs, Moyer, Bihl. 2024. Algorithms

Recently, generative artificial intelligence (GAI) has impressed the world with its ability to create text, images, and videos. However, there are still areas in which GAI produces undesirable or unintended results due to being “uncertain”. Before wider use of AI-generated content, it is important to identify concepts where GAI is uncertain, both to ensure that its use is ethical and to direct efforts for improvement. This study proposes a general pipeline to automatically quantify uncertainty within GAI. To measure uncertainty, the textual prompt to a text-to-image model is compared to captions supplied by four image-to-text models (GIT, BLIP, BLIP-2, and InstructBLIP). The comparison is scored with machine translation metrics (BLEU, ROUGE, METEOR, and SPICE) and with the cosine similarity of word embeddings (Word2Vec, GloVe, FastText, DistilRoBERTa, MiniLM-6, and MiniLM-12). The generative AI models performed consistently across the metrics; however, the vector space models yielded the highest average similarity, close to 80%, which suggests more ideal and “certain” results. Suggested future work includes identifying metrics that best align with a human baseline to ensure quality, and considering more GAI models. This work can be used to automatically identify concepts in which GAI is “uncertain” to drive research aimed at increasing confidence in these areas.
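The prompt-versus-caption comparison described in this abstract can be sketched as follows. This is a rough illustration only: word-count vectors stand in for the embedding models the abstract names (Word2Vec, GloVe, and so on), the captions are made-up strings rather than model output, and all function names are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words count vector (assumption: a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prompt_caption_similarity(prompt, captions):
    """Average similarity between a prompt and the captions of its generated image.

    A low average is read here as a sign that the generator was
    "uncertain" about the prompt's concept.
    """
    p = bow_vector(prompt)
    scores = [cosine_similarity(p, bow_vector(c)) for c in captions]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    prompt = "a red bicycle leaning against a wall"
    # Hypothetical captions, as if returned by several image-to-text models.
    captions = [
        "a red bicycle against a brick wall",
        "a bike parked next to a wall",
    ]
    print(prompt_caption_similarity(prompt, captions))
```

Swapping `bow_vector` for a real sentence-embedding model would change the scores but not the structure of the comparison.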

A comprehensive construction of deep neural network‐based encoder–decoder framework for automatic image captioning systems

Rahman, Uzzaman, Sami et al. 2024. IET Image Processing

This study introduces a novel encoder–decoder framework based on deep neural networks and provides a thorough investigation into the field of automatic picture captioning systems. The suggested model uses a “long short‐term memory” decoder for word prediction and sentence construction, and a “convolutional neural network” as an encoder that is skilled at object recognition and spatial information retention. The long short‐term memory network functions as a sequence processor, generating a fixed‐length output vector for final predictions, while the VGG‐19 model is utilized as an image feature extractor. For both training and testing, the study uses a variety of photos from open‐access datasets, such as Flickr8k, Flickr30k, and MS COCO. The Python platform is used for implementation, with Keras and TensorFlow as backends. The experimental findings, which were assessed using the “bilingual evaluation understudy” metric, demonstrate the effectiveness of the suggested methodology in automatically captioning images. By addressing spatial relationships in images and producing logical, contextually relevant captions, the paper advances image captioning technology. Insightful ideas for future study directions are generated by the discussion of the difficulties faced during the experimentation phase. By establishing a strong neural network architecture for automatic picture captioning, this study creates opportunities for future advancement and improvement in the area.

Exploring the Role of Mathematical Modelling in Automatic Scene Generation amidst Rapid Technological Advances

Kaur, Khurana. 2023. 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI)
