2022
DOI: 10.1007/978-3-031-20059-5_34
TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation

Cited by 8 publications (2 citation statements)
References 24 publications
“…To evaluate individual generated images based on a prompt, previous studies [49,64] often employ metrics based on Contrastive Language-Image Pre-Training (CLIP) [41], which compute text-image consistency by cosine similarity between text and image embeddings in the joint representation space. To better align with human preferences, researchers explored fine-tuned CLIP using datasets of human ratings on images created from identical prompts [16,61,62]. They further utilized scores predicted by the fine-tuned CLIP to approximate human assessment.…”
Section: Evaluation of Text-to-Image Generation
confidence: 99%
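The CLIP-based consistency score described in the statement above reduces to a cosine similarity between a prompt embedding and an image embedding. The following is a minimal sketch of that computation, assuming the Hugging Face transformers CLIP implementation; the checkpoint name "openai/clip-vit-base-patch32" and the function name clip_score are illustrative choices, not the exact setup of the cited works.

```python
# Hedged sketch: CLIP text-image cosine similarity with Hugging Face transformers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # illustrative checkpoint
model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def clip_score(image_path: str, prompt: str) -> float:
    """Cosine similarity between CLIP text and image embeddings for one pair."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
    # Normalize both embeddings, then take the dot product (= cosine similarity
    # in the joint representation space).
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return float((image_emb * text_emb).sum(dim=-1))

# Example usage:
# score = clip_score("generated.png", "a red bird sitting on a branch")
```

Fine-tuned, human-preference variants of CLIP mentioned in the statement would plug into the same pipeline by swapping the checkpoint, with the resulting score used as a proxy for human ratings.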
“…Evaluation Metrics. Following ) and (Dinh, Nguyen, and Hua 2022), we quantitatively assess image quality and aesthetics using no-reference metrics. Specifically, we choose NIMA (Talebi and Milanfar 2018), MUSIQ (Ke et al. 2021), DB-CNN (Zhang et al. 2020), and TReS (Golestaneh, Dadsetan, and Kitani 2022).…”
Section: Quantitative Comparison
confidence: 99%
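The no-reference metrics named in this statement (NIMA, MUSIQ, DB-CNN, TReS) score an image without any ground-truth reference. A minimal sketch of computing them in one pass is shown below, assuming the open-source IQA-PyTorch package (pyiqa) and its metric identifiers "nima", "musiq", "dbcnn", and "tres"; this is an assumed setup for illustration, not the exact evaluation code of the cited work.

```python
# Hedged sketch: batch-evaluating no-reference image quality metrics with pyiqa.
import torch
import pyiqa  # IQA-PyTorch; assumed here, install with `pip install pyiqa`

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create each metric once; names below are the pyiqa identifiers assumed to
# correspond to NIMA, MUSIQ, DB-CNN, and TReS.
metric_names = ["nima", "musiq", "dbcnn", "tres"]
metrics = {name: pyiqa.create_metric(name, device=device) for name in metric_names}

def assess_image(image_path: str) -> dict:
    """Return a dict of no-reference quality/aesthetic scores for one image."""
    return {name: float(metric(image_path)) for name, metric in metrics.items()}

# Example usage:
# scores = assess_image("generated.png")
# -> {"nima": ..., "musiq": ..., "dbcnn": ..., "tres": ...}
```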