Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we find that current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, which degrades performance on downstream text-to-image and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach and encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space', and propose a simple but effective way to achieve it by regularizing orthogonality among the codebook embedding vectors. With this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming VQGAN.
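As a rough illustration of the orthogonality regularization named above, the following PyTorch sketch penalizes the deviation of the codebook's Gram matrix from the identity; the function name, the row normalization, and the 1/K^2 scaling are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def codebook_orthogonality_loss(codebook: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the codebook's Gram matrix from the
    identity, pushing the embedding vectors toward mutual
    orthogonality. `codebook` is a (K, D) matrix of K code vectors."""
    normed = F.normalize(codebook, dim=1)                  # unit-norm rows
    gram = normed @ normed.t()                             # (K, K) cosine similarities
    identity = torch.eye(codebook.size(0), device=codebook.device)
    # squared Frobenius distance, scaled so the loss does not grow with K
    return ((gram - identity) ** 2).sum() / codebook.size(0) ** 2

# Hypothetical usage inside a VQGAN-style training step:
# loss = recon_loss + vq_loss + lambda_ortho * codebook_orthogonality_loss(quantizer.embedding.weight)
```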
Semiconductor wafer defects severely affect product development. To reduce the occurrence of defects, it is necessary to identify why they occur, which can be inferred by analyzing defect patterns. Automatic defect classification (ADC) is used to analyze large numbers of samples; it can reduce the human resources required for defect inspection and improve inspection quality. Although several ADC systems have been developed to identify and classify wafer surface defects, conventional machine-learning-based ADC methods rely on numerous image recognition features for defect classification and tend to be costly, inefficient, and time-consuming. Here, an ADC technique based on a deep ensemble feature framework (DEFF) is proposed that automatically classifies different kinds of wafer surface damage. DEFF consists of an ensemble feature network and a final decision network layer. The feature network learns representations of wafer defects using multiple pre-trained convolutional neural network (CNN) models, and the ensemble features are computed by concatenating the per-model features. The decision network layer then predicts the classification labels from the ensemble features. Classification performance is further enhanced by combining the deep ensemble features with a voting-based ensemble learning strategy. We show the efficacy of the proposed strategy using real-world data from SK Hynix.
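A minimal sketch of the concatenation-based ensemble feature idea, assuming frozen torchvision backbones (ResNet-18 and VGG-16 are placeholder choices here, not necessarily the models used in DEFF) feeding a single linear decision layer:

```python
import torch
import torch.nn as nn
from torchvision import models

class DeepEnsembleFeatureClassifier(nn.Module):
    """Sketch: extract features with several frozen pre-trained CNNs,
    concatenate them into one ensemble feature, and classify with a
    small decision network layer."""

    def __init__(self, num_classes: int):
        super().__init__()
        resnet = models.resnet18(weights="DEFAULT")
        vgg = models.vgg16(weights="DEFAULT")
        self.backbones = nn.ModuleList([
            nn.Sequential(*list(resnet.children())[:-1]),           # -> (N, 512, 1, 1)
            nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1)),   # -> (N, 512, 1, 1)
        ])
        for p in self.backbones.parameters():
            p.requires_grad = False                                 # frozen feature extractors
        self.decision = nn.Linear(512 + 512, num_classes)           # decision network layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [b(x).flatten(1) for b in self.backbones]           # per-backbone features
        ensemble = torch.cat(feats, dim=1)                          # concatenated ensemble feature
        return self.decision(ensemble)
```

The voting-based refinement could then, for example, take a majority vote over the predictions of this head and those of the individual backbones' own classifiers.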
Recently, a number of studies have demonstrated impressive performance on diverse vision-language multi-modal tasks, such as image captioning and visual question answering, by extending the BERT architecture with multi-modal pre-training objectives. In this work, we explore a broad set of multi-modal representation learning tasks in the medical domain, specifically using radiology images and their unstructured reports. We propose Medical Vision Language Learner (MedViLL), which adopts a Transformer-based architecture combined with a novel multi-modal attention masking scheme to maximize generalization performance for both vision-language understanding tasks (image-report retrieval, disease classification, medical visual question answering) and a vision-language generation task (report generation). By rigorously evaluating the proposed model on four downstream tasks with two chest X-ray image datasets (MIMIC-CXR and Open-I), we empirically demonstrate the superior downstream task performance of MedViLL against various baselines, including task-specific architectures.
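To make the attention-masking idea concrete, here is a hypothetical sketch of a joint image-text attention mask that is fully bidirectional for understanding tasks and causal over the text segment for generation; MedViLL's actual masking scheme may differ in its details.

```python
import torch

def joint_attention_mask(num_img: int, num_txt: int, causal_text: bool) -> torch.Tensor:
    """Build an (num_img + num_txt)-square boolean mask over a joint
    [image tokens | text tokens] sequence. True = attention allowed.
    With causal_text=True, text tokens see all image tokens but only
    preceding text tokens (generation); otherwise attention is fully
    bidirectional (understanding)."""
    n = num_img + num_txt
    mask = torch.ones(n, n, dtype=torch.bool)
    if causal_text:
        # causal (lower-triangular) attention within the text segment
        mask[num_img:, num_img:] = torch.tril(
            torch.ones(num_txt, num_txt, dtype=torch.bool))
        # image tokens do not peek at text tokens in this variant
        mask[:num_img, num_img:] = False
    return mask

# e.g. joint_attention_mask(num_img=49, num_txt=128, causal_text=True)
```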