In this paper, we propose a novel approach that improves text-guided image manipulation performance. Text-guided image manipulation aims to modify parts of an input image in accordance with a user's text description by semantically associating regions of the image with the description. We tackle a problem of conventional methods, namely the modification of undesired parts, which is caused by the difference in representation ability between text descriptions and images. Humans tend to pay attention primarily to objects in the foreground of images, so text descriptions written by humans mostly represent the foreground. Consequently, it is necessary to introduce not only a foreground-aware bias based on the text description but also a background-aware bias for the regions that the description does not represent. To solve this problem, we introduce an image segmentation network into the generative adversarial network used for image manipulation. Comparative experiments with three state-of-the-art methods demonstrate the effectiveness of our method both quantitatively and qualitatively.
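The following is a minimal sketch of the core idea under our own assumptions, not the paper's exact architecture: a segmentation network (an assumed component here) predicts a soft foreground mask, and the generator's text-conditioned output is composited into the image only within that mask, so the background the text does not describe is preserved. The function name mask_guided_edit and the tensor shapes are illustrative.

```python
# Illustrative sketch only: composite a text-conditioned edit into the
# foreground while keeping the background of the input image intact.
import torch

def mask_guided_edit(image: torch.Tensor,
                     edited: torch.Tensor,
                     fg_mask: torch.Tensor) -> torch.Tensor:
    """Blend the generator output into the foreground, keep the background.

    image:   (B, 3, H, W) input image
    edited:  (B, 3, H, W) text-conditioned generator output
    fg_mask: (B, 1, H, W) soft foreground mask in [0, 1], predicted by the
             segmentation network (an assumed component in this sketch)
    """
    return fg_mask * edited + (1.0 - fg_mask) * image
```

In a full system the mask would be predicted jointly with the manipulation and trained adversarially; the sketch isolates only the foreground/background compositing step.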
The details of soccer matches can be estimated from visual and audio sequences, and these details correspond to the occurrence of important scenes; such sequences are therefore well suited to important scene detection. In this paper, we present a new multimodal method for detecting important scenes from visual and audio sequences in far-view soccer videos, based on a single deep neural architecture. A unique point of our method is that multiple classifiers are realized by a single deep neural architecture consisting of a Convolutional Neural Network (CNN)-based feature extractor and a Support Vector Machine (SVM)-based classifier. This approach solves the problem that multiple different deep neural architectures cannot be optimized simultaneously from a small amount of training data. We then monitor the confidence measures output from this architecture for each modality and integrate them to obtain the final classification result.
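As a rough illustration of this pipeline, the sketch below uses a shared CNN backbone to extract features from each modality, trains a per-modality SVM whose signed decision margin serves as the confidence measure, and fuses the two confidences by summation. This is an assumption-laden reconstruction (names such as SharedCNN and fuse_confidences are hypothetical, and simple additive late fusion stands in for the paper's integration rule), not the authors' implementation.

```python
# Sketch of a single CNN feature extractor feeding per-modality SVM
# classifiers, with late fusion of their confidence scores.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class SharedCNN(nn.Module):
    """One CNN feature extractor applied to both visual and audio inputs."""
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def svm_confidence(train_feats: np.ndarray, labels: np.ndarray,
                   query_feats: np.ndarray) -> np.ndarray:
    """Train an SVM on extracted features; signed margins act as confidences."""
    clf = SVC(kernel="rbf")
    clf.fit(train_feats, labels)
    return clf.decision_function(query_feats)

def fuse_confidences(conf_visual: np.ndarray,
                     conf_audio: np.ndarray) -> np.ndarray:
    """Late fusion: a positive fused score marks an important scene."""
    return conf_visual + conf_audio
```

Because only the SVMs are fit per modality while the CNN is shared, the number of parameters optimized from the small training set stays modest, which is the motivation the abstract gives for the single-architecture design.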