In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details in different subregions of the image by paying attention to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. For the first time, it shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.
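To make the attention mechanism concrete, below is a minimal PyTorch sketch of word-level attention of the kind this abstract describes: each image sub-region attends over the word embeddings and receives a word-context vector. The function name, shapes, and the assumption that words and regions already live in a shared embedding space are illustrative, not the paper's exact formulation (which adds learned projections and stage-wise generators).

```python
import torch
import torch.nn.functional as F

def word_level_attention(region_feats, word_feats):
    """Hypothetical sketch of word-level attention.

    region_feats: (batch, N, D) image sub-region features
    word_feats:   (batch, T, D) word embeddings from the text encoder
    Returns a word-context vector per sub-region and the attention map.
    """
    # Similarity between every sub-region and every word: (batch, N, T)
    scores = torch.bmm(region_feats, word_feats.transpose(1, 2))
    # Normalize over words, so each sub-region attends to relevant words
    attn = F.softmax(scores, dim=-1)
    # Word-context vector for each sub-region: (batch, N, D)
    context = torch.bmm(attn, word_feats)
    return context, attn

# Toy usage: 2 images, 64 sub-regions, 12 words, 128-dim common space
regions = torch.randn(2, 64, 128)
words = torch.randn(2, 12, 128)
context, attn = word_level_attention(regions, words)
print(context.shape, attn.shape)  # (2, 64, 128), (2, 64, 12)
```

In the full model, such per-region context vectors are combined with the region features to refine the image at the next stage, and the attention map is what the paper's analysis visualizes layer by layer.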
Mammalian target of rapamycin (mTOR) controls cell growth and proliferation via the raptor-mTOR (TORC1) and rictor-mTOR (TORC2) protein complexes. Recent biochemical studies suggested that TORC2 is the elusive PDK2 for Akt/PKB Ser473 phosphorylation in the hydrophobic motif. Phosphorylation at Ser473, along with Thr308 of its activation loop, is deemed necessary for Akt function, although the regulatory mechanisms and physiological importance of each phosphorylation site remain to be fully understood. Here, we report that SIN1/MIP1 is an essential TORC2/PDK2 subunit. Genetic ablation of sin1 abolished Akt-Ser473 phosphorylation and disrupted rictor-mTOR interaction but maintained Thr308 phosphorylation. Surprisingly, defective Ser473 phosphorylation affected only a subset of Akt targets in vivo, including FoxO1/3a, while other Akt targets, TSC2 and GSK3, and the TORC1 effectors, S6K and 4E-BP1, were unaffected. Our findings reveal that the SIN1-rictor-mTOR function in Akt-Ser473 phosphorylation is required for TORC2 function in cell survival but is dispensable for TORC1 function.
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: cross-modal grounding, ill-posed feedback, and generalization. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). In particular, a matching critic provides an intrinsic reward that encourages global matching between instructions and trajectories, and a reasoning navigator performs cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms previous methods by 10% on SPL and achieves new state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating the agent's own past good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which substantially reduces the success-rate performance gap between seen and unseen environments (from 30.7% to 11.7%).
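As a rough illustration of the intrinsic-reward idea, the sketch below scores instruction-trajectory agreement and mixes it with an extrinsic success reward. Two simplifications are worth flagging: the paper's matching critic evaluates how well a trajectory explains (reconstructs) the instruction, whereas this hypothetical critic just uses cosine similarity in a shared embedding space, and the mixing weight `delta` is an assumed value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchingCritic(nn.Module):
    """Hypothetical stand-in for the matching critic: scores how well
    a trajectory embedding matches an instruction embedding via cosine
    similarity in a learned shared space."""
    def __init__(self, dim):
        super().__init__()
        self.traj_proj = nn.Linear(dim, dim)
        self.instr_proj = nn.Linear(dim, dim)

    def forward(self, traj_emb, instr_emb):
        t = F.normalize(self.traj_proj(traj_emb), dim=-1)
        i = F.normalize(self.instr_proj(instr_emb), dim=-1)
        return (t * i).sum(-1)  # intrinsic reward in [-1, 1]

critic = MatchingCritic(dim=256)
traj = torch.randn(4, 256)    # batch of trajectory embeddings
instr = torch.randn(4, 256)   # corresponding instruction embeddings
r_intrinsic = critic(traj, instr)
r_extrinsic = torch.tensor([1.0, 0.0, 1.0, 0.0])  # e.g., success indicator
delta = 0.5                                        # assumed mixing weight
total_reward = r_extrinsic + delta * r_intrinsic   # drives the RL update
```

The combined reward is what the RL objective would optimize, so the navigator is pushed toward trajectories that both reach the goal and stay faithful to the instruction.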
The Rehai and Ruidian geothermal fields, located in Tengchong County, Yunnan Province, China, host a variety of geochemically distinct hot springs. In this study, we report a comprehensive, cultivation-independent census of microbial communities in 37 samples collected from these geothermal fields, encompassing sites ranging in temperature from 55.1 to 93.6°C, in pH from 2.5 to 9.4, and in mineralogy from silicates in Rehai to carbonates in Ruidian. Richness was low in all samples, with 21–123 species-level OTUs detected. The bacterial phylum Aquificae or archaeal phylum Crenarchaeota were dominant in Rehai samples, yet the dominant taxa within those phyla depended on temperature, pH, and geochemistry. Rehai springs with low pH (2.5–2.6), high temperature (85.1–89.1°C), and high sulfur contents favored the crenarchaeal order Sulfolobales, whereas those with low pH (2.6–4.8) and cooler temperature (55.1–64.5°C) favored the Aquificae genus Hydrogenobaculum. Rehai springs with neutral-alkaline pH (7.2–9.4) and high temperature (>80°C) with high concentrations of silica and salt ions (Na, K, and Cl) favored the Aquificae genus Hydrogenobacter and crenarchaeal orders Desulfurococcales and Thermoproteales. Desulfurococcales and Thermoproteales became predominant in springs with pH much higher than the optimum and even the maximum pH known for these orders. Ruidian water samples harbored a single Aquificae genus Hydrogenobacter, whereas microbial communities in Ruidian sediment samples were more diverse at the phylum level and distinctly different from those in Rehai and Ruidian water samples, with a higher abundance of uncultivated lineages, close relatives of the ammonia-oxidizing archaeon “Candidatus Nitrosocaldus yellowstonii”, and candidate divisions O1aA90 and OP1. These differences between Ruidian sediments and Rehai samples were likely caused by temperature, pH, and sediment mineralogy. The results of this study significantly expand the current understanding of the microbiology in Tengchong hot springs and provide a basis for comparison with other geothermal systems around the world.
In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes. Following the two-step (layout-image) generation process, a novel object-driven attentive image generator is proposed to synthesize salient objects by paying attention to the most relevant words in the text description and the pre-generated semantic layout. In addition, a new Fast R-CNN based object-wise discriminator is proposed to provide rich object-wise discrimination signals on whether the synthesized object matches the text description and the pre-generated layout. The proposed Obj-GAN significantly outperforms the previous state of the art on various metrics on the large-scale COCO benchmark, increasing the Inception score by 27% and decreasing the FID score by 11%. A thorough comparison between the traditional grid attention and the new object-driven attention is provided by analyzing their mechanisms and visualizing their attention layers, offering insight into how the proposed model generates complex scenes of high quality.
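The object-wise discrimination idea can be sketched with standard region-of-interest pooling: discriminator features are cropped per layout box so each synthesized object can be judged independently for realism and text consistency. This is a hypothetical minimal version; the paper builds the discriminator on a Fast R-CNN style architecture, and the feature map, boxes, and sizes below are made up for illustration.

```python
import torch
from torchvision.ops import roi_align

# Discriminator feature map for one generated image (made-up sizes)
feat = torch.randn(1, 64, 32, 32)
# Layout boxes in (batch_index, x1, y1, x2, y2) feature-map coordinates
boxes = torch.tensor([[0, 2.0, 2.0, 12.0, 12.0],
                      [0, 8.0, 4.0, 30.0, 28.0]])
# Pool a fixed-size feature per object; a per-object head would then
# score real/fake and agreement with the corresponding text/layout
obj_feats = roi_align(feat, boxes, output_size=(7, 7))
print(obj_feats.shape)  # torch.Size([2, 64, 7, 7])
```

Pooling per object rather than over a uniform grid is what lets the discriminator emit a separate signal for every object in the layout, mirroring the contrast the abstract draws between grid attention and object-driven attention.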