Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions, which is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
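As a rough, hypothetical sketch of this division of labor (not the paper's implementation), bottom-up region features can be re-weighted by a top-down query using simple additive attention; all parameter names, dimensions, and initializations below are illustrative assumptions.

```python
# Hypothetical sketch: top-down attention over bottom-up region features.
# W_v, W_q, w_a and all shapes are illustrative, not the paper's exact model.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def top_down_attention(V, q, W_v, W_q, w_a):
    """V: (k, d_v) bottom-up region features (e.g., from Faster R-CNN).
    q: (d_q,) top-down query (e.g., a partial-caption LSTM state).
    Returns a convex combination of the k region features."""
    scores = np.tanh(V @ W_v + q @ W_q) @ w_a   # (k,) additive-attention scores
    alpha = softmax(scores)                      # attention weights over regions
    return alpha @ V                             # (d_v,) attended feature

# Toy usage: 36 regions of 2048-d features, a 512-d query, 128 hidden units.
rng = np.random.default_rng(0)
V = rng.normal(size=(36, 2048))
q = rng.normal(size=(512,))
W_v = rng.normal(size=(2048, 128)) * 0.01
W_q = rng.normal(size=(512, 128)) * 0.01
w_a = rng.normal(size=(128,))
attended = top_down_attention(V, q, W_v, W_q, w_a)
```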
Exosomes are small, single-membrane, secreted organelles of ∼30 to ∼200 nm in diameter that have the same topology as the cell and are enriched in selected proteins, lipids, nucleic acids, and glycoconjugates. Exosomes contain an array of membrane-associated, high-order oligomeric protein complexes, display pronounced molecular heterogeneity, and are created by budding at both plasma and endosome membranes. Exosome biogenesis is a mechanism of protein quality control, and once released, exosomes have activities as diverse as remodeling the extracellular matrix and transmitting signals and molecules to other cells. This pathway of intercellular vesicle traffic plays important roles in many aspects of human health and disease, including development, immunity, tissue homeostasis, cancer, and neurodegenerative diseases. In addition, viruses co-opt exosome biogenesis pathways both for assembling infectious particles and for establishing host permissiveness. On the basis of these and other properties, exosomes are being developed as therapeutic agents in multiple disease models.
A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery [11]. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings: the Room-to-Room (R2R) dataset.
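To make the sequence-to-sequence framing concrete, here is a minimal, hypothetical sketch in which an instruction is encoded once and discrete navigation actions are decoded step by step from the current visual observation. The six-way action space, feature sizes, and all function names are illustrative assumptions, not the R2R baseline agent or the simulator's actual API.

```python
# Hypothetical seq2seq framing of instruction-following navigation.
import numpy as np

ACTIONS = ["forward", "left", "right", "up", "down", "stop"]

def encode_instruction(token_embeddings):
    # Stand-in for an LSTM encoder: mean-pool word embeddings into one vector.
    return token_embeddings.mean(axis=0)

def decode_step(instruction_code, visual_feature, W):
    # One decoder step: fuse language and vision, then score discrete actions.
    fused = np.concatenate([instruction_code, visual_feature])
    logits = W @ fused
    return ACTIONS[int(np.argmax(logits))]

rng = np.random.default_rng(0)
instruction = encode_instruction(rng.normal(size=(12, 64)))  # 12 tokens, 64-d
W = rng.normal(size=(len(ACTIONS), 64 + 128)) * 0.1

for _ in range(20):                              # cap the episode length
    visual = rng.normal(size=(128,))             # stand-in for a CNN image feature
    action = decode_step(instruction, visual, W)
    if action == "stop":
        break
```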
There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as "which caption-generator best understands colors?" and "can caption-generators count?"
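As a hedged illustration of the core idea: the real SPICE pipeline parses captions into scene graphs with a dependency parser and matches tuples using WordNet synonyms, but at its heart the metric is an F-score over sets of semantic proposition tuples. Exact tuple matching stands in for parsing and synonym matching below, and the toy tuples are invented for the example.

```python
# Sketch of SPICE's core scoring step: F1 over semantic proposition tuples
# (objects, attributes, relations) extracted from candidate and reference
# scene graphs. Illustrative only; not the full SPICE pipeline.

def spice_like_f1(candidate_tuples, reference_tuples):
    cand, ref = set(candidate_tuples), set(reference_tuples)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)

# Toy example: candidate caption "a young girl standing on a surfboard".
candidate = [("girl",), ("girl", "young"), ("girl", "stand-on", "surfboard"),
             ("surfboard",)]
reference = [("girl",), ("girl", "young"), ("girl", "stand-on", "surfboard"),
             ("surfboard",), ("ocean",), ("girl", "in", "ocean")]
print(spice_like_f1(candidate, reference))  # precision 1.0, recall 0.67 -> 0.8
```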
The firefly luciferase protein contains a peroxisomal targeting signal at its extreme COOH terminus. Site-directed mutagenesis of the luciferase gene reveals that this peroxisomal targeting signal consists of the COOH-terminal three amino acids of the protein, serine-lysine-leucine. When this tripeptide is appended to the COOH terminus of a cytosolic protein (chloramphenicol acetyltransferase), it is sufficient to direct the fusion protein into peroxisomes. Additional mutagenesis experiments reveal that only a limited number of conservative changes can be made in this tripeptide targeting signal without abolishing its activity. These results indicate that peroxisomal protein import, unlike other types of transmembrane translocation, is dependent upon a conserved amino acid sequence.