A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery [11]. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings: the Room-to-Room (R2R) dataset.
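To make the sequence-to-sequence framing above concrete, the sketch below shows one minimal way an instruction-following agent could be wired up: an encoder reads the instruction, and a decoder conditioned on visual features emits one navigation action per step. The architecture, layer sizes, and action set are illustrative assumptions, not the model proposed with the R2R dataset.

```python
# Minimal sketch of "instruction-to-actions as visually grounded seq2seq".
# All sizes and the action vocabulary below are illustrative assumptions.
import torch
import torch.nn as nn

class InstructionFollower(nn.Module):
    def __init__(self, vocab_size=1000, n_actions=6, hidden=256, img_feat=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)  # encodes the instruction
        self.decoder = nn.LSTMCell(img_feat, hidden)              # one step per navigation action
        self.act_head = nn.Linear(hidden, n_actions)              # e.g. forward / turn / stop

    def forward(self, instruction_ids, image_feats):
        # instruction_ids: (batch, words); image_feats: (batch, steps, img_feat)
        _, (h, c) = self.encoder(self.embed(instruction_ids))
        h, c = h.squeeze(0), c.squeeze(0)
        logits = []
        for t in range(image_feats.size(1)):                      # ground each step in what the agent sees
            h, c = self.decoder(image_feats[:, t], (h, c))
            logits.append(self.act_head(h))
        return torch.stack(logits, dim=1)                         # (batch, steps, n_actions)

agent = InstructionFollower()
actions = agent(torch.randint(0, 1000, (2, 12)), torch.randn(2, 5, 2048))
print(actions.shape)  # torch.Size([2, 5, 6])
```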
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
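As a rough illustration of the single-sequence idea described above, the snippet below shows one way text tokens, discrete button presses, and continuous joint values could be mapped into one shared vocabulary so a single decoder can emit any of them. The vocabulary size, bin count, offsets, and helper names are assumptions for illustration, not Gato's actual tokenizer.

```python
# Illustrative sketch: one flat token vocabulary shared across modalities.
# All constants and helper names here are assumptions, not Gato's scheme.
TEXT_VOCAB = 32_000                        # assumed subword vocabulary size
NUM_BINS = 1024                            # assumed bins for continuous values
DISCRETE_OFFSET = TEXT_VOCAB               # discrete actions follow the text tokens
CONTINUOUS_OFFSET = TEXT_VOCAB + NUM_BINS  # then discretized continuous values

def tokenize_discrete(action_id: int) -> int:
    """Map a button press / discrete action index into the shared vocabulary."""
    return DISCRETE_OFFSET + action_id

def tokenize_continuous(value: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Uniformly bin a continuous value (e.g. a joint torque) into NUM_BINS tokens."""
    clipped = min(max(value, lo), hi)
    bin_idx = int((clipped - lo) / (hi - lo) * (NUM_BINS - 1))
    return CONTINUOUS_OFFSET + bin_idx

# One episode becomes a single integer sequence, whatever the embodiment,
# e.g. [text..., observation..., action...] token ids:
sequence = [17, 942, tokenize_discrete(3), tokenize_continuous(0.25)]
print(sequence)
```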
Precise, objective data on brood and honey levels in honey bee colonies can be obtained through the analysis of hive frame photographs. However, accurate analysis of all the frame photographs from medium- to large-scale experiments is time-consuming. This limits the number of hives that can be practically included in honeybee studies. Faster estimation methods exist, but they significantly decrease precision and their use requires a larger sample size to maintain statistical power. To resolve this issue, we created ‘CombCount’, a Python program that automatically detects uncapped cells to speed up measurements of capped brood and capped honey on photos of frames. CombCount does not require programming skills; it was designed to facilitate colony-level research in honeybees and to provide a fast, free, and accurate alternative to older methods based on visual estimation. Six observers measured the same photos of thirty different frames both with CombCount and by manually outlining the entire capped areas with ImageJ. The results were highly similar between observers and between the two methods, but measurements with CombCount were 3.2 times faster than with ImageJ (4 and 13 min per side of a frame, respectively), and all observers were faster when using CombCount rather than ImageJ. CombCount was then used to measure the proportions of capped brood and capped honey on each frame of 16 hives over a year, as they developed from packages to full-size colonies over about 60 days. Our data describe the formation of brood and honey stores during the establishment of a new colony.
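As a rough illustration of the kind of image analysis involved (not CombCount's actual algorithm), the sketch below estimates a capped-area fraction by thresholding the dark openings of uncapped cells in a frame photo and attributing the remaining comb area to capped cells. The thresholding approach, parameter values, and file name are illustrative assumptions.

```python
# Minimal sketch: estimate the capped fraction of a frame photo by treating
# dark cell openings as "uncapped". NOT CombCount's actual method; all
# parameters and the input file name are illustrative assumptions.
import cv2

def capped_area_fraction(image_path: str) -> float:
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Uncapped cells appear as dark openings; Otsu picks the dark/bright split.
    _, uncapped = cv2.threshold(blurred, 0, 255,
                                cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Morphological opening removes speckle so stray dark pixels are ignored.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    uncapped = cv2.morphologyEx(uncapped, cv2.MORPH_OPEN, kernel)
    uncapped_px = cv2.countNonZero(uncapped)
    return 1.0 - uncapped_px / gray.size   # remaining fraction assumed capped

if __name__ == "__main__":
    print(capped_area_fraction("frame_side_A.jpg"))  # hypothetical file name
```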