Naoaki Kanazawa scite author profile

Naoaki Kanazawa

4 Publications

1 Citation Statement Received

119 Citation Statements Given


Affiliations

The University of Tokyo

Publications


Self-Supervised Learning of Visual Servoing for Low-Rigidity Robots Considering Temporal Body Changes

Kawaharazuka

Kanazawa

Okada

et al. 2022

IEEE Robot. Autom. Lett.


Recognition of the current state is indispensable for the operation of a robot. There are various states to be recognized, such as whether an elevator door is open or closed, whether an object has been grasped correctly, and whether the TV is turned on or off. Until now, these states have been recognized by programmatically describing the state of a point cloud or raw image, by learning from annotated images, by using special sensors, etc. In contrast to these methods, we apply Visual Question Answering (VQA) using a Pre-Trained Vision-Language Model (PTVLM) trained on a large-scale dataset to such binary state recognition. This idea allows us to intuitively describe state recognition in language without any re-training, thereby improving the recognition ability of robots in a simple and general way. We summarize various techniques in questioning methods and image processing, and clarify their properties through experiments.
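The basic idea of reading a yes/no VQA answer as a binary state can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the VQA model is replaced by hypothetical precomputed probabilities that the model answers "yes", and the `Question` type, the `yes_means_true` polarity flag, the averaging rule, and the 0.5 threshold are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    # Hypothetical flag: True if a "yes" answer indicates the target state (e.g. "open").
    yes_means_true: bool


def recognize_state(questions, yes_probs, threshold=0.5):
    """Return True if the averaged evidence says the target state holds.

    yes_probs[i] stands in for P("yes") that some VQA model would assign to
    questions[i] for a given image (an assumption; no model is called here).
    Questions asked with the opposite polarity ("Is the door closed?") are flipped.
    """
    if not questions or len(questions) != len(yes_probs):
        raise ValueError("need one probability per question")
    scores = [p if q.yes_means_true else 1.0 - p for q, p in zip(questions, yes_probs)]
    return sum(scores) / len(scores) >= threshold


questions = [
    Question("Is the refrigerator door open?", True),
    Question("Is the refrigerator door closed?", False),
]
print(recognize_state(questions, [0.8, 0.3]))  # scores 0.8 and 0.7, mean 0.75 -> True
```

Asking the same state with both polarities and flipping one of them is just one way to combine several phrasings; the abstract says only that various questioning techniques are compared.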


Learning-Based Wiping Behavior of Low-Rigidity Robots Considering Various Surface Materials and Task Definitions

Kawaharazuka

Kanazawa

Okada

et al. 2022


In recent years, a number of models that learn the relations between vision and language from large datasets have been released. These models perform a variety of tasks, such as answering questions about images, retrieving the sentences that best correspond to images, and finding the regions in images that correspond to phrases. Although some examples exist, the connection between these pre-trained vision-language models and robotics is still weak. If the models are directly connected to robot motions, they lose their versatility because of the robot's embodiment and the difficulty of data collection, and become inapplicable to a wide range of bodies and situations. Therefore, in this study, we categorize and summarize methods for using pre-trained vision-language models flexibly and easily in a way that the robot can understand, without directly connecting them to robot motions. We discuss how to use these models for robot motion selection and motion planning without re-training them. We consider five types of methods to extract information understandable by robots, and show results for state recognition, object recognition, affordance recognition, relation recognition, and anomaly detection based on combinations of these five methods. We expect that this study will add flexibility and ease of use, as well as new applications, to the recognition behavior of existing robots.


VQA-based Robotic State Recognition Optimized with Genetic Algorithm

Kawaharazuka

Obinata

Kanazawa

et al. 2023

Preprint


State recognition of objects and the environment by robots has been performed in various ways, most often by processing point clouds, learning from annotated images, or using specialized sensors. In contrast, in this study, we propose a state recognition method that applies Visual Question Answering (VQA) using a Pre-Trained Vision-Language Model (PTVLM) trained on a large-scale dataset. By using VQA, robotic state recognition can be described intuitively in natural language. On the other hand, the same event can be asked about in many different ways, and state recognition performance varies with the question. Therefore, to improve the performance of VQA-based state recognition, we search for an appropriate combination of questions using a genetic algorithm. We show that our system can recognize not only whether a refrigerator door is open or closed and whether a display is on or off, but also whether a transparent door is open or closed and the state of water, both of which have been difficult to recognize.
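The question-combination search can be pictured as a genetic algorithm over binary masks, where each gene says whether a candidate question is used and fitness is recognition accuracy on labeled images. The sketch below is an illustration under loud assumptions, not the paper's algorithm: `SCORES` and `LABELS` are made-up numbers standing in for polarity-corrected VQA outputs and ground truth, fitness is the accuracy of the averaged score thresholded at 0.5, and the operators (tournament selection, uniform crossover, bit-flip mutation) and all parameters are hypothetical choices.

```python
import random

# Hypothetical data: SCORES[i][j] = P(target state) from question j on image i,
# already flipped for questions phrased with the opposite polarity.
SCORES = [
    [0.9, 0.4, 0.8, 0.3],
    [0.7, 0.6, 0.9, 0.2],
    [0.2, 0.6, 0.1, 0.7],
    [0.3, 0.5, 0.2, 0.8],
]
LABELS = [True, True, False, False]  # hypothetical ground-truth states


def fitness(mask):
    """Accuracy of averaging the selected questions' scores and thresholding at 0.5."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for row, label in zip(SCORES, LABELS):
        mean = sum(row[j] for j in chosen) / len(chosen)
        correct += (mean >= 0.5) == label
    return correct / len(LABELS)


def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly sampled masks."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(n_questions, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_questions)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = tournament(pop, rng), tournament(pop, rng)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)  # keep the best mask seen so far
    return best, fitness(best)


mask, acc = evolve(n_questions=len(SCORES[0]))
print(mask, acc)
```

In this toy data, question 3 alone is always wrong, while questions 0 and 2 are individually reliable, so the search should settle on masks that exclude or outvote question 3. This mirrors the abstract's point that performance depends on which questions are asked.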


Food Feature Learning for Knife Cutting Operation of Cooking Robot with Parametric Bias

et al. 2022



