CLIP feature-based randomized control using images and text for multiple tasks and robots
http://hdl.handle.net/10061/0002000799
| Item type | Journal Article(1) |
|---|---|
| Release date | 2025-04-14 |
| Title | |
| Title | CLIP feature-based randomized control using images and text for multiple tasks and robots |
| Language | |
| Language | eng |
| Keyword | |
| Subject scheme | Other |
| Subject | Vision-language model |
| Keyword | |
| Subject scheme | Other |
| Subject | CLIP |
| Keyword | |
| Subject scheme | Other |
| Subject | randomized controls |
| Resource type | |
| Resource type | journal article |
| Access rights | |
| Access rights | open access |
| Authors | 柴田, 一騎; Deguchi, Hideki; Taguchi, Shun |
| Abstract | |
| Description type | Abstract |
| Description | This study presents a control framework that leverages vision-language models (VLMs) for multiple tasks and robots. Existing VLM-based control methods achieve high performance across various tasks and robots in the training environment. However, they incur high costs for learning control policies for tasks and robots other than those seen in training. For industrial and household robots, learning in each novel environment where a robot is introduced is challenging. To address this issue, we propose a control framework that does not require learning control policies. Our framework combines the vision-language model CLIP with randomized control. CLIP computes the similarity between images and texts by embedding them in a shared feature space. This study employs CLIP to compute the similarity between camera images and text representing the target state. In our method, the robot is controlled by a randomized controller that simultaneously explores and follows the gradient of increasing similarity. Moreover, we fine-tune CLIP to improve the performance of the proposed method. We confirm the effectiveness of our approach through a multitask simulation and a real-robot experiment using a two-wheeled robot and a robot arm. |
| Bibliographic information | en: Advanced Robotics, vol. 38, no. 15, pp. 1066-1078, issued 2024-08-01 |
| Publisher | |
| Publisher | Taylor and Francis |
| ISSN | |
| Source identifier type | EISSN |
| Source identifier | 1568-5535 |
| Publisher version DOI | |
| Relation type | isReplacedBy |
| Identifier type | DOI |
| Related identifier | https://doi.org/10.1080/01691864.2024.2379381 |
| Publisher version URI | |
| Relation type | isReplacedBy |
| Identifier type | URI |
| Related identifier | https://www.tandfonline.com/doi/full/10.1080/01691864.2024.2379381 |
| Rights | |
| Rights information resource | https://creativecommons.org/licenses/by-nc-nd/4.0/ |
| Rights information | © 2024 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent. |
| Author version flag | |
| Publication type | NA |
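
The idea summarized in the abstract, scoring the current camera view against a text description of the target state and letting a randomized controller move toward higher similarity, can be illustrated with a toy sketch. This is not the authors' controller and it does not call CLIP: `toy_image_features` is a hypothetical stand-in for an image encoder, the target vector stands in for a text embedding, and `randomized_step` and `run` are illustrative names. The sampling rule (try a few random moves, keep the best one only if it improves the score) is a simple assumption chosen for clarity, not the method from the article.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_image_features(state):
    """Hypothetical stand-in for an image encoder: 2-D robot state -> feature vector."""
    x, y = state
    return [x, y, 1.0]


def randomized_step(state, target_features, rng, step=0.1, n_candidates=8):
    """Sample random moves; keep the best one only if it raises the similarity."""
    best_state = state
    best_score = cosine_similarity(toy_image_features(state), target_features)
    for _ in range(n_candidates):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        candidate = (
            state[0] + step * math.cos(angle),
            state[1] + step * math.sin(angle),
        )
        score = cosine_similarity(toy_image_features(candidate), target_features)
        if score > best_score:
            best_state, best_score = candidate, score
    return best_state, best_score


def run(seed=0, n_steps=200):
    """Drive the toy robot from the origin and record the similarity after each step."""
    rng = random.Random(seed)
    # Assumption: this vector plays the role of the text embedding of the target state.
    target_features = toy_image_features((2.0, -1.0))
    state = (0.0, 0.0)
    scores = [cosine_similarity(toy_image_features(state), target_features)]
    for _ in range(n_steps):
        state, score = randomized_step(state, target_features, rng)
        scores.append(score)
    return state, scores


if __name__ == "__main__":
    final_state, history = run()
    print("final state:", final_state)
    print("similarity: %.4f -> %.4f" % (history[0], history[-1]))
```

Because a move is accepted only when it improves the score, the recorded similarity never decreases. In a real setup the two feature vectors would come from a vision-language model's image and text encoders, and no policy would need to be trained for the loop itself.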