ByteDance launches Vi-PRoM, a visual pre-training scheme that improves robot manipulation success rate and performance


王林

Sep 13, 2023 am 10:57 AM


In recent years, visual pre-training on large-scale real-world data has made significant progress and shown great potential for robot learning from pixel observations. However, existing studies differ in their pre-training data, methods, and models, so which data, pre-training methods, and models best support robot control remains an open question.

Based on this, researchers from the ByteDance Research team comprehensively studied how visual pre-training strategies affect robot manipulation tasks from three basic perspectives: pre-training dataset, model architecture, and training method. They report several experimental findings that are useful for robot learning. They also propose a visual pre-training scheme for robot manipulation called Vi-PRoM, which combines self-supervised learning and supervised learning. The former uses contrastive learning to capture latent patterns in large-scale unlabeled data, while the latter aims to learn visual semantics and temporal dynamics. Extensive robot manipulation experiments in several simulation environments and on a real robot demonstrate the superiority of this scheme.


Paper address: https://arxiv.org/pdf/2308.03620.pdf
Project address: https://explore-pretrain-robot.github.io/

Benchmark Research

Pre-training data

EgoNet is more powerful than ImageNet. Visual encoders were pre-trained on different datasets (ImageNet and EgoNet) with contrastive learning, and their performance on robot manipulation tasks was then compared. As Table 1 shows, the model pre-trained on EgoNet achieves better performance on robot manipulation tasks. This suggests that manipulation tasks benefit from the interaction knowledge and temporal relationships contained in videos. In addition, the egocentric natural images in EgoNet carry more global context about the world, which means that richer visual features can be learned.


Model structure

ResNet-50 performs better. As Table 2 shows, ResNet-50 and ResNet-101 outperform ResNet-34 on robot manipulation tasks. Furthermore, performance does not improve when the model grows from ResNet-50 to ResNet-101.


Pre-training method

Contrastive learning is the preferred pre-training method. As Table 3 shows, MoCo-v3 outperforms MAE on both the ImageNet and EgoNet datasets, indicating that contrastive learning is more effective than masked image modeling. In addition, the visual semantics obtained through contrastive learning are more important for robot manipulation than the structural information learned through masked image modeling.
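To make the contrastive objective concrete, here is a minimal pure-Python sketch of an InfoNCE-style loss, the family of objectives that contrastive methods such as MoCo-v3 build on. It is an illustration only, not the authors' training code: the function names, the toy 2-D embeddings, and the temperature value are all assumptions, and a real setup would use an encoder, a momentum branch, and large batches.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce_loss(queries, keys, temperature=0.2):
    """Mean InfoNCE loss; queries[i] and keys[i] are two views of the same image.

    The temperature is a hypothetical value chosen for illustration.
    """
    q = [l2_normalize(x) for x in queries]
    k = [l2_normalize(x) for x in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of this query to every key (one positive, the rest negatives)
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)

# Toy embeddings, made up for illustration
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each query is closest to its own key and large when the pairs are mismatched, which is the signal that pulls two views of the same image together and pushes different images apart.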

Algorithm Introduction

Based on the above exploration, the study proposes a visual pre-training scheme for robot manipulation (Vi-PRoM). It extracts a comprehensive visual representation for robot manipulation by pre-training ResNet-50 on the EgoNet dataset. Specifically, contrastive learning is first used to learn human-object interaction patterns from the EgoNet dataset in a self-supervised way. Then two additional learning objectives, visual semantic prediction and temporal dynamics prediction, are introduced to further enrich the encoder's representation. The figure below shows the basic pipeline of Vi-PRoM. Notably, the scheme does not require manual labeling to learn visual semantics and temporal dynamics.
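The article does not spell out how the temporal signal is constructed, so the sketch below uses frame-order prediction as an assumed stand-in to show how a temporal training target can be derived from video without manual labels. The function `make_order_sample` and the string placeholders for frames are hypothetical.

```python
import itertools
import random

def make_order_sample(frames, rng):
    """Shuffle a short clip and return (shuffled_frames, label).

    The label is the index of the applied permutation, so it comes from
    the video itself and needs no human annotation.
    """
    permutations = list(itertools.permutations(range(len(frames))))
    label = rng.randrange(len(permutations))
    order = permutations[label]
    shuffled = [frames[i] for i in order]
    return shuffled, label

rng = random.Random(0)
clip = ["frame_t0", "frame_t1", "frame_t2"]  # placeholders for image tensors
shuffled, label = make_order_sample(clip, rng)
print(shuffled, label)
```

A classifier head on top of the encoder could then be trained to predict `label` from the shuffled frames, which forces the representation to encode how a scene changes over time.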


Experimental results

The study conducted extensive experiments in two simulation environments (Franka Kitchen and MetaWorld). The results show that the proposed pre-training scheme outperforms previous state-of-the-art methods on robot manipulation. The ablation results in the table below demonstrate the importance of visual semantic learning and temporal dynamics learning for robot manipulation. Furthermore, when both learning objectives are removed, the success rate of Vi-PRoM drops significantly, which demonstrates the effectiveness of combining visual semantic learning and temporal dynamics learning.
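For readers unfamiliar with the metric, success rate is the fraction of evaluation episodes in which the robot completes the task. The sketch below shows the arithmetic for comparing two model variants; the variant names and episode outcomes are placeholders, not results from the paper.

```python
def success_rate(outcomes):
    """Fraction of evaluation episodes that ended in success."""
    if not outcomes:
        raise ValueError("need at least one episode")
    return sum(1 for ok in outcomes if ok) / len(outcomes)

# Placeholder outcomes for illustration only, NOT results from the paper
rollouts = {
    "variant_a": [True, True, False, True],
    "variant_b": [True, False, False, True],
}
rates = {name: success_rate(results) for name, results in rollouts.items()}
print(rates)
```

An ablation study repeats this computation for each variant with one or more training objectives removed and compares the resulting rates.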


This work also investigates the scalability of Vi-PRoM. As shown in the figure below on the left, in the Franka Kitchen and MetaWorld simulation environments the success rate of Vi-PRoM improves steadily as the amount of demonstration data increases. Trained on larger expert demonstration datasets, Vi-PRoM thus shows its scalability on robot manipulation tasks.


Thanks to Vi-PRoM's powerful visual representation capabilities, the real robot can successfully open drawers and cabinet doors.

The experimental results on Franka Kitchen show that Vi-PRoM achieves a higher success rate and a higher degree of action completion than R3M on five tasks.

R3M:


Vi-PRoM:

On MetaWorld, Vi-PRoM's visual representation learns good semantic and dynamic features that are better suited to action prediction, so Vi-PRoM needs fewer steps than R3M to complete the manipulation.

R3M:


Vi-PRoM:


Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
