
Meta launches MoDem world model: solving three major challenges in the visual field, forwarded by LeCun

WBOY
2023-04-12 20:22

On December 27, Meta AI's Aravind Rajeswaran announced the MoDem world model on Twitter. As of the evening of the 27th, the tweet had been viewed 73.9k times.

He said that, given only 5 demonstrations, MoDem can solve tasks with sparse rewards and high-dimensional action spaces within 100K interaction steps, significantly outperforming existing state-of-the-art methods on challenging visuomotor control tasks. How good is it? They found that in low-data regimes, MoDem's success rate on sparse-reward tasks is 150%-250% higher than that of previous methods.


LeCun also retweeted the research, saying that MoDem's model architecture is similar to JEPA: it makes predictions in representation space and needs no decoder.


The editor has put the links below; if you are interested, take a look~

Paper link: https://arxiv.org/abs/2212.05698

GitHub link: https://github.com/facebookresearch/modem

Research Innovation and Model Architecture

Low sample efficiency is the main challenge in deploying deep reinforcement learning (RL) algorithms in practice, especially for visuomotor control.

Model-based RL has the potential to achieve high sample efficiency by simultaneously learning a world model and using synthetic rollouts for planning and policy improvement.
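To make "synthetic rollouts for planning" concrete, here is a minimal toy sketch (not MoDem itself): a 1-D environment with true dynamics s' = s + a, where a stand-in "learned" world model scores candidate action sequences by rolling them out in imagination (random-shooting planning). All names and numbers are illustrative assumptions.

```python
import random

def world_model(state, action):
    # Stand-in for a learned dynamics model; here it happens to match
    # the true dynamics s' = s + a exactly.
    return state + action

def plan(state, goal, horizon=5, n_candidates=200):
    """Return the first action of the best candidate sequence under the model."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for a in seq:                 # synthetic rollout: no real env steps used
            s = world_model(s, a)
            cost += abs(s - goal)     # accumulate distance-to-goal per step
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

random.seed(0)
s, goal = 0.0, 3.0
for _ in range(15):                   # receding-horizon control in the real env
    s += plan(s, goal)
```

Because each planning call burns only imagined model steps, the agent needs very few real environment interactions, which is the source of model-based RL's sample efficiency.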

However, in practice, sample-efficient learning in model-based RL is bottlenecked by exploration challenges, and this research tackles exactly those challenges. MoDem addresses three main challenges in visual reinforcement learning/control, using world models, imitation learning combined with RL, and self-supervised visual pre-training respectively:
  • Large sample complexity
  • Exploration in high-dimensional state and action spaces
  • Simultaneous learning of visual representations and behaviors

The model architecture is similar to Yann LeCun's JEPA and does not require a decoder.


Author Aravind Rajeswaran said that, compared with Dreamer, which requires a decoder for pixel-level prediction and has a heavy architecture, the decoder-free architecture can directly plug in visual representations pre-trained with SSL.
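The decoder-free idea can be sketched in a few lines: the loss is computed in representation space, against the output of a frozen pre-trained encoder, with no pixel reconstruction anywhere. The linear "encoder" and dimensions below are illustrative stand-ins for an SSL model like R3M, not MoDem's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen pre-trained encoder: a fixed map from "pixels" (64-d)
# to a 16-d latent vector. In practice this would be an SSL-pretrained CNN.
W_enc = rng.normal(size=(16, 64)) / 8.0
def encode(obs):
    return np.tanh(W_enc @ obs)

# Learnable latent dynamics: predicts the next latent from (latent, action).
W_dyn = rng.normal(size=(16, 16 + 4)) * 0.1
def predict(z, a):
    return W_dyn @ np.concatenate([z, a])

def latent_loss(obs, action, next_obs):
    """Decoder-free training signal: match the encoder's embedding of the
    next observation, instead of reconstructing pixels."""
    z_pred = predict(encode(obs), action)
    z_target = encode(next_obs)       # target comes from the frozen encoder
    return float(np.mean((z_pred - z_target) ** 2))

obs, next_obs = rng.normal(size=64), rng.normal(size=64)
action = rng.normal(size=4)
loss = latent_loss(obs, action, next_obs)
```

Because no decoder maps latents back to pixels, swapping in a different pre-trained encoder only changes `encode`; nothing downstream needs a reconstruction target.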

In addition, building on imitation learning (IL) combined with RL, they proposed a three-stage algorithm:
  • Pre-train the policy with behavior cloning (BC)
  • Pre-train the world model on a seed dataset containing demonstrations and exploration data; this stage is important for overall stability and efficiency
  • Fine-tune the world model through online interaction
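The control flow of the three stages can be sketched schematically as below. The classes and method names are hypothetical stand-ins so the skeleton actually runs; they are not MoDem's real API.

```python
class Stub:
    """Minimal stand-in for a policy or world model, so the flow runs."""
    def __init__(self): self.updates = 0
    def update(self, *args): self.updates += 1
    def fit(self, data): self.updates += len(data)
    def act(self, obs): return 0
    def improve(self, model): pass

class StubEnv:
    def reset(self): return 0
    def step(self, action): return 0

def modem_schematic(demos, env, n_seed=10, n_online=20):
    policy, world_model = Stub(), Stub()
    # Stage 1: behavior cloning on a handful of demonstrations.
    for obs, action in demos:
        policy.update(obs, action)
    # Stage 2: pre-train the world model on a seed dataset of the demos
    # plus rollouts collected with the BC policy.
    seed_data = [(o, a, None) for o, a in demos]
    obs = env.reset()
    for _ in range(n_seed):
        action = policy.act(obs)
        next_obs = env.step(action)
        seed_data.append((obs, action, next_obs))
        obs = next_obs
    world_model.fit(seed_data)
    # Stage 3: fine-tune the world model (and policy) via online interaction.
    obs = env.reset()
    for _ in range(n_online):
        action = policy.act(obs)
        next_obs = env.step(action)
        world_model.update(obs, action, next_obs)
        policy.improve(world_model)
        obs = next_obs
    return policy.updates, world_model.updates

demos = [(0, 0)] * 5                  # "given only 5 demonstrations"
counts = modem_schematic(demos, StubEnv())  # → (5, 35)
```

The point of the staging is ordering: the BC policy makes the seed rollouts in stage 2 non-random, which is what makes stage 3's online fine-tuning stable and efficient.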


The results show that the resulting algorithm achieved state-of-the-art (SOTA) results on 21 hard visuomotor control tasks, including Adroit dexterous manipulation, MetaWorld, and the DeepMind Control suite.

In terms of the data, MoDem performs far better than other models across the various tasks, with results 150% to 250% higher than the previous SOTA methods.


The red line shows MoDem’s performance in various tasks

Along the way, they also shed light on the importance of the different stages in MoDem, the importance of data augmentation for visual MBRL, and the utility of pre-trained visual representations.
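A common form of data augmentation in visual RL (the paper does not specify its exact variant here, so this is a hedged sketch of the widely used random-shift augmentation): pad the observation by replicating its edges, then crop a random window of the original size.

```python
import numpy as np

def random_shift(img, pad=4, rng=None):
    """Random-shift augmentation: edge-pad the image, then crop a random
    window of the original size, so the content jitters by up to `pad`
    pixels in each direction."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)    # 0..2*pad inclusive
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(0)
img = rng.random((84, 84, 3))             # an 84x84 RGB observation
aug = random_shift(img, pad=4, rng=rng)   # same shape, shifted content
```

Applying an independent shift to each training frame regularizes the encoder, which the ablations suggest matters a great deal for visual model-based RL.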

Finally, using frozen R3M features far outperforms the direct end-to-end (E2E) approach. This is exciting, as it shows that visual pre-training from video can support world models.

But E2E with strong data augmentation is competitive with frozen R3M, and we can do better through pre-training.



Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for removal.