Meta AI under LeCun bets on self-supervision
Apr 09, 2023, 09:01 AM

Is self-supervised learning really a key step towards AGI?

Meta’s chief AI scientist, Yann LeCun, keeps the long-term goals in view even when talking about “specific measures to be taken at this moment.” He said in an interview: "We want to build intelligent machines that learn like animals and humans."

In recent years, Meta has published a series of papers on self-supervised learning (SSL) for AI systems. LeCun firmly believes that SSL is a necessary prerequisite for AI systems that can build world models and thereby acquire human-like capabilities such as reasoning, common sense, and the ability to transfer skills and knowledge from one environment to another.

Their new paper shows how a self-supervised system called a masked autoencoder (MAE) can learn to reconstruct images, video and even audio from very fragmented, incomplete data. While MAEs are not a new idea, Meta has extended the work into new areas. LeCun said that by learning to predict missing data, whether in a still image or in a video or audio sequence, MAE systems are building a model of the world. He said: "If it can predict what is about to happen in the video, it must understand that the world is three-dimensional, that some objects are inanimate and do not move on their own, and that other objects are alive and harder to predict, all the way up to predicting the complex behavior of living beings." Once an AI system has an accurate model of the world, it can use this model to plan actions.

LeCun said, “The essence of intelligence is learning to predict.” Although he did not claim that Meta’s MAE system is close to artificial general intelligence (AGI), he believes that it is an important step toward it.

But not everyone agrees that Meta researchers are on the right path toward general artificial intelligence. Yoshua Bengio sometimes engages in friendly debates with LeCun about big ideas in AI. In an email to IEEE Spectrum, Bengio laid out some of the differences and similarities in their goals.

Bengio wrote: "I really don't think our current methods (whether self-supervised or not) are enough to bridge the gap between artificial and human intelligence levels." He said that the field needs to make "qualitative progress" before technology can truly be pushed closer to human-level artificial intelligence.

Bengio agreed with LeCun’s view that “the ability to reason about the world is the core element of intelligence.” However, his team focuses not on models that make predictions but on models that can present knowledge in the form of natural language. He noted that such models would allow us to combine these pieces of knowledge to solve new problems, conduct counterfactual simulations, or study possible futures. Bengio's team has developed a new neural network framework that is more modular than the one favored by LeCun, who works on end-to-end learning.

The Hot Transformer

Of course, Meta is not the first team to successfully use Transformers for vision tasks. Ross Girshick, a researcher at Meta AI, said that Google’s research on the Vision Transformer (ViT) inspired the Meta team. “The adoption of the ViT architecture helped (us) eliminate some obstacles encountered during the experiment.”

Girshick is one of the authors of Meta's first MAE system paper. Another author of this paper is Kaiming He. They describe a very simple method: mask random patches of the input image and reconstruct the missing pixels.

The training of this model is similar to that of BERT and some other Transformer-based language models. Researchers show such models huge text databases in which some words are missing, or "masked." The model has to predict the missing words on its own, and then the masked words are revealed so that the model can check its work and update its parameters. This process repeats over and over. To do something similar with images, the team broke each image into patches, then masked some of the patches and asked the MAE system to predict the missing parts of the image, Girshick explained.
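The patch-and-mask step can be illustrated with a few lines of NumPy. This is only a minimal sketch, not Meta's code: the function names `patchify` and `random_mask`, the 224×224 input, the 16-pixel patch size and the 75% ratio are all assumptions chosen for illustration.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide into whole patches"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean array; True marks a patch hidden from the model."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))             # stand-in for a real photo
patches = patchify(image, patch=16)           # 14 x 14 = 196 patches
mask = random_mask(len(patches), 0.75, rng)   # hide 75% of them (illustrative ratio)
visible = patches[~mask]                      # the part the model gets to see
print(patches.shape, int(mask.sum()), visible.shape)
```

With these illustrative settings the model would see only 49 of the 196 patches and would have to predict the other 147.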

One of the team’s breakthroughs was the realization that masking most of the image gives the best results, a key difference from language transformers, which might mask only 15% of the words. “Language is an extremely dense and efficient system of communication, and every symbol carries a lot of meaning,” Girshick said. “But images, these signals from the natural world, are not built to eliminate redundancy. That is why we can compress the content so well when creating JPG images."

Researchers at Meta AI experimented with how much of the image needed to be masked to get the best results.

Girshick explained that by masking more than 75% of the patches in the image, they eliminated the redundancy that would otherwise make the task too trivial for training. Their two-part MAE system first uses an encoder to learn the relationships between pixels from a training dataset, and then a decoder does its best to reconstruct the original image from the masked image. After this training scheme is completed, the encoder can also be fine-tuned for vision tasks such as classification and object detection.
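The data flow of such a two-part system can be traced with a toy example. The sketch below makes several assumptions that the article does not spell out: the encoder sees only the visible patches, the decoder receives a shared placeholder token at every masked position, and the reconstruction error is measured on the masked patches only. The single matrix multiplications stand in for the Transformer blocks of a real model, so this toy decoder cannot actually infer hidden content; the function name `mae_forward` and all sizes are hypothetical.

```python
import numpy as np

def mae_forward(patches, mask, w_enc, w_dec, mask_token):
    """Trace one forward pass of a toy masked autoencoder.

    patches: (N, D) flattened patches; mask: (N,) bool, True = hidden.
    Returns the reconstruction for all N patches and the loss on hidden ones.
    """
    # Encoder: runs on the visible patches only (assumption of this sketch).
    latent_visible = patches[~mask] @ w_enc
    # Decoder input: encoded visible patches put back in place, with a
    # shared placeholder token at every masked position.
    decoder_in = np.tile(mask_token, (len(patches), 1))
    decoder_in[~mask] = latent_visible
    reconstruction = decoder_in @ w_dec
    # Mean squared error, measured on the masked patches only.
    loss = float(np.mean((reconstruction[mask] - patches[mask]) ** 2))
    return reconstruction, loss

rng = np.random.default_rng(0)
num_patches, patch_dim, latent_dim = 196, 768, 64   # illustrative sizes
patches = rng.random((num_patches, patch_dim))
mask = np.zeros(num_patches, dtype=bool)
mask[rng.permutation(num_patches)[:147]] = True     # hide 75% of the patches

# Stand-in weights; a real MAE would learn Transformer parameters instead.
w_enc = rng.normal(0.0, 0.02, (patch_dim, latent_dim))
w_dec = rng.normal(0.0, 0.02, (latent_dim, patch_dim))
mask_token = np.zeros(latent_dim)

reconstruction, loss = mae_forward(patches, mask, w_enc, w_dec, mask_token)
print(reconstruction.shape, loss)
```

In training, the loss would be used to update the encoder and decoder weights; for downstream tasks such as classification, only the encoder would be kept and fine-tuned.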

Girshick said, "What's ultimately exciting for us is that we see the results of this model in downstream tasks." When using the encoder to complete tasks such as object recognition, "the gains we see are very substantial." He pointed out that continuing to scale up the model can lead to better performance, which is a promising direction for future models, because SSL "has the potential to use large amounts of data without manual annotation."

Going all out to learn from massive, unfiltered data sets may be Meta's strategy for improving SSL results, but it's also an increasingly controversial approach. AI ethics researchers like Timnit Gebru have called attention to the biases inherent in the uncurated data sets that large language models learn from, which can sometimes lead to disastrous results.

Self-supervised learning for video and audio

In the video MAE system, the masker obscures up to 95% of each video frame, because the similarity between frames means that video signals have even more redundancy than static images. Meta researcher Christoph Feichtenhofer said that a big advantage of the MAE approach for video is computational: video is usually expensive to process, and by masking out up to 95% of the content of each frame, MAE reduces computational costs by up to 95%. The video clips used in these experiments were only a few seconds long, but Feichtenhofer said training artificial intelligence systems on longer videos is a very active research topic. Imagine a virtual assistant that has a video feed of your home and can tell you where you left your keys an hour ago.
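A back-of-the-envelope calculation shows where the savings come from. The clip layout below (16 frames of 224×224 pixels cut into 16-pixel patches) is a hypothetical example rather than the configuration used in Meta's experiments, and the helper `visible_tokens` is introduced only for this sketch.

```python
def visible_tokens(frames, height, width, patch, mask_ratio):
    """Count the patches in a clip and how many remain visible after masking."""
    total = frames * (height // patch) * (width // patch)
    visible = total - int(round(total * mask_ratio))
    return total, visible

# Hypothetical clip: 16 frames, 224 x 224 pixels, 16-pixel patches, 95% masked.
total, visible = visible_tokens(frames=16, height=224, width=224,
                                patch=16, mask_ratio=0.95)
print(total, visible, round(visible / total, 3))
```

Under these assumptions the clip contains 3,136 patches, of which only 157 (about 5%) would have to be processed as visible input.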

More immediately, one can imagine both the image and video systems being useful for the classification tasks required for content moderation on Facebook and Instagram. Feichtenhofer said that "integrity" is one possible application: "We are communicating with the product team, but this is very new and we don’t have any specific projects yet.”

For the audio MAE work, Meta AI’s team said they will publish the research results on arXiv soon. They found a clever way to apply the masking technique: they converted the sound files into spectrograms, which are visual representations of the spectrum of frequencies in a signal, and then masked parts of those images for training. The reconstructed audio is impressive, although the model can currently handle only clips of a few seconds. Bernie Huang, a researcher on the audio system, said potential applications of this research include classification tasks, assisting voice over IP (VoIP) transmission by filling in the audio lost when packets are dropped, and finding a more efficient way to compress audio files.
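The spectrogram trick can be sketched with NumPy alone. Everything concrete here is an assumption for illustration: the synthetic 440 Hz tone, the 256-sample Hann window, the 16×4 patch size and the 80% masking ratio are not taken from Meta's audio work, and `spectrogram` is a hypothetical helper.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a short-time Fourier transform (Hann window)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (frequency bins, time frames)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate           # one second of audio
tone = np.sin(2 * np.pi * 440 * t)                 # synthetic test signal
spec = spectrogram(tone)                           # a 2-D "image" of the sound

# Crop to a whole number of patches, then hide a random subset of them.
pf, pt = 16, 4                                     # patch size: freq bins x frames
spec = spec[: spec.shape[0] // pf * pf, : spec.shape[1] // pt * pt]
grid = (spec.shape[0] // pf, spec.shape[1] // pt)
num_patches = grid[0] * grid[1]

rng = np.random.default_rng(0)
num_masked = int(round(0.8 * num_patches))         # illustrative masking ratio
# A random permutation compared with num_masked marks exactly num_masked patches.
mask_grid = (rng.permutation(num_patches) < num_masked).reshape(grid)
mask = np.repeat(np.repeat(mask_grid, pf, axis=0), pt, axis=1)
masked_spec = np.where(mask, 0.0, spec)            # what the model would be shown
print(spec.shape, grid, int(mask_grid.sum()))
```

From here the masked spectrogram can be treated like a masked image: the model is trained to fill in the hidden time-frequency patches.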

Meta has been open-sourcing its AI research, such as these MAE models, and also provides a pre-trained large language model to the artificial intelligence community. But critics point out that despite being so open about its research, Meta has not made its core business algorithms available for study: those that control news feeds, recommendations and ad placement.

This article is reproduced from 51CTO.COM.