


Vision Pro has another popular new use, and this time it involves embodied intelligence.
An MIT PhD student used Vision Pro's hand tracking to control a robot dog in real time.
Actions such as opening a door are reproduced accurately, with almost no delay.
As soon as the demo came out, netizens called it amazing, and embodied intelligence researchers got excited too.
Among them was a prospective doctoral student from Tsinghua University.
Some people boldly predict that this is how we will interact with the next generation of machines.
The author, Younghyo Park, has open-sourced the implementation on GitHub, and the companion app can be downloaded directly from the Vision Pro App Store.
Use Vision Pro to train robot dogs
Let's take a closer look at the app developed by the author: Tracking Streamer.
As the name suggests, the app uses Vision Pro to track human movements and streams the motion data in real time to other devices, such as robots, on the same WiFi network.
The motion tracking part mainly relies on Apple’s ARKit library.
Head tracking calls queryDeviceAnchor. Users can reset the head frame to its current position by pressing and holding the Digital Crown.
Wrist and finger tracking are implemented through HandTrackingProvider, which tracks the position and orientation of the left and right wrists relative to the ground frame, as well as the poses of 25 finger joints on each hand relative to the wrist frame.
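Because finger joints are reported relative to the wrist, and the wrist relative to the ground, a joint's pose in the ground frame is obtained by chaining the two transforms. The following is a minimal sketch of that composition in plain Python. The helper names and the numeric values are made up for illustration and are not part of the project's code.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Return the 4x4 matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def translation(x: float, y: float, z: float) -> Matrix:
    """Pure translation as a 4x4 homogeneous transform."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def rotation_z_90() -> Matrix:
    """Rotation of 90 degrees about the z axis (x maps to y)."""
    return [
        [0.0, -1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Hypothetical wrist pose in the ground frame: translated, then rotated.
ground_T_wrist = compose(translation(0.3, 0.0, 1.0), rotation_z_90())

# Hypothetical finger joint pose in the wrist frame: 10 cm along wrist x.
wrist_T_joint = translation(0.1, 0.0, 0.0)

# Chain the transforms to express the joint in the ground frame.
ground_T_joint = compose(ground_T_wrist, wrist_T_joint)
joint_position = [row[3] for row in ground_T_joint[:3]]
print(joint_position)
```

Keeping finger poses wrist-relative means the hand shape can be reused regardless of where the hand is in the room; the composition above is only needed when a ground-frame position is required.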
For network communication, the app streams data over gRPC, so that other devices, including Linux, Mac and Windows machines, can subscribe to it.
In addition, to make the data easier to consume, the author provides a Python API that lets developers programmatically subscribe to and receive the tracking data streamed from Vision Pro.
The API returns the data as a dictionary containing the SE(3) poses of the head, wrists, and fingers, that is, their three-dimensional position and orientation. Developers can process this data directly in Python for further analysis and for controlling the robot.
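To make the dictionary format concrete, here is a small sketch of how such a frame might be processed. It does not call the project's API: the frame is a hand-built stand-in, and the key names `head` and `right_wrist` as well as the 4x4 matrix layout are assumptions for illustration, so check the project's README for the actual structure.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # 4x4 homogeneous SE(3) matrix, row-major


def split_pose(pose: Matrix) -> Tuple[List[float], Matrix]:
    """Split a 4x4 SE(3) matrix into (position, 3x3 rotation)."""
    position = [pose[i][3] for i in range(3)]
    rotation = [row[:3] for row in pose[:3]]
    return position, rotation


def wrist_offset_from_head(frame: Dict[str, Matrix]) -> List[float]:
    """Wrist position minus head position, expressed in the ground frame."""
    head_pos, _ = split_pose(frame["head"])
    wrist_pos, _ = split_pose(frame["right_wrist"])
    return [w - h for w, h in zip(wrist_pos, head_pos)]


def pose_at(x: float, y: float, z: float) -> Matrix:
    """Identity rotation at the given position (test data only)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Hypothetical frame standing in for one sample of streamed tracking data.
frame = {
    "head": pose_at(0.0, 0.0, 1.5),
    "right_wrist": pose_at(0.25, -0.5, 1.0),
}

offset = wrist_offset_from_head(frame)
print(offset)
```

A quantity like this offset could then be mapped to a robot command, for example a target position for an end effector, though that mapping depends entirely on the robot being controlled.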
As many professionals have pointed out, although the robot dog's movements here are still controlled by a human, the "control" itself matters less than what it enables: combined with imitation learning algorithms, the human acts more like a coach for the robot.
By tracking the user's movements, Vision Pro provides an intuitive and simple way to interact, allowing non-professionals to provide accurate training data for robots.
The author himself also wrote in the paper:
In the near future, people may wear devices like Vision Pro every day, just as they wear glasses. Imagine how much data could be collected in the process!
This is a promising source of data from which robots can learn how humans interact with the real world.
Finally, a reminder: if you want to try this open source project, you will need the following in addition to a Vision Pro:
- Apple Developer Account
- Vision Pro Developer Accessory (Developer Strap, priced at $299)
- Mac computer with Xcode installed
Well, it seems that Apple still has to make a profit first (doge).
Project link: https://github.com/Improbable-AI/VisionProTeleop?tab=readme-ov-file



