Artificial intelligence is undergoing an explosive wave of progress, and nearly every step the technology takes into the unknown draws extraordinary attention.
As the boundaries of AI expand, innovation and disagreement coexist along the technical routes of the most important tracks, and the judgment and choices of technology pioneers shape the paths of the many who follow.
Over the past year, this site has been the first to introduce outstanding companies such as Dark Side of the Moon, Shengshu Technology, Aishi Technology, and Wuwen Core Dome, leaving behind the first long-form, ten-thousand-word interview transcripts about them on the internet. At a stage when technical routes have not yet converged, we see the leadership of AI entrepreneurs who truly have conviction, courage, and systematic understanding.
That is why we launched the "AI Pioneers" column: to keep finding and recording entrepreneurs with leadership qualities across the subdivisions of artificial intelligence in the AGI era, to introduce the most outstanding and high-potential startups on the AI track, and to share their most cutting-edge, distinctive insights in the field of AI.
Author: Jiang Jingling
Even though young academic prodigies have become one of the mainstream backgrounds for today's AGI founders, Yang Fengyu, born in 2000, is still surprisingly young. Yang studied computer science as an undergraduate at the University of Michigan and is now a computer science doctoral student at Yale University; at just 23, he started his own embodied-intelligence robotics business last year.
In 2024, UniX AI, the embodied intelligence company he founded, completed the development and manufacturing of a wheeled humanoid robot within five months. The robot offers functions such as after-meal cleanup and laundry, and mass production and external sales will begin in September. At a time when many embodied intelligent robots are still at the laboratory stage, this is a very fast pace of commercialization. In Suzhou, UniX AI's mass-production factory already exceeds 2,500 square meters.
This company, which almost no one had heard of last year, has recruited many senior technical talents from the robotics industry within half a year. "The R&D director of a leading service robot company is helping us build the chassis, and some top talents from leading humanoid robot companies are responsible for our hardware." In July 2024, Professor Wang Hesheng, a well-known robotics expert from Shanghai Jiao Tong University, announced that he would officially join UniX AI as chief scientist.

In the first technology demonstration video released by UniX AI, a wheeled humanoid robot named Wanda completes tasks such as grasping tofu, helping sort clothes, and carrying clothes to a washing machine. UniX AI appears to have found a solution to the flexible-object manipulation problem that embodied intelligence companies currently find difficult to solve.

"I don't think there is anything wrong with being young. From a technical perspective, many new technologies and products are created by young people with strong academic backgrounds." To our surprise, Yang Fengyu, a member of the post-2000 generation, shows a maturity beyond his age in conversation, with a very clear understanding of company management and of the current technical stage of embodied intelligence.

Our curiosity about UniX AI centers on three questions: how has an embodied intelligence company with almost no presence in venture capital news achieved such rapid development; how has UniX AI, one of the few embodied intelligence companies founded by someone born after 2000, gone from 0 to 1; and what does UniX AI's ultimate roadmap for embodied intelligence look like? With these questions in mind, this site conducted Yang Fengyu's first public media interview since he started his business.
A Yale Post-2000 Doctoral Student Enters Embodied Intelligence Entrepreneurship
This site: Have you graduated now?

Yang Fengyu: I went directly to Yale after my undergraduate degree, and I have basically met all the paper requirements for my doctoral graduation. This year alone I had four papers accepted at CVPR, and counting the others, more than ten papers in total at top conferences in artificial intelligence and robotics.

This site: Your energy level is remarkable.

Yang Fengyu: (laughing) I often stay up until 3:30 in the morning; a while ago I even had to go get an injection. It's mainly because when the team is working together, we often don't watch the clock, and when we finally look up it's already very late.

This site: When did you first think of starting a business?

Yang Fengyu: I have always believed that entrepreneurship is about "the right time, the right place, and the right people." Last year we saw great progress in technology at the perception level. Large models and foundation models, including multimodal models spanning vision, language, and touch, made great strides, which let us see the possibility of achieving our goals. In addition, the country has launched a series of supportive policies, providing a good environment for entrepreneurship. This is "the right time."

"The right place": there is no doubt that general-purpose humanoid robots are the next major direction after new energy vehicles. China has unparalleled advantages in the supply chain, and the Yangtze River Delta also has a wealth of high-tech talent. At the beginning, we did research to find out what stage the robot industry had reached in engineering terms, where the market demand lies, what problems the previous generation of robots solved, and where the future opportunities are.

"The right people" is the key to success. This year we formally assembled a team and quickly brought together experts from many fields, including the R&D director of a leading sweeping-robot company and top talents from leading humanoid robot companies, who are responsible for our hardware. On the algorithm side, I recruited a group of talents in the United States and Europe, including some of my classmates and seniors. As a founder and CEO, the most important thing is to gather resources. UniX AI is a global company that combines the strengths of robot software, hardware, and supply chains from different countries; at the same time, we have an international plan, and through continuous effort across one-year, three-year, and five-year plans, we aim to realize the company's vision of Robots For All.

This site: Briefly introduce your academic experience.

Yang Fengyu: I attended elementary school through high school in China, then went to the University of Michigan to major in computer science as an undergraduate, where I first came into contact with vision and machine learning. Later, influenced by my advisor's work on multimodal learning, I began research on vision and touch. During my undergraduate years I published five papers on robotic visual-tactile sensing. Among them, "Touch and Go: Learning from Human-Collected Vision and Touch" introduced the world's largest visual-tactile dataset and was accepted by NeurIPS, a top conference in artificial intelligence and machine learning. In another work, we introduced diffusion models for the first time to translate between vision and touch, and the results were accepted at ICCV. For robots, touch is very important.
It is difficult to tell with the naked eye whether a piece of clothing is polyester, cotton, or silk; only by actually touching it can you distinguish the different textures. In addition, delicate actions, such as inserting a charging cable into a charging port, require continuous adjustment through touch and cannot be completed by vision alone.

This site: Then you went to Yale.

Yang Fengyu: Because of my work on robotic vision and touch, especially on translating between visual and tactile signals and generalizing them with large language models, I received the Outstanding Undergraduate Scientist title from the North American Computer Society, the first person in my school's history to do so. I then chose Yale University for my doctoral studies. During this period I published a series of papers, including "Binding Touch to Everything: Learning Unified Multimodal Tactile Representations" (CVPR 2024, pp. 26340-26353). In that paper I proposed UniTouch, the world's first tactile foundation model that works across multiple different vision-based tactile sensors and connects touch to modalities such as vision, language, and audio. Another paper, "Tactile-Augmented Radiance Fields" (CVPR 2024, pp. 26529-26539), established TARF, the world's first 3D visual-tactile model that generalizes at the scene level. The generalization ability of the UniX AI humanoid robot is also built on this model.
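To make the idea of a unified multimodal tactile representation more concrete, here is a minimal, illustrative sketch of CLIP-style contrastive alignment between a tactile encoder and a frozen, pretrained image encoder whose embedding space already connects to other modalities. This is not the UniTouch implementation; the module names, dimensions, and training interface are assumptions for illustration only.

```python
# Illustrative sketch (not the actual UniTouch code): aligning a tactile encoder
# with a frozen, pretrained image embedding space via a CLIP-style contrastive loss.
# All module names, dimensions, and the data interface are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactileEncoder(nn.Module):
    """Maps raw tactile sensor images (e.g. from a vision-based sensor) to the shared space."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):
        return F.normalize(self.proj(self.backbone(x)), dim=-1)

def contrastive_loss(touch_emb, vision_emb, temperature: float = 0.07):
    """Symmetric InfoNCE: paired (touch, vision) samples are positives, the rest negatives."""
    logits = touch_emb @ vision_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def train_step(touch_encoder, frozen_vision_encoder, optimizer, touch_batch, image_batch):
    """One training step: only the tactile encoder is updated; the vision encoder stays frozen."""
    with torch.no_grad():
        v = F.normalize(frozen_vision_encoder(image_batch), dim=-1)
    t = touch_encoder(touch_batch)
    loss = contrastive_loss(t, v)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the vision encoder is frozen, aligning touch to it implicitly binds touch to whatever other modalities (language, audio) that encoder's space already covers, which is the general intuition behind "binding touch to everything."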
This site: Do you think being born after 2000 is more of an advantage or a disadvantage for you?

Yang Fengyu: In a startup, the founder is the soul of the company. Many people think I am very young, but I don't see being born in the 2000s as a problem.
From a technical perspective, young people play a strong driving role in this wave of technological change and new tracks. Many new technologies and products are created by today's young people, especially in high-tech industries where the entry threshold is relatively high. One member of Sora's core team is a classmate of mine who already showed strong technical ability at the University of Michigan. At the level of cognition and experience, I think learning fast and correcting mistakes fast is also a viable path. The other factor is personality: you must be willing to persevere and be resilient, leave no stone unturned, and have the spirit of "cutting a road through the mountain and building a bridge across the water." After all, entrepreneurship is ultimately judged by results. Of course, there are also many experienced experts on the UniX AI team, with deep expertise in mechanical structure, electronics, and more. Only through effective cooperation between us could we launch our products in such a short time.
Vision, Touch, and Manipulation: Improving Robots' Generalization Ability
This site: Why is touch important for robots?

Yang Fengyu: Human beings are multi-sensory animals; your action decisions are usually shaped by information from multiple senses combined. The same is true, in principle, for intelligent robots. Touch is one of the most important kinds of sensory information. Unlike visual feedback, it is generated after the robot interacts with the environment, whereas visual feedback comes before. When a robot grasps an object and the object deforms, the incremental information the robot gains after that interaction essentially comes from touch, from how the object feels. Tactile information lets the robot perform better on more complex and delicate tasks and greatly improves the success rate of grasping. In the grasping of flexible objects in particular, the role of touch is even more obvious; it can be called a qualitative leap, from being basically unable to complete the task to being able to complete it. For example, our wheeled humanoid robot Wanda has accomplished tasks such as pinching eggs, grasping tofu, and doing laundry. Relying purely on vision, without tactile feedback, it would be difficult for the robot to perform them.
The reason robots today rely mainly on vision is that, compared with other data, visual data is the most direct, the easiest to obtain and train on, and available in large quantities. But as robots move further toward embodiment, relying solely on vision is definitely not enough. As a sense that depends on interaction, touch matters because, when used well, it lets the robot gradually learn from real interaction with the world and become more usable and more general.

This site: Why does a robot's ability to manipulate flexible objects improve once touch is added? What is the principle?

Yang Fengyu: The main principle is that grasping and manipulating flexible objects is very different from handling rigid objects. The physical shape of a rigid object barely changes before and after contact, so a grasp can be judged relatively easily through visual observation. For a flexible object, however, it is hard to determine what will happen after contact just by observing it beforehand, because heavy occlusion and deformation occur during the grasp, and these deformations are difficult to predict accurately from vision. For example, once a tissue is held in the hand, it completely blocks the line of sight; at that point vision can hardly provide useful information about how to grasp or manipulate it, and we can only rely on physical signals such as touch to complete the perception.

This site: Why does it feel like, most of the time, I don't need to try grasping an object; I just know how to grasp it?

Yang Fengyu: That is because, as a human, your senses are so well integrated that you don't notice you are using tactile information. You have accumulated more than twenty years of tactile data, so you no longer know which sense supported you in completing the task.

This site: For most robot tasks, how do the different senses compare in contribution? How high a priority is touch at this stage?

Yang Fengyu: For most robot tasks, the contributions of the different senses differ across the three steps of perception, reasoning and decision-making, and action. At the perception level, in the early stage we relied mainly on vision and point clouds to obtain global information, such as the layout of the home and where the water is. At present, perceiving global information through large vision models and 3D foundation models is basically a solved problem. At the decision-making level, language is mainly used to bring in human prior knowledge. For example, given the instruction to get water from the refrigerator, the robot can decompose the task: first open the refrigerator, then take the water, then close the refrigerator. This prior knowledge comes from large amounts of internet data. At the action level, vision can help the robot determine the grasp position, but tactile information plays an important role in determining the grasp force. For example, under occlusion, such as when holding tofu, it is difficult to judge the grasp accurately from vision alone, but touch can provide the key information that helps the robot complete a precise grasp.
In addition, touch plays an important role in scenarios requiring fine force control, such as pinching eggs or grasping tofu, as well as in scenarios that require judging object deformation and force feedback.
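To illustrate why tactile feedback matters for flexible or fragile objects, here is a minimal sketch of a closed-loop grip controller that adjusts force from contact signals. The sensor fields, thresholds, and gripper interface are hypothetical placeholders, not UniX AI's actual control stack; the point is simply that force regulation comes from touch, which vision alone cannot provide once the object is occluded or deforming.

```python
# Hypothetical sketch of tactile-feedback grip control: tighten only enough to stop slip,
# otherwise relax toward a minimum force so a deformable object (tofu, an egg) is not crushed.
# The TactileReading fields, thresholds, and gripper callbacks are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TactileReading:
    normal_force: float   # estimated contact force in newtons
    slip_score: float     # 0..1, higher means the object is slipping

def adjust_grip(reading: TactileReading, current_force: float,
                min_force: float = 0.2, max_force: float = 5.0,
                slip_threshold: float = 0.3, step: float = 0.1) -> float:
    """Return the next commanded grip force based on the latest tactile reading."""
    if reading.slip_score > slip_threshold:
        # Object is slipping: increase force, but never beyond the safe maximum.
        current_force = min(current_force + step, max_force)
    elif reading.normal_force > 2.0 * min_force:
        # Contact force is higher than needed: relax to avoid crushing the object.
        current_force = max(current_force - step, min_force)
    return current_force

def grasp_loop(read_sensor, command_gripper, steps: int = 200):
    """Simple control loop: poll the tactile sensor and re-command the gripper each cycle."""
    force = 0.2
    for _ in range(steps):
        force = adjust_grip(read_sensor(), force)
        command_gripper(force)
```

The same structure applies whether the low-level signal comes from force-torque sensing or a vision-based tactile pad; what matters is that the correction happens after contact, inside the interaction loop.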
In general, the proportional contribution of the different senses varies with the task. For grasping some rigid objects, vision may account for a higher share; for grasping many flexible objects, touch is more critical, and it can even be called a qualitative leap, from being basically unable to complete the task to being able to complete it.

This site: Are the barriers around touch high enough? What are the difficulties in bringing it into robot products?

Yang Fengyu: I think they are relatively high. Before 2023, touch was a very niche modality; compared with vision and hearing, very few people worked on it. In the early days of tactile research, sensors were the biggest problem: not many people in the world worked on tactile data, and how to build the sensors was a key issue. Next comes the question of how to parse tactile information, which involves both algorithms and data. On the data side, most specific tactile sensing data had not previously been made public, perhaps because of the particularities of different robot setups or for other reasons, so there is less open data in robotics than in vision. We therefore keep working on datasets and are committed to promoting the continued release of tactile sensing datasets worldwide. On the algorithm side, touch differs from vision and carries a lot of physical prior knowledge. For example, the contact force can be inferred from markers on the sensor, but this information is not as easy to interpret and recognize as visual information. We also ran an experiment at the time, and the results showed that generated tactile signals were very difficult for people to distinguish, because without specific training people find it hard to tell apart the tactile signals of different objects. We are actively working to lower this barrier and to bring more of the academic community into the field, to push the whole tactile domain forward.

This site: If tactile information suffers not only from the small amount of existing data but also from the high cost of large-scale collection, how can it be scaled up?

Yang Fengyu: The work we did earlier was actually aimed at this problem: how to scale up when large-scale collection is hard to achieve. The first step is to connect vision and touch, predicting tactile sensations from vision, and even using visual and language information to infer tactile signals in scenes where no touch data was collected. For example, after collecting the tactile information of tables of a given type and material, in a new home or office scene, even if you have never actually touched the new table, you can infer its tactile signal from visual and language information (a rough sketch of this idea appears after this exchange). In this way we can expand the usable dataset even without real physical contact. However, because the signal is predicted, it may differ somewhat from the real one.
Second, we keep pushing for tactile datasets to be made public. Open datasets let more people participate in tactile research and development, which advances the whole field. Third, at the algorithm level, we work to lower the threshold for interpreting tactile information. For example, by adding markers to the sensor and observing how the markers shift under different forces, we can use this physical prior knowledge to parse tactile information better. Fourth, we are committed to combining different kinds of information, such as vision, touch, and language, to complete various tasks. Fusing multimodal information can compensate, to a certain extent, for the small amount of tactile data and improve the model's generalization and adaptability.

This site: Is large-scale collection possible, and what conditions would it require?

Yang Fengyu: I think this is actually the bottleneck for the development of embodied intelligence as a whole. I personally believe large-scale collection is achievable, but it involves a process of commercialization. When robots enter thousands of households and reach a certain volume, you can collect enough data to support more scenarios and achieve some generalization. Of course, you can never capture every point, so the question of "scale" will always exist. The essence of machine learning is to fit and predict a dense distribution from sparse samples. On the data side we do not rule out simulation, but I believe a certain amount of real-robot data is a necessary condition for realizing embodied intelligence.

This site: What are the key technical indicators of a tactile foundation model?

Yang Fengyu: Like any large model, a tactile foundation model is measured on various downstream tasks. I led the team that built Touch and Go, the world's largest existing visual-tactile dataset, which is one of the important common benchmarks for robot visual-tactile pre-trained models worldwide.
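As a rough illustration of the "predict touch from vision" idea mentioned above, the sketch below trains a small head to regress tactile embeddings from visual features where real touch data exists, then uses it to pseudo-label surfaces that were never physically touched. The encoders, dimensions, and pseudo-labeling flow are assumptions for illustration, not the published method's code.

```python
# Illustrative sketch of expanding tactile coverage by predicting touch from vision:
# supervise a vision-to-touch head where paired data exists, then pseudo-label new scenes.
# Feature dimensions and the surrounding training/inference flow are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisionToTouchHead(nn.Module):
    """Regresses a tactile embedding from a visual feature of the touched surface patch."""
    def __init__(self, vision_dim: int = 768, touch_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, 1024), nn.GELU(),
            nn.Linear(1024, touch_dim),
        )

    def forward(self, vision_features):
        return F.normalize(self.mlp(vision_features), dim=-1)

def distillation_loss(pred_touch, real_touch):
    """Cosine loss: the prediction should match the embedding of the tactile signal
    actually recorded at that image location (where paired data is available)."""
    return 1.0 - F.cosine_similarity(pred_touch, real_touch, dim=-1).mean()

def pseudo_label(head: VisionToTouchHead, vision_features_untouched: torch.Tensor):
    """For surfaces that were never touched, predicted embeddings act as approximate
    tactile labels -- useful for scaling, but noisier than real contact data."""
    with torch.no_grad():
        return head(vision_features_untouched)
```

The predicted signals are only an approximation of real contact, which is consistent with the caveat above that predicted touch may differ from the real signal; real-robot data remains necessary.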
Embodied Intelligent Robot Wanda to Enter Mass Production in September
This site: After deciding to start a business, what kind of embodied intelligence company did you plan to build?

Yang Fengyu: The essence of entrepreneurship is to create value for society. UniX AI is one of the few embodied intelligent robot companies in the world that makes the consumer side its first strategy. Although To C has a long way to go, the potential behind it is enormous. From an industrial perspective, humanoid robots have entered a period of integration between hardware and AI; they are developing rapidly and becoming more and more practical, and I am optimistic that this integration will move much faster than industry insiders originally expected. An aging population, low birth rates, labor shortages: these are problems the whole world faces. The responsibility of an enterprise is to solve problems for society. That is UniX AI's opportunity and value, and it is also my original motivation for starting the company. The rough landing path for this track is basically industry, then commercial, then home. We will cover commercial and home, which are also the main scenarios for serving To C users. UniX AI's vision is Robots For All: to create general-purpose humanoid robots that lead in both mobility and intelligence, enabling physical labor and intelligent companionship.

This site: Why did you choose to tackle home scenarios first?

Yang Fengyu: In fact, we are not limited to home scenarios; we also work on broadly commercial scenarios such as offices. To B scenarios are technically less difficult, highly repetitive, and do not demand as much generalization, but they often involve a strong substitution logic, which places very high requirements on the robot's speed and operating precision. Home scenarios are complex and ever-changing; every home is a small ecosystem, which requires robots to have strong generalization capability. That of course places higher demands on our products. At the same time, we will also offer many L2-level functions in home scenarios, which will further improve the product's adaptability and playability in complex settings.
In general, our technology stack covers both To B and To C. Once we do well in the home scenario, I feel we can handle other scenarios with ease. Starting with the hardest bone to chew not only reflects UniX AI's technical strength but also represents our strategic path into the market.
Yang Fengyu: UniX AI's modular hardware solution cannot cover every scenario. At the same time, we have a set of motion-primitive algorithms that decouple perception from manipulation to make the most of our data, so portability across scenes is very strong. Every product has its limits, but we want to keep challenging deployment in different scenes. We are also working on several key commercial scenarios to serve consumers.

This site: What is the supply chain cost advantage you mention?

Yang Fengyu: Our team includes a group of experienced supply chain management experts who have mastered mass-production-level cost control and can apply it to the robot supply chain. Although the robotics industry has not yet reached large-scale pricing, we have managed costs at a mass-production level from the very beginning so that the price is acceptable to consumers. With effective cost control, we are confident our products will be highly price-competitive and will strongly support the company's development.

This site: What will the price range of the upcoming product be?

Yang Fengyu: It is not convenient to disclose that now, but I promise it will be a very surprising price.

This site: How do you plan to go all the way to the end game?

Yang Fengyu: The logic of the end game is very simple: you need a certain amount of high-quality real-world data, and the key is how to obtain it. Take Tesla's autonomous driving as an example: it took six to eight years of cars continuously driving on the road to collect data. The robotics industry expects robots to do things autonomously. First, we developed several single-point scenario functions that people will find useful and fun, and that they will want to buy within their spending power. Our supply chain has advantages that let us lower the price, which is a very important point. Through continuous user feedback, we will keep optimizing and iterating the product, and ultimately create a universal embodied intelligent robot.

Yang Fengyu: Making a demo is actually very easy; once you build it in the lab, you have succeeded. The difficulty of mass production lies in the fact that it is not one unit but hundreds or thousands of units actually entering users' homes, which tests the product's data security, operational stability, and the reliability of the underlying control. This requires a strong after-sales team and a sustained technical team. In addition, the manufacturing process matters a great deal and is an important indicator of mass-production capability. Of course, while mass production reflects supply chain competitiveness, it also undoubtedly demonstrates technological maturity. Who is the first to eat the crab? Who can eat it quickly and well? Moreover, mass production may bring a certain first-mover advantage.

This site: After deciding to start the business, what was your initial thinking on team building, and what is the current team composition?

Yang Fengyu: Going from 0 to 1, the founding team is extremely important. I am used to planning at the top level first and then unrolling it like a waterfall, level by level, from top to bottom: first find the core key people and get started, then expand downward and keep improving the team so the whole flywheel keeps turning. From the end of last year until now, our team has grown very quickly, and we have iterated the product through three generations. The team has now begun to take shape, but we will continue to adjust and improve it according to our needs to strengthen the company's competitiveness. Attracting talent is one of the most important things for a startup, and I have personally met most of the people in the company. In many cases a CEO is not just a CEO; you have to explain to colleagues the value and significance of what we are doing, and getting them to agree and walk the road together is very important. At the same time, at this stage my span of management is very wide and the granularity of management is very fine. It is very hard, but necessary: only by comprehensively grasping and confirming that the company's direction is correct and stable can I spend more time on other things.

This site: How do you attract these talents?

Yang Fengyu: What essentially attracts everyone is the path to the end game of embodied intelligence and the question of how to get there. We have several strengths. First, our team has strong execution and a very fast iteration speed: people who visited us and came back a few weeks later found the scenario already completed; the progress was that fast. There are also excellent people from top domestic robotics companies who actively asked to join us.

This site: Do you have any external financing plans?

Yang Fengyu: The current feedback from investors is very positive. We welcome investors who share the vision of universal embodied intelligence and will work with us for the long term.

This site: Can you tell us more about the upcoming product and future market plans?

Yang Fengyu: The robot we are about to mass-produce is called Wanda, a wheeled humanoid dual-arm robot. In the first technology video we released you can see some of its capabilities, but that is not all of them; more surprising details will be announced when it launches for consumers in September. Ultimately, the product UniX AI wants to offer consumers is a universal embodied intelligent robot that can not only help with housework but also accompany people to more places and provide more functions. This requires technical development, but it also requires co-creation between the company and its users. Without taking small steps, you cannot reach a thousand miles. Let's start with the first step.
Contact the author: jjingl- (please state your name, company, and position when adding).