Watch videos, draw CAD, and recognize motion imagery! 75B's large multi-modal industrial model is so capable

The focus of this year’s upgrade is the introduction of multi-modal large model capabilities.

As the videos and music created by Sora and Suno spark an audiovisual revolution around the world, how will large multi-modal models evolve in industrial applications? On March 27, Innovation Qizhi, China's leading "AI manufacturing" solution provider, unveiled its forward-looking answer.

After half a year of hard work, Innovation Qizhi released the more powerful Qizhi Haiming Industrial Large Model 2.0 (AInno-75B) at a press conference in Beijing. Several large-model native applications made their debut, including ChatVision and ChatCAD, and ChatRobot was upgraded to the Pro version.

Zhang Faen, CTO of Innovation Qizhi, at the press conference

Scaling laws help researchers and engineers predict the performance gains from increasing model size, and estimate the number of parameters needed to reach specific performance goals. The industry has now reached some consensus that increasing the parameter count can improve model performance. Compared with AInno-15B, AInno-75B has grown significantly in both size and performance.

The focus of this year's upgrade is the introduction of multi-modal large model capabilities. Zhang Faen explained that this advanced large model can handle multiple information modalities, including text, pictures, and videos, and can even integrate data types unique to industrial scenarios, such as CAD drawings and EEG signals. Its output is equally diverse: it can generate text, images, videos, CAD design drawings, or operating actions for a machine body such as a robot.


1. ChatCAD: The beauty of industrial "text-to-image"

The pictures and videos generated by consumer-facing AIGC applications are breathtaking, and AI generation capabilities are equally exciting in the field of enterprise services.

Industrial design is the cornerstone of production activities. From mobile phones to new energy vehicle factories, industrial design must be completed before production and construction can begin. As the foundation of industrial design, CAD software occupies an important position in the industrial chain. For a long time, China's CAD software market has been dominated by foreign vendors, whose products have complex interfaces and high barriers to use.

Wang Xian, general manager of operations at China Zhongyuan International Mechanical Engineering Co., Ltd., revealed that most of the company's design work relies on manual labor. Every building, whether a standard floor plan or a complex, has to be drawn by designers one by one. The same goes for industrial drawings, which consume a great deal of manpower and material resources. In addition, industry specifications are numerous and frequently revised, which makes design even harder.


To change this situation, Innovation Qizhi took the lead in bringing industrial large model technology to industrial design and launched a Text-to-CAD application, "ChatCAD". Through a simple question-and-answer dialogue, it quickly understands the designer's creative intent, automatically generates industrial design drawings that meet the requirements, and exports them to traditional software for fine-tuning.

Enter "Help me design an industrial pulley. The parameters are as follows: the radius of the pulley is 6, the thickness is 5, the edge of the pulley protrudes outward by 0.8, the thickness of the protruding part is 0.5, the height of the central axis of the pulley is 5, and the radius is 4". ChatCAD generates the drawing immediately and keeps refining the design based on feedback.

Live demonstration of industrial pulley design

Even when faced with lengthy and complex component requirements, ChatCAD can cope. For example: "Help me design a turbine. The turbine consists of a motor and an engine cover. The specific requirements are as follows: the motor is cylindrical, 20 in length, and 16 in diameter. The turbine consists of a cylindrical turbine shaft and 5 fan blades. The turbine shaft is 20 in length and 12 in diameter. The top of the turbine should have a cylindrical cone rotating shaft; the shaft cap length is 9 and the diameter is 12. The engine cover has a diameter of 50 and a length of 30, and the distance between the turbine blades and the engine cover is 1."

ChatCAD again generates a result and keeps improving it based on feedback. The designs it generates support mainstream file formats and can be passed seamlessly to other industrial software for subsequent integration and modification.


Live demonstration of turbine design

This capability excited Mr. Wang. He believes ChatCAD can help the industry reduce repetitive labor and avoid being constrained by rigid specifications, which could in turn affect labor pricing across the entire industry.

So, how is ChatCAD implemented? Zhang Faen explained that CAD differs from common modalities such as text, pictures, and videos. It needs to represent geometric data such as points, lines, edges, circles, and columns, as well as processes. "So we also call it a modality, one that the consumer side does not have. We need to invent our own intermediate language to express CAD, have the large model generate this intermediate language or intermediate code, and then translate the intermediate code into CAD."
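The article does not describe the intermediate language itself, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: a hypothetical `Cylinder` primitive serves as the "intermediate code", `pulley_program` encodes one possible reading of the pulley prompt quoted earlier, and `to_openscad` translates that program into OpenSCAD text as a stand-in for "CAD". None of these names come from ChatCAD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cylinder:
    """One primitive of a hypothetical intermediate language: a z-aligned cylinder."""
    name: str
    radius: float
    height: float
    z: float = 0.0  # height of the cylinder's base above the origin


def pulley_program(radius, thickness, rim_extra, rim_thickness, hub_height, hub_radius):
    """Intermediate program for ONE possible reading of the pulley prompt (an assumption)."""
    return [
        Cylinder("body", radius, thickness),
        Cylinder("rim_bottom", radius + rim_extra, rim_thickness),
        Cylinder("rim_top", radius + rim_extra, rim_thickness, z=thickness - rim_thickness),
        Cylinder("hub", hub_radius, hub_height, z=thickness),
    ]


def to_openscad(program):
    """Translate the intermediate program into OpenSCAD source text."""
    lines = ["union() {"]
    for p in program:
        lines.append(
            f"  translate([0, 0, {p.z:g}]) cylinder(h={p.height:g}, r={p.radius:g});  // {p.name}"
        )
    lines.append("}")
    return "\n".join(lines)


# Parameters taken from the pulley prompt in the article.
program = pulley_program(radius=6, thickness=5, rim_extra=0.8,
                         rim_thickness=0.5, hub_height=5, hub_radius=4)
scad = to_openscad(program)
print(scad)
```

Separating the program (a list of primitives) from the translator keeps the part a model would have to generate small and checkable, while the translation step stays deterministic.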

ChatCAD's output can be used directly for processing, but complex designs still need further refinement. The goal of ChatCAD is to become a right-hand assistant for engineers in design institutes. It is expected to shorten a design process that originally took ten hours to one hour, with the large model responsible for 90% of the work and the remaining 10% optimized manually.

It is worth mentioning that Innovation Qizhi has integrated large model technology into various industrial software such as CAD, MES, and BI, enabling intelligent transformation and upgrading across the whole "R&D design, production control, information management" process.

2. ChatVision: A new tool for industrial safety supervision

Factory production safety and compliance are crucial, and video surveillance and image analysis are indispensable. Take wave soldering in a circuit board factory as an example. When workers clean a 280-degree high-temperature tin furnace without strictly wearing protective equipment, such as air-tight activated carbon masks and high-temperature protective gloves, they risk serious burns. Traditional monitoring methods are inefficient, easily miss hidden dangers, and lag noticeably when footage is reviewed afterwards.

Based on the AInno-75B industrial large model, ChatVision can analyze surveillance video streams, video files, and pictures in real time through natural language, accurately identify non-compliant behavior, and immediately trigger the alarm system (for example, by automatically sending emails to administrators), supporting safe production in industrial enterprises.

In the on-site demonstration at the press conference, ChatVision responded accurately to comprehensive understanding commands such as "Look carefully at the current screen and tell me where this might be", as well as specific target recognition tasks such as "Find the power socket in the screen" and "Find the white hard hat", demonstrating its broad application prospects.

In the live demonstration, ChatVision finds the power socket, the white hard hat, and other specific targets in the screen

These instructions seem very simple. Without a large model, however, a dedicated algorithm has to be developed for each small recognition category (such as hard hats or smoking). Such algorithms are difficult to modify once debugged and deployed, costly to implement, and slow to deliver. Large models overturn this traditional paradigm: a single large model can cover the functions of multiple small models and surpass them in performance, accuracy, and generalization, while supporting natural language interaction, which greatly simplifies development and deployment.

During the live demonstration, the screen changed: one colleague took off his work hat to play with his mobile phone, and another took off his safety clothing. The demonstrator issued the instruction: "Please analyze this screen carefully. If there are any violations, send an email to the administrator." This instruction is very knowledge-intensive. It involves not only judging whether violations occurred, but also deciding whether to trigger an email and who should receive it. This is the typical service model of large-model native applications. In response, ChatVision called a number of security monitoring skills in the background, marked three violations, and sent an email with screenshots.

The officially released ChatVision demo shows this clearly

The ChatVision demonstration fully reflects the planning and reasoning capabilities of industrial large models, which can convert user intentions into a series of external tool calls to complete complex video understanding tasks in an orderly manner. Zhang Faen, CTO of Innovation Qizhi, said that the company has accumulated more than 200 visual algorithms and model assets over the past few years, and that industrial large models have opened up new horizons for applying these assets. Large models can act as intelligent orchestrators to improve the user experience, and their multi-modal capabilities can also enhance video understanding and play a significant role in enterprise security.
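To make "converting user intent into a series of tool calls" concrete, here is a minimal sketch, assuming a fixed, hand-written plan in place of the model's planning. Every name in it (`VIOLATION_LABELS`, `detect_violations`, `send_email`, `handle_instruction`) is hypothetical, the detector is a stub that reads pre-computed labels, and the "email" is only appended to an in-memory outbox. It is not ChatVision's implementation.

```python
# Hypothetical label set; a real system would get labels from vision models.
VIOLATION_LABELS = {"no_hard_hat", "phone_use", "no_safety_clothing"}


def detect_violations(frame):
    """Stub vision skill: return the violation labels found in a frame's label list."""
    return [label for label in frame["labels"] if label in VIOLATION_LABELS]


def send_email(outbox, to, subject, body):
    """Stub alert channel: record the message instead of sending real mail."""
    outbox.append({"to": to, "subject": subject, "body": body})


def handle_instruction(frame, admin, outbox):
    """Fixed plan standing in for what a model might produce for the instruction
    'analyze this screen; if there are violations, email the administrator'."""
    calls = ["detect_violations"]          # log of tool calls, in order
    violations = detect_violations(frame)
    if violations:                         # the email step is conditional
        calls.append("send_email")
        send_email(outbox, admin,
                   f"{len(violations)} violation(s) detected",
                   ", ".join(violations))
    return violations, calls


outbox = []
frame = {"labels": ["person", "no_hard_hat", "phone_use", "person", "no_safety_clothing"]}
violations, calls = handle_instruction(frame, "admin@example.com", outbox)
print(calls, violations)
```

The point of the sketch is the shape of the flow: one natural-language instruction fans out into several tool calls, and whether the second call happens depends on the result of the first.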

The last demonstration case highlights a cutting-edge multi-modal application of large models. Given a real workshop video, the demonstrator set a difficult requirement: "Please analyze this video carefully, tell me whether anyone is eating and mark the time when this action occurred." This task requires the large model to perform continuous action recognition over a long image sequence and mark the start and end times of the action. ChatVision accurately located the scene of workers eating within the first 15 seconds of the video.
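The article does not say how ChatVision derives the timestamps. As a simple illustration of the "mark the start and end times" step only, the sketch below assumes some upstream recognizer has already produced one boolean per frame (action present or not) and merges consecutive positive frames into time intervals. The function name `action_intervals` and its parameters are hypothetical.

```python
def action_intervals(flags, fps, min_frames=1):
    """Merge runs of consecutive True frames into (start_seconds, end_seconds) pairs.

    flags: per-frame booleans from a hypothetical upstream recognizer.
    fps: frames per second of the video.
    min_frames: runs shorter than this are discarded as noise.
    """
    intervals = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                      # a run begins at frame i
        elif not flag and start is not None:
            if i - start >= min_frames:    # run covers frames start..i-1
                intervals.append((start / fps, i / fps))
            start = None
    if start is not None and len(flags) - start >= min_frames:
        intervals.append((start / fps, len(flags) / fps))  # run reaches the end
    return intervals


# 10 frames sampled at 2 frames per second.
flags = [False, False, True, True, True, True, False, False, False, True]
print(action_intervals(flags, fps=2))
print(action_intervals(flags, fps=2, min_frames=2))
```

Raising `min_frames` is one simple way to suppress single-frame false positives before reporting an event.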

"Eating is a very common event, and the ability of large models to understand events is far better than traditional small algorithm models." Zhang Faen explained. For a long time, there has been an urgent need to ensure production and engineering safety through video. In the future, related work around large models will be expected to achieve intelligent video understanding of production safety conditions and production process compliance.

In Wang Xian's view, safety is always the top priority in engineering projects, yet for many years engineering safety training has rarely involved on-site hazard identification. He believes ChatVision has broad application prospects and expects it to be deployed in scenarios such as on-site safety helmet detection, checking that safety ropes are worn when working at height, and checking that safety equipment is carried. ChatVision also has great potential in the supervision industry, where many on-site safety inspections still rely heavily on manpower.

3. ChatRobot Pro: "Motor Imagery Recognition"

ChatRobot, a native application of AInno-15B, already implemented voice control of industrial robots. Just tell ChatRobot "Bring me a cup of coffee", and it can direct the industrial robot arm to search for coffee on the shelf and plan its own route to deliver the item to you. ChatRobot Pro can process a more complex information carrier: EEG signals.

At the press conference, the demonstrator randomly selected a product (Uniform Green Tea) and asked a person with multiple electrodes fixed to his scalp to use motor imagery to control an industrial robot that would place the drink in his hands. The man wearing the collector concentrated on three intentions: left, right, and select. The cursor on the screen moved left and right according to the signals decoded by the large model. When the cursor reached the target icon, he stared at the icon and clicked the cursor to select it.
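The interaction described above can be pictured as a tiny state machine driven by decoded intents. The sketch below assumes the decoding has already happened and that each decoded intent arrives as one of the strings "left", "right", or "select"; the function `run_cursor`, the icon names, and the clamping behavior at the screen edges are all hypothetical and not taken from ChatRobot Pro.

```python
def run_cursor(commands, icons, start=0):
    """Apply decoded intents to a cursor over a row of icons.

    Returns the selected icon, or None if no "select" intent arrives.
    """
    pos = start
    for cmd in commands:
        if cmd == "left":
            pos = max(0, pos - 1)                  # clamp at the left edge
        elif cmd == "right":
            pos = min(len(icons) - 1, pos + 1)     # clamp at the right edge
        elif cmd == "select":
            return icons[pos]
        else:
            raise ValueError(f"unknown intent: {cmd!r}")
    return None


icons = ["cola", "green_tea", "water"]  # hypothetical product icons
print(run_cursor(["right", "select"], icons))
```

Once an icon is selected, the chosen product would become the input to the task orchestration step.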

Next, ChatRobot Pro will independently complete the intelligent orchestration of tasks, generate executable task steps, and interact with the industrial robot interface in real time to instruct the robot to complete the task.

EEG signals are signals generated during brain activity. The relationship between brain activity and EEG signals is very complex, and how to decode it has become a major problem for researchers. While traditional approaches have low accuracy, AInno-75B shows potential for interpreting this type of multimodal information. Some foreign brain-computer interface technologies use invasive electrodes to obtain EEG signals, which involves a series of engineering issues such as electrode design, surgical implantation, rejection reaction, signal transmission, and signal decoding. Innovation Qizhi uses non-invasive EEG caps to collect EEG information, which greatly reduces the engineering difficulty.

However, Zhang Faen also said that the invasive method can obtain more channels and clearer EEG signals, which will facilitate subsequent decoding of more complex brain intentions. A vivid metaphor is: the invasive method of collecting EEG signals is like listening to a concert inside a stadium, while the non-invasive method is like listening to a concert outside the stadium. There will be a big difference in the clarity of the singing. Currently, the research and development work that Innovation Qizhi is doing is to verify the multi-modal capabilities of large industrial models and conduct technical pre-research for possible future brain-controlled industrial automation scenarios.

This is also an end-to-end native application, Zhang Faen emphasized. The entire process from EEG signal input to direct output of the final result (a robotic arm delivering the goods to the demonstrator) is completed by the neural network, without relying on hand-designed features or traditional data processing.

In addition to natural language interaction and motor imagery recognition, ChatRobot Pro makes full use of the reasoning capabilities of industrial large models to orchestrate long task sequences and make complex decisions. Giving powerful intelligent control and decision-making capabilities to different machine bodies (whether industrial robotic arms, AGVs, or others) will also be the future direction of Innovation Qizhi's industrial large model.

4. Continue to evolve and move forward

In the era of generative AI, there is no precedent for industrial applications, and Innovation Qizhi has been exploring the possibilities in industrial scenarios all along.

Zhang Faen calls the prospects for large models in enterprise services "promising". But he admitted that during a window of technological change, people's understanding is often uneven, especially when the change is large. Understanding needs time to catch up, and he is no exception.

In addition to the new native applications, the overall performance and effect of ChatDOC released last year have been improved, and the product functions have become more complete. ChatBI has added support for Excel and CSV data, and now the accuracy of generating SQL statements and analysis reports has increased by 15%. Large model serving engines are easier to deploy and provide higher inference performance.

"Innovation Qizhi will further polish the ChatX application built directly based on the core generation capabilities of industrial large models." Zhang Faen said.


This article is reproduced from 机器之心.