Yang Fan, co-founder of SenseTime: New opportunities for AI industry development brought by the wave of large models
36Kr hosted an industry development summit titled "Disruption · AIGC" on May 23. The summit brought together industry players to discuss how enterprises and industries should respond to change, to share ideas, to identify the companies with the most potential and the technologies with the most value, and to explore the way forward in a turbulent environment.
At the conference, Yang Fan, co-founder of SenseTime and president of the large device business group, delivered a keynote speech titled "New Opportunities for the Development of the AI Industry Brought by the Wave of Large Models." Yang Fan believes that the new wave of AI has two characteristics. First, the cycle from technological breakthrough to business model innovation is shorter, and technological achievements are put to use in commercial and industrial exploration more quickly. Second, compared with the past ten years, the current industrialization of artificial intelligence makes it easier to turn technological advantages into data barriers and scale advantages.
Yang Fan also explained his view of why artificial intelligence technology has achieved its recent breakthroughs. He believes that although the success of large models still confirms the brute-force aesthetics of "data, computing power, and algorithms", behind these three elements lies a comprehensive systems engineering effort. Taking OpenAI as an example, Yang Fan pointed out that doing data engineering well, improving the effective resource utilization of chips, and designing a lower-cost but well-structured algorithm all require expert experience, knowledge, and systems engineering capability at every link. He believes that this is not only the ultimate expression of the core technical capabilities of a model-layer company, but also the key capability needed to provide AI infrastructure services.
The following is the transcript of Yang Fan's speech (organized and edited by 36Kr):
Hello everyone! I am honored to share some industry trends in large models with you at today's 36Kr event.
In a period of such dramatic change in the industry, I would like to share a few views. First, when we talk about large models today, there is no precise definition. Does a model have to exceed tens of billions of parameters, or hundreds of billions? In my opinion, over the ten years or so since 2012, the model structures of artificial intelligence have kept getting larger, and so have their parameter counts. Why, then, does everyone seem to have suddenly latched onto the concept now, and why is it attracting so much attention? Compare this with the new applications represented by AlphaGo in 2016. The progress and breakthroughs that artificial intelligence has made in the past two years are, first of all, more directly relevant to individual consumers, and everyone can feel them directly. Secondly, these breakthroughs have had a greater impact. I think artificial intelligence can now carry out innovative work in other scientific disciplines, whether biology, physics, chemistry, or other fields. Models such as ChatGPT, which everyone is paying attention to today, are very meaningful because they have the potential to drive new progress across our underlying technology. Such progress is likely to create more value for humanity in the future.
Since 2021, technological breakthroughs have arrived one after another. At the same time, we have seen a very interesting phenomenon: once this round of breakthroughs produced technical results, exploration in industry and business began, and in practice this cycle has become shorter than before. A large number of innovative companies have since been founded at home and abroad, and professors and scholars have started their own businesses. I think this is partly because the market had already seen some paths for this in the past, and investors' recognition has become higher. One example is the release of some text-to-image APIs, after which people soon started using them to try to become internet celebrities on Xiaohongshu.
Across many such trends, the cycle from technological breakthrough to commercial innovation seems to be shorter. In the forums I have attended recently, most people are talking about what kind of large model they want to build, how big and powerful it will be, what they want to do with it, and how to build a new super app on it for specific scenarios. Although no large model in China has yet received a formal API license from government regulators, this is how much the field has expanded in the past two months.
So I think this is a phenomenon worthy of our attention: the commercialization of this round of large models is moving faster. Why is that? One very important reason is that many of the new technologies can support more consumer-facing applications. At the same time, they naturally form a closed loop of data accumulation, which makes it easier to establish business barriers than it was for technology startups in the past. I think this is a trend we have seen in the industry in recent months.
Yang Fan, co-founder of SenseTime and president of the large device business group
Second, let us look at what lies behind the large model technology we build today. There is a consensus that, whether we look at large models or at the past 10 years as a whole, the development of the artificial intelligence industry has basically been a success of brute-force aesthetics, built on the three traditional elements of artificial intelligence: data, computing power, and algorithms. Algorithms can be understood as model structures. For almost all of the models we now call large models, or models that have achieved newer technical results, in every field, the size of the data set, the scale of computing power used, and the structure and parameter count of the algorithm itself have all maintained a very high growth rate. The Transformer model is very stable and very effective; it can solve problems in many fields and produce good results. When we find that a large enough amount of data yields highly generalizable results, it further proves, in a sense, that the general direction of progress in artificial intelligence is that brute force produces miracles: integrating more resources gets better results.
However, simply having these resources is far from enough. Consider the three elements in turn: before any of them can produce good results, a great deal of professional engineering practice has to be done in that area.
On computing power, the previous guest's speech in fact explained why we need it at large scale. But how can all of this computing power be connected? If we have 1,000 cards today, can we make them cost-effective and reach an effective utilization rate of 60%, 80%, or even 90%? And if we connect 1,000, 2,000, or 4,000 cards, what happens to that figure? OpenAI previously connected 10,000 V100 cards. Currently, no one in China can connect 10,000 cards to run a single training task at an effective resource utilization rate above 50% or 60%. Some people may be working on it now, but there is no such result yet. Why? Behind it lies a very complex piece of engineering. For example, training a model with hundreds of billions of parameters requires exchanging large amounts of data and intermediate gradient information. The transfer of that data and of intermediate results across thousands of GPU cards has to be kept in effective balance, and much of the communication is point-to-point, which requires pairwise transmission across the network. When we connect thousands of cards together, what level of performance is acceptable? None of this is conceptually complicated; it is simply a great deal of engineering practice. If you have done it before and stepped into enough pitfalls, you will be able to tune the system better than others. It is very much a matter of experience.
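To give a rough sense of the scale described above, here is a minimal back-of-envelope sketch, not something taken from the speech. It assumes plain data parallelism with a ring all-reduce of the full gradient (real systems at this scale would also shard the model), and it treats "effective utilization" as achieved throughput divided by aggregate peak throughput. The function names and all numeric inputs are hypothetical and chosen only for illustration.

```python
def ring_allreduce_bytes_per_gpu(num_params: float, bytes_per_value: int, num_gpus: int) -> float:
    """Bytes each GPU sends during one ring all-reduce of the full gradient.

    In a ring all-reduce, every GPU sends 2 * (N - 1) / N times the payload size.
    """
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    payload = num_params * bytes_per_value
    return 2 * (num_gpus - 1) / num_gpus * payload


def effective_utilization(achieved_flops: float, peak_flops_per_gpu: float, num_gpus: int) -> float:
    """Fraction of the cluster's aggregate peak throughput actually achieved."""
    if num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("num_gpus and peak_flops_per_gpu must be positive")
    return achieved_flops / (peak_flops_per_gpu * num_gpus)


if __name__ == "__main__":
    # Hypothetical inputs: 100 billion parameters, 2-byte gradients, 1,000 GPUs.
    sent = ring_allreduce_bytes_per_gpu(100e9, 2, 1000)
    print(f"Per-GPU traffic per gradient sync: {sent / 1e9:.1f} GB")

    # Hypothetical inputs: 60 PFLOPS achieved on 1,000 GPUs of 100 TFLOPS peak each.
    util = effective_utilization(60e15, 100e12, 1000)
    print(f"Effective utilization: {util:.0%}")
```

Even under these simplified assumptions, each synchronization step moves hundreds of gigabytes per card, which suggests why network layout and communication scheduling weigh so heavily on the utilization figure.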
The same is true for algorithms. A well-designed algorithm structure can cost less than the original: with a good structure, fewer parameters and less data can achieve a final result similar to that of a design without special optimization. A lot of expert knowledge is involved here too, and even more so with data.
When OpenAI was doing ChatGPT4, only a very small part of the collected data, maybe less than 10%, was ultimately used for training. In terms of resources, that is a big difference from training on everything. The amount of data on the Internet is enormous, so which data is more effective, and which data carries more value? During training there is a great deal of trial and error over which data to feed in first and which methods to apply later. Why is computing power in such short supply, and why does everyone need more of it? Because many of the people building large models are working by trial and error. They may split into three or four groups to try different directions at the same time, and then gradually iterate and optimize. Brute-force aesthetics, or the large-scale pooling of resources, is the reason AI technology and AI algorithms keep making progress today.
This is comprehensive systems engineering that requires expert experience and systems engineering capability in every link. It also explains why OpenAI has its best scientists work on data engineering rather than algorithms, which goes far beyond our previous understanding of the field. In the future this may become a key threshold, and it will also become our core capability in providing services to the market.
Why did the industrial wave follow so quickly after the new artificial intelligence technology emerged? We see that model services are naturally suited to many fields. People in the Internet industry are very excited, and investors think this will grow as rapidly as the Internet did. Changes in the thresholds and barriers to commercialization will bring new opportunities for large models, but who can seize them depends on each player's own differentiation and expertise. In any case, compared with the past 10 years, today's industrialization of artificial intelligence has a very big advantage, because it no longer rests on a single technical barrier. Today's technical advantages can be turned into data barriers and scale advantages. We believe there will be many more industrial applications in the future.
SenseTime started building early large models in 2019. In our view, AI models as a whole have kept getting bigger, so we have accumulated a lot of internal capabilities, including some self-developed CV and NLP models. In April this year, SenseTime opened the APIs of some of its models, including some large language models, for trial use by industry partners. In our view, this is above all the ultimate expression of accumulated core technical capabilities.
We have released a series of models this year. Behind the services we provide to the market are our large devices. We feel that as the artificial intelligence industry moves forward, someone needs to provide large-scale, efficient infrastructure of this kind. This is basically an inevitable path. If the AI technology wave becomes a game that consumes ever more resources and depends on accumulated expert experience, the threshold will be extremely high, which is not conducive to the rapid application of AI in industry. We therefore judge that differentiation will inevitably emerge, and that some players will provide infrastructure services. Whether by calling model APIs, building smaller models on top of them, or in other ways, customers will be able to use basic AI resources and capabilities quickly, with a low threshold and at low cost, and so quickly complete the closed loop of their own business models.
SenseTime positions itself as an AI infrastructure provider. Today we have the largest artificial intelligence computing node in Asia, with more than 5,000 petaFLOPS (5000P) of computing power. We also work with many industry partners so that they can train their own large models on our large devices, which reflects SenseTime's deep accumulation at both the resource level and the level of expert engineering know-how. Part of our capabilities can be standardized and turned into software and services. The part that cannot be standardized we can offer as specialized professional services. We hope to package these capabilities and provide them to the whole industry, to help customers build their own domain models or model applications.
Training large AI models on SenseTime's large devices.
Source: 36氪