Turing giants appear at ICLR, and LeCun and Bengio draw star-chasing crowds at the summit! A Chinese team's three major technology trends spark new visions of AGI
Over the past few days, ICLR, one of AI's grandest conferences, was held in Vienna.
OpenAI, Meta, Google, Zhipu AI, and other world-leading AI companies gathered there.
The venue was packed with big names; walk a few steps and you might bump into the author of some field-shifting paper.
Unsurprisingly, the ICLR 2024 exhibition hall also became a star-chasing scene, its lively atmosphere nearly blowing the roof off.
Star-chasing the Turing giants on site
LeCun, famously the extrovert among the three Turing Award giants, generously announced his schedule on X in advance, looking forward to meeting fans.
In the comments, fans were not only excited to check in; some were even ready to hand over their resumes on the spot.
The trip proved worthwhile for them: on site, LeCun held forth eloquently while an enthusiastic audience packed around him in a dense circle.
Back to the conference itself: across the ICLR event, the Meta team shared more than 25 papers and two workshops, and LeCun's team presented the following two papers.
Paper address: https://arxiv.org/abs/2305.19523
Paper address: https://arxiv.org/abs/2311.12983
Another Turing Award giant, Yoshua Bengio, also proved hugely popular.
Attendees marveled: "You really have to be unique in your field to have such a long queue outside your conference room!"
LeCun and Hinton have both voiced strong opinions on AGI before, while Bengio's stance has seemed comparatively ambiguous, so many are eager to hear his views. On May 11, he will speak at a workshop on AGI.
It is worth mentioning that Bengio's team also received an honorable mention for the Outstanding Paper Awards at this year's ICLR.
Paper address: https://openreview.net/pdf?id=Ouj6p4ca60
Next door to Meta, Google presented its open-source model Gemma, Robotics Transformers (the framework behind its robotic agents), and other groundbreaking research.
Beside Meta and Google, in the middle of the exhibition hall, stood a very eye-catching company: Zhipu AI.
Team members on site introduced a series of research results, including GLM-4 and ChatGLM.
This series of displays attracted the attention of many foreign scholars.
Nearly two thousand guests and scholars on site listened attentively to the GLM large-model technical team's introduction.
It covered a number of cutting-edge research results on the GLM series of large models, spanning mathematics, text-to-image generation, image understanding, visual UI understanding, and agent intelligence.
Attendees debated their views on scaling laws, and the GLM team offered its own distinctive take:
"Emergent intelligence correlates more closely with pre-training loss than with model size or training compute."
For example, Jason Wei, OpenAI's famously hard-working "996" researcher, expressed his admiration after carefully reading Zhipu AI's paper on pre-training loss.
In the paper, the team trained 30 LLMs of varying parameter counts and data sizes and evaluated them on 12 Chinese and English datasets.
Paper address: https://arxiv.org/abs/2403.15796
The results showed that an LLM exhibits emergent abilities only when its pre-training loss falls below a certain threshold.
Moreover, defining "emergent ability" from the perspective of pre-training loss works better than relying on model parameters or training compute alone.
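To make this concrete, here is a minimal sketch, with invented numbers and a hypothetical helper, of what "emergence defined by pre-training loss" means operationally: a task counts as emergent if performance sits at the random-chance baseline while loss is above some threshold, then climbs once loss drops below it. This only illustrates the definition; it is not the paper's actual code or threshold.

```python
# Hypothetical illustration of "emergence defined by pre-training loss".
# All numbers are invented; see arxiv.org/abs/2403.15796 for the real methodology.

RANDOM_CHANCE = 0.25   # e.g., accuracy from guessing on 4-way multiple choice
TOLERANCE = 0.02       # how far above chance still counts as "no real ability"

def is_emergent(checkpoints, loss_threshold):
    """checkpoints: (pretraining_loss, task_accuracy) pairs across many models.

    True if accuracy stays at chance level whenever loss >= threshold,
    and is clearly above chance once loss < threshold.
    """
    above = [acc for loss, acc in checkpoints if loss >= loss_threshold]
    below = [acc for loss, acc in checkpoints if loss < loss_threshold]
    flat_before = all(acc <= RANDOM_CHANCE + TOLERANCE for acc in above)
    rises_after = bool(below) and all(acc > RANDOM_CHANCE + TOLERANCE for acc in below)
    return flat_before and rises_after

# Models of very different sizes, indexed only by their pre-training loss:
points = [(2.6, 0.25), (2.4, 0.26), (2.2, 0.24), (2.0, 0.41), (1.8, 0.58)]
print(is_emergent(points, loss_threshold=2.1))  # True under these made-up data
```

Note that the x-axis here is pre-training loss rather than parameter count; that is exactly the paper's point that models of different sizes but equal loss behave alike.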
Zhipu AI's showing has also led more and more overseas observers to a realization.
Tanishq, the Stability AI research director who earned his PhD at 19, noted that the most competitive open-source base models, CogVLM among them, come from China and have contributed significantly to the open-source ecosystem.
A former game-studio CEO started using CogVLM and Stable Diffusion last year to put together a completely open-source version.
Indeed, ever since CogVLM's release, its powerful capabilities have drawn exclamations from overseas users.
In January's LLM rankings, observers also noticed that Gemini and GPT-4V were far ahead of every open-source LLM, with the sole exception of CogVLM.
Clearly, amid this wave of domestic large models going global, Zhipu AI has quietly built substantial influence abroad.
Beyond the exhibition-hall demonstrations, this year's ICLR invited seven speakers in total to share their insights on AI.
They included Raia Hadsell, a research scientist at Google DeepMind; Devi Parikh, associate professor at Georgia Tech and chief scientist at FAIR; and Moritz Hardt, director at the Max Planck Institute for Software Systems (MPI-SWS). The only Chinese team among them was Zhipu AI's GLM large-model technical team.
Google DeepMind scientist Raia Hadsell titled her talk "Learning through the ups and downs of AI development: unexpected truths on the road to AGI."
After decades of steady development and occasional setbacks, AI is at a critical inflection point.
AI products have exploded into the mainstream market, and we have not yet reached the ceiling of scaling dividends, so the entire community is exploring the next step.
In this talk, drawing on more than 20 years in the field, Raia discussed how our assumptions about the development path to AGI have changed over time.
She also shared the unexpected discoveries made along the way.
From reinforcement learning to distributed architectures to neural networks, these techniques are already playing a potentially revolutionary role in science.
Raia argued that learning from past experience offers important insights for the future direction of AI research.
On the other side, FAIR chief scientist Devi Parikh told the audience the story of her life.
As the talk's title suggested, what Parikh shared was out of the ordinary.
When explaining how the technical landscape got to where it is today, people at ICLR tend to focus on the development of the Internet, big data, and computing power.
However, few people pay attention to those small, but important personal stories.
In fact, these stories can combine into an important force driving technological progress; by sharing them we learn from and inspire one another, becoming more tenacious and efficient in pursuing our goals.
Moritz Hardt, director of Germany's MPI-SWS, gave a talk titled "The emerging science of benchmarks."
Benchmarking has clearly become a core pillar of the machine-learning field.
Although this research paradigm has produced many achievements since the 1980s, our deep understanding of it remains limited.
In this talk, Hardt explored the fundamental principles of benchmarking as an emerging science through a series of selected empirical studies and theoretical analyses.
He specifically discussed the impact of annotation errors on data quality, external validation of model rankings, and the prospects for multi-task benchmarking.
At the same time, Hardt also presented a number of case studies.
These challenge our conventional wisdom and highlight the importance and benefits of developing scientific benchmarks.
Representing China, Zhipu AI's GLM large-model technical team delivered a talk titled "ChatGLM's Road to AGI."
Notably, this was also the first time a Chinese team gave a keynote on large models at a top-tier international conference.
The talk first reviewed the development of AI over the past few decades from a Chinese perspective.
Using ChatGLM as an example, the team then explained the understanding and insights gained along the way in practice.
2024 AGI Preview: GLM-4.5, GLM-OS, GLM-zero
At ICLR, the GLM large model team introduced the three major technical trends of GLM for AGI.
What is the one inevitable path to AGI?
Opinions in the industry are mixed: some say agents, some say multi-modality, and some say scaling laws are a necessary but not sufficient condition for AGI.
LeCun, for his part, insists that LLMs are a wrong turn on the road to AGI and cannot deliver it.
In this regard, the team also put forward its own unique point of view.
First, they discussed GLM-4's successors: GLM-4.5 and the upgraded models that will follow it.
These successors will build on SuperIntelligence and SuperAlignment technologies while making major strides in native multi-modality and AI safety.
The GLM large-model team believes text is the most critical foundation on the road to AGI.
The next step is to train on text, images, video, audio, and other modalities mixed together, yielding a true "native multi-modal model."
At the same time, in order to solve more complex problems, they also introduced the concept of GLM-OS, a general computing system centered on large models.
This view coincides with the view of large-model operating systems previously proposed by Karpathy.
At the ICLR site, the GLM large model team introduced the implementation of GLM-OS in detail:
Building on the existing All-Tools capabilities, plus memory and self-reflection, GLM-OS is expected to imitate the human PDCA mechanism, the Plan-Do-Check-Act cycle.
Concretely: first make a plan, then run a trial to gather feedback, then adjust the plan, and then act, in pursuit of better results.
Relying on this PDCA cycle, an LLM can generate its own feedback and evolve on its own, just as humans do.
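As a rough sketch of what such a loop could look like in code (this is not Zhipu AI's implementation; `call_llm` and the prompts are hypothetical placeholders for any chat-completion API), a PDCA-style agent might be wired like this:

```python
# Hypothetical PDCA (Plan-Do-Check-Act) loop around a chat LLM.
# call_llm is a stand-in for any chat-completion API, not a real library function.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM endpoint and return its reply."""
    raise NotImplementedError

def pdca_agent(task: str, max_cycles: int = 3) -> str:
    memory: list[str] = []  # accumulated reflections (the "memory" capability)
    plan = call_llm(f"Make a step-by-step plan for: {task}")  # Plan
    result = ""
    for _ in range(max_cycles):
        # Do: attempt the task according to the current plan (tool use would go here)
        result = call_llm(f"Carry out this plan and report the outcome:\n{plan}")
        # Check: self-reflection, critiquing the outcome against the task
        critique = call_llm(
            f"Task: {task}\nResult: {result}\n"
            "List what is wrong or missing, or reply DONE if satisfactory."
        )
        memory.append(critique)
        if "DONE" in critique:
            break
        # Act: revise the plan using all accumulated feedback
        plan = call_llm(
            f"Task: {task}\nPrevious plan:\n{plan}\n"
            "Feedback so far:\n" + "\n".join(memory) + "\nWrite an improved plan."
        )
    return result
```

In a real GLM-OS-style system, the "Do" step would invoke tools (code execution, web browsing, image generation) rather than a bare text call, and the memory would persist across sessions.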
In addition, the GLM large-model team revealed that since 2019 it has been working on a technology called GLM-zero, which investigates the human "unconscious" learning mechanism.
"When people are sleeping, the brain is still learning unconsciously."
The GLM team said that "unconscious" learning mechanisms are an important part of human cognitive ability, encompassing self-learning, self-reflection, and self-criticism.
The human brain hosts two systems, "feedback" and "decision-making," which correspond respectively to the LLM and memory.
Related research on GLM-zero will therefore deepen human understanding of consciousness, knowledge, and learning behavior.
Although still at a very early research stage, GLM-zero can be seen as a road that must be traveled to reach AGI.
This is also the first time that the GLM large model team has disclosed this technology trend to the outside world.
At the end of 2020, the GLM large model technical team developed the GLM pre-training architecture.
In 2021, they trained GLM-10B, a ten-billion-parameter model, and in the same year successfully trained a converged trillion-parameter sparse model using an MoE architecture.
In 2022, they co-developed GLM-130B, a hundred-billion-parameter-scale Chinese-English bilingual pre-trained model, and open-sourced it.
In the past year, the team has completed an upgrade of the large base model almost every 3-4 months, and it has now been updated to the GLM-4 version.
Moreover, as the first Chinese LLM company to enter the market, Zhipu AI set itself an ambitious goal in 2023: to benchmark against OpenAI across the board.
The GLM large model technical team has built a complete large model product matrix based on the AGI vision.
Beyond the GLM series, this matrix includes the CogView text-to-image model, the CodeGeeX code model, the CogVLM multi-modal understanding model, and more recently the GLM-4V multi-modal large model, All-Tools capabilities, and the AI assistant Zhipu Qingyan.
At the same time, the researchers on the GLM large-model technical team carry considerable influence in the industry.
For example, Stanford's popular CS25 course, taught by Li Feifei, invites experts at the forefront of Transformer research each session to share their latest breakthroughs.
It has been confirmed that the CS25 guest list includes researchers from Zhipu AI.
CogVLM
CogVLM, the open-source vision-language model developed by the team, drew industry-wide attention as soon as it was released.
A paper published by Stability AI in March showed that, owing to its excellent performance, CogVLM was used directly by Stable Diffusion 3 for image captioning.
Paper address: https://arxiv.org/abs/2403.03206
CogAgent
Building on that, CogAgent is an open-source vision-language model, improved from CogVLM, that focuses mainly on understanding graphical user interfaces (GUIs).
The CogAgent paper has been accepted to CVPR 2024, the top academic conference in international computer vision.
CVPR is known for its strict selection; this year's acceptance rate was said to be only about 2.8%.
Paper address: https://arxiv.org/abs/2312.08914
ChatGLM-Math
To tackle mathematical problem-solving with LLMs, the GLM large-model team proposed the "Self-Critique" iterative training method.
Through a self-feedback mechanism, it improves the LLM's language and mathematical abilities at the same time.
Paper address: https://arxiv.org/abs/2404.02893
The method involves two key steps:
First, train a "Math-Critique" model, derived from the LLM itself, to evaluate the model's answers to mathematical questions and provide feedback signals.
Second, use this critique model to supervise the LLM's own generations through rejection-sampling fine-tuning and DPO.
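A minimal sketch of how such a pipeline might be wired together (hypothetical helper names, not the paper's code; the thresholds are invented): the critique model scores sampled answers, high-scoring answers feed rejection-sampling fine-tuning, and clear score gaps between answer pairs feed DPO.

```python
# Hypothetical sketch of Self-Critique data construction, not the paper's code.
# `llm.generate` samples an answer; `critic.score` stands for the Math-Critique
# reward model, returning a quality score in [0, 1]. Both are assumed interfaces.

def build_self_critique_data(llm, critic, questions, n_samples=8,
                             accept=0.8, margin=0.3):
    rft_data, dpo_pairs = [], []
    for q in questions:
        answers = [llm.generate(q) for _ in range(n_samples)]  # sample candidates
        scored = sorted(((critic.score(q, a), a) for a in answers),
                        key=lambda t: t[0], reverse=True)
        best_score, best = scored[0]
        worst_score, worst = scored[-1]
        if best_score >= accept:
            rft_data.append((q, best))          # rejection sampling: keep good answers
        if best_score - worst_score >= margin:
            dpo_pairs.append((q, best, worst))  # (prompt, chosen, rejected) for DPO
    return rft_data, dpo_pairs

# One Self-Critique iteration would then fine-tune on rft_data and
# preference-optimize on dpo_pairs, before re-sampling with the updated model.
```

The key design point is that the feedback signal comes from a critique model distilled from the LLM itself, so the loop can iterate without requiring new human labels.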
The GLM team also designed the MathUserEval benchmark to evaluate the new model's mathematical abilities.
The results show that the new method significantly improves the LLM's mathematical problem-solving while still improving its language skills; notably, in some cases it outperforms models with twice as many parameters.
In the OpenCompass 2.0 benchmark, the strength of Zhipu AI's new-generation base model is not to be underestimated.
It placed third in the overall ranking and first among Chinese models.
In the "SuperBench Large Model Comprehensive Capability Evaluation Report" released recently by the SuperBench team, GLM-4 likewise ranked in the world's first tier.
In the most critical categories, semantic understanding and agent capabilities, GLM-4 ranked first in China, ahead of all rivals.
In the just-concluded first year of large models, a lively model war raged the whole year through.
If 2024 is to be the first year of AGI, the world's large-model teams still have a long way to go.