
DeepMind CEO: LLM+tree search is the AGI technology line. AI research relies on engineering capabilities. Closed-source models are safer than open-source models.

PHPz
2024-03-05 12:04:18

Since February, Google has suddenly switched into "996" mode (working 9 a.m. to 9 p.m., six days a week), launching 5 models in less than a month.

DeepMind CEO Hassabis himself has also been promoting the company's products everywhere, revealing a good deal of behind-the-scenes detail about their development.

In his view, although technological breakthroughs are still needed, the road to AGI has now emerged for mankind.

The merger of DeepMind and Google Brain signals that the development of AI technology has entered a new era.

Q: DeepMind has always been at the forefront of technology. For example, in a system like AlphaZero, the internal intelligent agent can achieve the final goal through a series of thoughts. Does this mean that large language models (LLM) can also join the ranks of this kind of research?

Hassabis believes that large-scale models have huge potential and need to be further optimized to improve their prediction accuracy and thereby build more reliable models of the world. While this step is crucial, it may not be enough to build a complete artificial general intelligence (AGI) system.

On this basis, we are developing a planning mechanism similar to AlphaZero to formulate plans to achieve specific world goals through the world model.

This includes stringing together different chains of thinking or reasoning, or using tree searches to explore a vast space of possibilities.

These are the missing links in our current large models.
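To make the idea of planning through a world model with tree search concrete, here is a minimal sketch. It is only an illustration under stated assumptions, not DeepMind's method: the toy state (an integer), the three actions, and the names `world_model` and `plan` are all hypothetical. The search tries action sequences of increasing length and returns the first one that the model predicts will reach the goal.

```python
from typing import List, Optional

# Hypothetical toy actions; a real system would use a learned model.
ACTIONS = {
    "add3": lambda s: s + 3,
    "double": lambda s: s * 2,
    "sub1": lambda s: s - 1,
}


def world_model(state: int, action: str) -> int:
    """Predict the state that results from taking `action` in `state`."""
    return ACTIONS[action](state)


def plan(start: int, goal: int, max_depth: int) -> Optional[List[str]]:
    """Iterative-deepening tree search for a shortest action sequence."""

    def search(state: int, depth: int, path: List[str]) -> Optional[List[str]]:
        if state == goal:
            return path
        if depth == 0:
            return None
        # Expand every child of this node using the model's predictions.
        for action in ACTIONS:
            found = search(world_model(state, action), depth - 1, path + [action])
            if found is not None:
                return found
        return None

    # Try shallow trees first so the first plan found is a shortest one.
    for depth in range(max_depth + 1):
        found = search(start, depth, [])
        if found is not None:
            return found
    return None


def replay(start: int, actions: List[str]) -> int:
    """Apply a plan step by step through the model and return the end state."""
    state = start
    for action in actions:
        state = world_model(state, action)
    return state
```

Calling `plan(1, 10, 5)` returns a list of action names, and `replay(1, ...)` on that list confirms the predicted end state. The cost grows quickly with depth, since every node branches into every action.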

Q: Starting from pure reinforcement learning (RL) methods, is it possible to move directly to AGI?

It seems that large language models will form the basis of prior knowledge, and then further research can be carried out on this basis.

Theoretically, it is possible to completely adopt the method of developing AlphaZero.

Some people in DeepMind and the RL community are working in this direction. They start from scratch and do not rely on any prior knowledge or data to completely build a new knowledge system.

I believe that leveraging existing world knowledge - such as information on the web and data we already collect - will be the fastest way to achieve AGI.

We now have scalable algorithms that can absorb this information - Transformers. We can completely use these existing models as prior knowledge for prediction and learning.

Therefore, I believe that the final AGI system will definitely include today's large models as part of the solution.

But having a large model alone is not enough, we also need to add more planning and search capabilities to it.

Q: Faced with the huge computing resources required by these methods, how can we break through?

Even a system like AlphaGo is quite expensive due to the need to perform calculations on each node of the decision tree.

We are committed to developing sample-efficient methods and strategies for reusing existing data, such as experience replay, as well as exploring more efficient methods.
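Experience replay, mentioned above, can be sketched as a small buffer that stores past transitions and samples them again for training. This is a generic, minimal illustration, not the implementation used in any DeepMind system; the class name `ReplayBuffer`, its parameters, and the tuple layout of a transition are assumptions.

```python
import random
from collections import deque
from typing import Any, List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state).
Transition = Tuple[Any, Any, float, Any]


class ReplayBuffer:
    """Fixed-size store of past transitions that can be sampled repeatedly."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._buffer: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a random batch without replacement, capped at buffer size."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)
```

Because each stored transition can be drawn many times, the learner gets more updates out of the same amount of collected data, which is the sample-efficiency benefit the answer refers to.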

In fact, if the world model is good enough, your search can be more efficient.

Take AlphaZero as an example. Its play in games such as Go and chess exceeds world-champion level, yet it searches far fewer positions than traditional brute-force methods.

This shows that improving the model makes the search more efficient, which in turn lets it reach more distant goals.

But when defining the reward function and goal, how to ensure that the system develops in the right direction will be one of the challenges we face.

Why can Google produce 5 models in half a month?

Q: Can you talk about why Google and DeepMind are working on so many different models at the same time?

Because we have been conducting basic research, we have a large amount of basic research work covering a variety of different innovations and directions.

This means that while we are building the main model track, the core Gemini model, there are also many more exploratory projects underway.

When these exploratory projects produce results, we merge them into the main branch for the next version of Gemini. That is why you saw 1.5 released right after 1.0: we were already working on the next version. We have multiple teams working on different timescales and cycling between each other, and that is how we can keep making progress.

I hope this will become our new normal: releasing products at this high speed while also being very responsible, keeping in mind that releasing safe models is our number one priority.

Q: I wanted to ask about your most recent big release. Your new Gemini 1.5 Pro model can handle up to one million tokens. Can you explain what this means and why the context window is an important technical indicator?


Yes, this is very important. The long context can be thought of as the working memory of the model, i.e. how much data it can remember and process at one time.

The longer the context you have, the more data and context you can take into account. Accuracy matters just as much: the model has to recall things from that long context accurately.

So a million tokens means you can handle huge books, full movies, huge amounts of audio content, or entire code bases.

If you have a shorter context window, say on the order of one hundred thousand tokens, then you can only process fragments, and the model cannot reason about or search the entire corpus that you are interested in.

So this actually opens up possibilities for all kinds of new use cases that can't be done with a small context.
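The arithmetic behind this point is simple and can be sketched as follows. The helper name `windows_needed` and the 700,000-token corpus are hypothetical, chosen only for illustration; the two window sizes are the ones discussed above.

```python
def windows_needed(corpus_tokens: int, context_window: int) -> int:
    """Return how many separate windows are needed to cover a corpus."""
    if corpus_tokens < 0:
        raise ValueError("corpus_tokens must be non-negative")
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    # Ceiling division using integers only.
    return -(-corpus_tokens // context_window)


# Hypothetical corpus size, for illustration only.
corpus = 700_000
small = windows_needed(corpus, 100_000)
large = windows_needed(corpus, 1_000_000)
```

When the result is greater than 1, the corpus has to be split into fragments, and no single call sees all of it at once. When the result is 1, the model can attend to the whole corpus in one pass.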

Q: I've heard from AI researchers that the problem with these large context windows is that they are very computationally intensive. For example, if you uploaded an entire movie or a biology textbook and asked questions about it, it would require more processing power to process all of that and respond. If a lot of people do this, the costs can add up quickly. Did Google DeepMind come up with some clever innovation to make these huge context windows more efficient, or did Google just bear the cost of all this extra computation?

Yes, this is a completely new innovation because without innovation you cannot have such a long context.

But this still requires a high computational cost, so we are working hard to optimize it.

If you fill up the entire context window, initial processing of the uploaded data may take several minutes.

But that’s not too bad if you consider that it’s like watching an entire movie or reading the entire War and Peace in a minute or two, and then you can answer any questions about it.

Then what we want to make sure is that once you upload and process a document, video, or audio, subsequent questions and answers should be faster.

That's what we're working on right now and we're very confident that we can get it down to a matter of seconds.
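The "process once, then answer many questions quickly" pattern described here can be illustrated with a small cache. This is only a toy sketch of the general pattern and says nothing about how Gemini works internally; the class `DocumentSession`, its methods, and the word index used as a stand-in for the expensive processing step are all assumptions.

```python
from typing import Dict, List


class DocumentSession:
    """Caches the expensive one-time processing of each uploaded document."""

    def __init__(self) -> None:
        self._cache: Dict[str, Dict[str, List[int]]] = {}
        self.ingest_calls = 0  # counts how often the slow step actually ran

    def _ingest(self, text: str) -> Dict[str, List[int]]:
        """Stand-in for the slow step: build a word-to-positions index."""
        self.ingest_calls += 1
        index: Dict[str, List[int]] = {}
        for position, word in enumerate(text.lower().split()):
            index.setdefault(word, []).append(position)
        return index

    def ask(self, doc_id: str, text: str, word: str) -> List[int]:
        """Answer a query, processing the document only on first use."""
        if doc_id not in self._cache:
            self._cache[doc_id] = self._ingest(text)
        return self._cache[doc_id].get(word.lower(), [])
```

The first `ask` for a document pays the full processing cost; later calls with the same `doc_id` reuse the cached result, so only the lookup remains.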

Q: You said you have tested the system with up to 10 million tokens. What is the effect?

It worked very well in our tests. Because the computing cost is still relatively high, the service is not currently available.

But in terms of accuracy and recall, it performs very well.

Q: I want to ask you about Gemini. What special things can Gemini do that previous Google language models or other models couldn't do?

Well, I think what's exciting about Gemini, especially version 1.5, is that it's inherently multimodal and we built it from the ground up to be able to handle any type of input: text, image, code, video.

If you combine it with long context, you can see its potential. For example, imagine you have an entire lecture to listen to, and there is one important concept you want to understand, so you want to fast-forward straight to it.

So now we can put the entire code base into a context window, which is very useful for new programmers getting started. Let's say you're a new engineer starting work on Monday. Typically you have hundreds of thousands of lines of code to look at. How do you access a function?

You need to ask the experts on the code base. But now you can actually use Gemini as a coding assistant, in this fun way. It will return some summary telling you where the important parts of the code are, and you can start working.

I think having this ability is very helpful and makes your daily workflow more efficient.

I'm really looking forward to seeing how Gemini performs when integrated into something like Slack, and your general workflow. What will the workflow of the future look like? I think we're just starting to experience the changes.

Google’s top priority for open source is security

Q: I’d like to turn now to Gemma, a series of lightweight open source models you just released. Today, whether to release underlying models through open source, or keep them closed, seems to be one of the most controversial topics. Until now, Google has kept its underlying model closed source. Why choose open source now? What do you think of the criticism that making underlying models available through open source increases the risk and likelihood that they will be used by malicious actors?

Yes, I have actually discussed this issue publicly many times.

My main point is that open source and open research in general are clearly beneficial. But there is a specific problem here, and it relates to AGI and AI technologies, because they are general-purpose.

Once you publish them, malicious actors can use them for harmful purposes.

Of course, once you open source something, you have no real way to take it back. That is unlike something such as API access, where if you discover a harmful downstream use case that no one had considered before, you can simply cut off access.

I think this means the bar for security, robustness and accountability is even higher. As we get closer to AGIs, they will have more powerful capabilities, so we have to be more careful about what they might be used for by malicious actors.

I have yet to hear a good argument from those who support open source, including the open source absolutists, many of whom are colleagues I respect in academia, about how they answer this question: how do you guard against open-source models giving more malicious actors access to the model?

We need to think more about these issues as these systems become more powerful.

Q: So, why didn’t Gemma worry you about this issue?


Yes, of course. As you will notice, Gemma only offers lightweight versions, so the models are relatively small.

Actually, the smaller size is more useful for developers, because individual developers, academics and small teams usually want to work quickly on their laptops, so the models are optimized for that.

Because they are small models rather than cutting-edge ones, we feel reassured: their capabilities have been rigorously tested, we know very well what they can do, and a model of this size poses no big risks.

Why DeepMind merged with Google Brain

Q: Last year, when Google Brain and DeepMind merged, some people I know in the AI industry were worried. Google has historically given DeepMind considerable latitude to work on the research projects it deems important.

Their worry was that, with the merger, DeepMind might be redirected toward things that benefit Google in the short term, rather than these larger, long-term basic research projects. It's been a year since the merger. Has this tension between Google's short-term interests and possible long-term AI advancements changed what you can work on?

Yes, this first year you mentioned has gone very well. One reason is that we think now is the right time, and I think that is also true from a researcher's perspective.

Let's go back five or six years, to when we were doing things like AlphaGo. At that point the AI field was doing exploratory research on how to reach AGI: what breakthroughs were needed and what bets should be placed. In that situation you want to do a broad set of things, so I think that was a very exploratory stage.

I think over the last two or three years it has become clear what the main components of AGI will be, as I mentioned before, although we still need new innovations.


You just saw the long context of Gemini 1.5, and I think a lot of new innovations like that are going to be required, so basic research remains as important as ever.

But now we also need to work in the engineering direction, which means scaling up known technologies and pushing them to their limits. That requires very creative engineering at scale, from the prototype level to hardware to data center scale, along with the efficiency issues involved.

Another reason is that if you were manufacturing some AI-driven products five or six years ago, you would have had to build an AI that was completely different from the AGI research track.

It can only perform tasks in special scenarios for specific products. It is a kind of customized AI, "hand-made AI".

But the situation is different today. To do AI for products, the best way now is to use general AI technologies and systems because they have reached sufficient levels of complexity and capability.

So actually this is a convergence point, so you can now see that the research track and the product track have been merged together.

Take an AI voice assistant, for example: what you would build today is a chatbot that truly understands language. The two tracks are now integrated, so there is no longer a dichotomy to consider or a tense relationship to reconcile.

The second reason is that having a tight feedback loop between research and real-world application is actually very beneficial to research.

Products let you really understand how your model performs. You can have academic metrics, but the real test comes when millions of users use your product: do they find it useful, do they find it helpful, do they find it beneficial to the world?

You're obviously going to get a lot of feedback, and that will then lead to very rapid improvements to the underlying model, so I think we're in this very, very exciting stage right now.


Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it removed.