


Taking a cue from the human brain: will learning to forget make large AI models better?
Recently, a team of computer scientists developed a more flexible and resilient machine learning model that can periodically forget what it has learned, a capability that existing large language models lack.
Testing shows that in many cases the "forgetting method" makes training highly efficient, and the resulting models perform better. Jea Kwon, an AI engineer at the Institute for Basic Science in Korea, said the new research represents significant progress in the AI field.
The "forgetting method" training efficiency is very high
Most of today's mainstream AI language engines are built on artificial neural networks. Each "neuron" in such a network is actually a mathematical function; neurons are connected to one another, receive and transmit information, and process and learn from data through the composed operations of many layers. This design loosely mimics how the human brain works, allowing AI to exhibit human-like intelligent behavior.
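To make that description concrete, here is a minimal sketch in Python (using PyTorch, with layer sizes chosen arbitrarily for illustration) of a network in which each "neuron" is a simple mathematical function and layers of them are composed to process data:

```python
# Minimal illustration of the idea above: each "neuron" computes a
# simple mathematical function, and layers of them are chained so that
# information is received, transformed, and passed on.
# All sizes are arbitrary and chosen only for this example.
import torch
import torch.nn as nn

tiny_network = nn.Sequential(
    nn.Linear(16, 32),  # 32 neurons, each a weighted sum of 16 inputs
    nn.ReLU(),          # a simple nonlinearity applied to each neuron
    nn.Linear(32, 8),   # a second layer of 8 neurons
)

x = torch.randn(1, 16)   # some incoming data
y = tiny_network(x)      # information flows through the layers
print(y.shape)           # torch.Size([1, 8])
```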
At first, the flow of information through the network is more or less random. As the network is fitted to the training data, the information flowing between neurons is progressively optimized. For example, a researcher who wants to build a bilingual translation model first collects massive amounts of text in both languages and trains the model on it; training adjusts the connections between neurons so that text in one language is aligned with equivalent text in the other.
Such training requires substantial computing resources, and if the model performs poorly or user needs later change, it can be hard to adapt.
Researcher Mikel Artetxe pointed out: "Suppose you have a model covering 100 languages, but the one language you need is not among them. If you want to add that language to the model, you have to retrain the whole thing."
A few years ago, Artetxe and his colleagues trained a neural network on one language, then erased what the network knew about that language's word units, known as tokens. Token representations are stored in the first layer of the network, called the "embedding layer"; all other layers were left untouched. After erasing the first language's tokens and retraining on a second language, new tokens for the second language filled the embedding layer.
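A minimal sketch of that erase-and-retrain procedure in PyTorch might look like the following. The model architecture, vocabulary size, and re-initialization scheme here are illustrative assumptions, not the authors' actual code:

```python
# Sketch: wipe only the token embedding layer of a trained model and
# keep every other layer, so it can be retrained on a second language.
# TinyLM, its sizes, and the init scheme are hypothetical.
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        # First layer ("embedding layer"): stores token representations.
        self.embedding = nn.Embedding(vocab_size, dim)
        # Deeper layers: store more abstract, language-general information.
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, token_ids):
        return self.body(self.embedding(token_ids))

model = TinyLM()
# ... assume the model has been fully trained on language 1 here ...

# "Forget": re-initialize only the embedding layer; the body is untouched.
nn.init.normal_(model.embedding.weight, std=0.02)

# Subsequent training on language 2 fills the embedding layer with new
# tokens while reusing the abstract knowledge in the deeper layers.
```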
Even though the model retained a large amount of information that no longer matched the new vocabulary, retraining in the second language worked: the model could learn and process it. The researchers concluded that while the embedding layer stores language-specific vocabulary information, the deeper layers of the network store more abstract information about the concepts behind human language, and it is these concepts that help the model learn a second language.
Chen Yihong, an author of the research report, put it this way: "We live in the same world and use words in different languages to express the same concepts, so the same level of reasoning exists in the model. An apple is something sweet and delicious; it represents more than just a word."
Adding a new language to a trained model with this forgetting method is efficient, but it still requires retraining on massive amounts of data with powerful hardware. Is there a better way? Instead of erasing the embedding layer after training and starting over, the researchers periodically reset the embedding layer during the initial training itself, as in the sketch below.
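A minimal, self-contained sketch of such a training loop follows; the reset interval, model sizes, synthetic batches, and toy objective are all placeholder assumptions, not the paper's actual setup:

```python
# Sketch: periodic forgetting during pretraining -- reset the embedding
# layer every RESET_EVERY steps while the deeper layers keep training.
# Sizes, interval, data, and objective are placeholders.
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64
embedding = nn.Embedding(VOCAB, DIM)                       # first layer
body = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(),
                     nn.Linear(DIM, VOCAB))                # deeper layers
optimizer = torch.optim.Adam(
    list(embedding.parameters()) + list(body.parameters()), lr=1e-3
)
RESET_EVERY = 1000  # hypothetical reset interval

for step in range(3000):
    token_ids = torch.randint(0, VOCAB, (8, 32))           # stand-in batch
    logits = body(embedding(token_ids))
    # Toy objective (reconstruct the input tokens), not a real LM loss.
    loss = nn.functional.cross_entropy(
        logits.view(-1, VOCAB), token_ids.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (step + 1) % RESET_EVERY == 0:
        # Forget: wipe the embeddings; the body keeps its weights.
        nn.init.normal_(embedding.weight, std=0.02)
```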
Artetxe said: "This way, the entire model gets used to the resets. If you later want to extend the model to another language, the process becomes much easier."
Forgetting models perform better
The researchers tested the approach on RoBERTa, a widely used large language model, training it with the periodic-forgetting technique and comparing it with a model trained in the standard, non-forgetting way. On the first language, the forgetting model scored 85.1 points versus 86.1 for the standard model. When retrained on a second language using only about 5 million tokens (compared with roughly 70 billion used for the first language), the forgetting model's accuracy dropped to 62.7 points, while the standard model's dropped to 53.3.
When the researchers imposed computational constraints on retraining, the forgetting model's advantage grew. For example, when they shortened retraining from 125,000 steps to 5,000 steps, the forgetting model still averaged about 57.8 points, while the standard model fell to 37.2, little better than random guessing.
The researchers therefore concluded that forgetting models are better at learning new languages.
Evgenii Nikishin, a researcher at Mila, the deep learning research institute in Quebec, said: "Because the model constantly forgets and relearns during training, it becomes easier to teach the network something new later." The findings suggest that such models understand language at a deeper level, beyond the meanings of individual words.
The forgetting method is somewhat similar to how the human brain operates. Benjamin Levy, a neuroscientist at the University of San Francisco, observed: "Human memory is quite imprecise when storing large amounts of detailed information. But the human brain remembers the key points of an experience, retains abstract information, and is good at inference. If we let AI process information more like humans do, for instance by giving it the ability to forget, AI may become more flexible."
Yihong Chen believes that "factories" manufacturing language models may appear in the future. Such factories would rely on forgetting technology: a base model that can be quickly adapted to new domains.