


Taking a cue from the human brain: will learning to forget make large AI models better?
Recently, a team of computer scientists developed a more flexible and resilient machine learning model that can periodically forget what it has learned, a capability that existing large language models lack.
Testing shows that in many cases the "forgetting method" makes training highly efficient, and the resulting models perform better. Jea Kwon, an AI engineer at the Institute for Basic Science in Korea, said the new research represents significant progress in the AI field.
The "forgetting method" training efficiency is very high
Most of today's mainstream AI language engines are built on artificial neural networks. Each "neuron" in such a network is actually a mathematical function; neurons are connected to one another, receive and transmit information, and process and learn from data through the composed operations of many layers. This design loosely mimics how the human brain works, allowing AI to exhibit human-like intelligent behavior.
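To make that description concrete, here is a minimal sketch in Python (using PyTorch, with layer sizes chosen arbitrarily for illustration) of a network in which each "neuron" is a simple mathematical function and layers of them are composed to process data:

```python
# Minimal illustration of the idea above: each "neuron" computes a
# simple mathematical function, and layers of them are chained so that
# information is received, transformed, and passed on.
# All sizes are arbitrary and chosen only for this example.
import torch
import torch.nn as nn

tiny_network = nn.Sequential(
    nn.Linear(16, 32),  # 32 neurons, each a weighted sum of 16 inputs
    nn.ReLU(),          # a simple nonlinearity applied to each neuron
    nn.Linear(32, 8),   # a second layer of 8 neurons
)

x = torch.randn(1, 16)   # some incoming data
y = tiny_network(x)      # information flows through the layers
print(y.shape)           # torch.Size([1, 8])
```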
At first, the flow of information through the network is more or less random. As the network is fitted to the training data, the information flowing between neurons is progressively optimized. For example, a researcher who wants to build a bilingual translation model first collects massive amounts of text in both languages and trains the model on it; training adjusts the connections between neurons so that text in one language is aligned with equivalent text in the other.
Such training requires substantial computing resources, and if the model performs poorly or user needs later change, it can be hard to adapt.
Researcher Mikel Artetxe pointed out: "Suppose you have a model covering 100 languages, but the one language you need is not among them. If you want to add that language to the model, you have to retrain the whole thing."
A few years ago, Artetxe and his colleagues trained a neural network on one language, then erased what the network knew about that language's word units, known as tokens. Token representations are stored in the first layer of the network, called the "embedding layer"; all other layers were left untouched. After erasing the first language's tokens and retraining on a second language, new tokens for the second language filled the embedding layer.
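A minimal sketch of that erase-and-retrain procedure in PyTorch might look like the following. The model architecture, vocabulary size, and re-initialization scheme here are illustrative assumptions, not the authors' actual code:

```python
# Sketch: wipe only the token embedding layer of a trained model and
# keep every other layer, so it can be retrained on a second language.
# TinyLM, its sizes, and the init scheme are hypothetical.
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        # First layer ("embedding layer"): stores token representations.
        self.embedding = nn.Embedding(vocab_size, dim)
        # Deeper layers: store more abstract, language-general information.
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, token_ids):
        return self.body(self.embedding(token_ids))

model = TinyLM()
# ... assume the model has been fully trained on language 1 here ...

# "Forget": re-initialize only the embedding layer; the body is untouched.
nn.init.normal_(model.embedding.weight, std=0.02)

# Subsequent training on language 2 fills the embedding layer with new
# tokens while reusing the abstract knowledge in the deeper layers.
```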
Even though the model retained a large amount of information that no longer matched the new vocabulary, retraining in the second language worked: the model could learn and process it. The researchers concluded that while the embedding layer stores language-specific vocabulary information, the deeper layers of the network store more abstract information about the concepts behind human language, and it is these concepts that help the model learn a second language.
Chen Yihong, an author of the research report, put it this way: "We live in the same world and use words in different languages to express the same concepts, so the same level of reasoning exists in the model. An apple is something sweet and delicious; it represents more than just a word."
Adding a new language to a trained model with this forgetting method is efficient, but it still requires retraining on massive amounts of data with powerful hardware. Is there a better way? Instead of erasing the embedding layer after training and starting over, the researchers periodically reset the embedding layer during the initial training itself, as in the sketch below.
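A minimal, self-contained sketch of such a training loop follows; the reset interval, model sizes, synthetic batches, and toy objective are all placeholder assumptions, not the paper's actual setup:

```python
# Sketch: periodic forgetting during pretraining -- reset the embedding
# layer every RESET_EVERY steps while the deeper layers keep training.
# Sizes, interval, data, and objective are placeholders.
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64
embedding = nn.Embedding(VOCAB, DIM)                       # first layer
body = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(),
                     nn.Linear(DIM, VOCAB))                # deeper layers
optimizer = torch.optim.Adam(
    list(embedding.parameters()) + list(body.parameters()), lr=1e-3
)
RESET_EVERY = 1000  # hypothetical reset interval

for step in range(3000):
    token_ids = torch.randint(0, VOCAB, (8, 32))           # stand-in batch
    logits = body(embedding(token_ids))
    # Toy objective (reconstruct the input tokens), not a real LM loss.
    loss = nn.functional.cross_entropy(
        logits.view(-1, VOCAB), token_ids.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (step + 1) % RESET_EVERY == 0:
        # Forget: wipe the embeddings; the body keeps its weights.
        nn.init.normal_(embedding.weight, std=0.02)
```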
Artetxe said: "This way, the entire model gets used to the resets. If you later want to extend the model to another language, the process becomes much easier."
Forgetting models perform better
The researchers tested the approach on RoBERTa, a widely used large language model, training it with the periodic-forgetting technique and comparing it with a model trained in the standard, non-forgetting way. On the first language, the forgetting model scored 85.1 points versus 86.1 for the standard model. When retrained on a second language using only about 5 million tokens (compared with roughly 70 billion used for the first language), the forgetting model's accuracy dropped to 62.7 points, while the standard model's dropped to 53.3.
When the researchers imposed computational constraints on retraining, the forgetting model's advantage grew. For example, when they shortened retraining from 125,000 steps to 5,000 steps, the forgetting model still averaged about 57.8 points, while the standard model fell to 37.2, little better than random guessing.
The researchers therefore concluded that forgetting models are better at learning new languages.
Evgenii Nikishin, a researcher at Mila, the deep learning research institute in Quebec, said: "Because the model constantly forgets and relearns during training, it becomes easier to teach the network something new later." The findings suggest that such models understand language at a deeper level, beyond the meanings of individual words.
The forgetting method is somewhat similar to how the human brain operates. Benjamin Levy, a neuroscientist at the University of San Francisco, observed: "Human memory is quite imprecise when storing large amounts of detailed information. But the human brain remembers the key points of an experience, retains abstract information, and is good at inference. If we let AI process information more like humans do, for instance by giving it the ability to forget, AI may become more flexible."
Yihong Chen believes that "factories" manufacturing language models may appear in the future. Such factories would rely on forgetting technology: a base model that can be quickly adapted to new domains.