


Super programming AI lands on the cover of Science! AlphaCode in programming contests: half of the programmers are beaten
This December, while OpenAI's ChatGPT was gaining momentum, AlphaCode, which once outperformed half of the human programmers it competed against, made the cover of Science!
Paper link: https://www.science.org/doi/10.1126/science.abq1158
Speaking of AlphaCode, everyone must be familiar with it.
As early as February this year, it quietly participated in 10 programming competitions on the famous Codeforces and defeated half of the human coders in one fell swoop.
Half of the programmers are beaten
Programming competitions are a very popular kind of test among programmers.
What a competition mainly tests is a programmer's ability to think critically, draw on experience, and create solutions to unforeseen problems.
This captures a key aspect of human intelligence, and machine learning models usually struggle to imitate it.
But the scientists at DeepMind broke this rule.
Yujia Li et al. developed AlphaCode using self-supervised learning and an encoder-decoder Transformer architecture.
AlphaCode's development work was completed while working from home
Although AlphaCode is also based on the standard Transformer encoder-decoder architecture, DeepMind has enhanced it substantially:
It uses Transformer-based language models to generate code at an unprecedented scale, and then cleverly filters the output down to a small subset of promising programs.
The specific enhancements are:
1) Multi-query attention: all attention heads in a block share the same key and value heads. Combined with the encoder-decoder model, this increases AlphaCode's sampling speed by more than 10 times.
2) Masked language modeling (MLM): adding an MLM loss to the encoder improves the model's solve rate.
3) Tempering: the training distribution is made sharper, which acts as a regularizer and helps prevent overfitting.
4) Value conditioning and prediction: an additional training signal is obtained by distinguishing correct from incorrect submissions in the CodeContests dataset.
5) Generation by Off-policy Learning from Demonstrations (GOLD): training focuses on the most likely solution to each problem, so that the model learns to produce a correct solution for each problem.
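To make the tempering item above more concrete, here is a minimal sketch of how dividing logits by a temperature below 1 sharpens a softmax distribution. The logits and the temperature value are made up for illustration and are not taken from AlphaCode's training setup.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits that are first divided by a temperature T."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits (illustrative values only).
logits = [2.0, 1.0, 0.5]

plain = softmax(logits, temperature=1.0)
# T < 1 concentrates probability mass on the highest-scoring token.
tempered = softmax(logits, temperature=0.2)

print([round(p, 3) for p in plain])
print([round(p, 3) for p in tempered])
```

With a temperature below 1, the most likely token receives a larger share of the probability mass than it does under the plain softmax, which is what "sharper" means here.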
Well, everyone knows the result.
With an estimated Elo rating of 1238, AlphaCode ranked in the top 54.3% across these 10 contests. Among users who had competed in the previous 6 months, this rating falls in the top 28%.
To achieve this ranking, AlphaCode had to clear one hurdle after another, solving a variety of new problems that combine critical thinking, logic, algorithms, coding, and natural language understanding.
Judging from the results, AlphaCode solved 29.6% of the programming problems in the CodeContests dataset, and 66% of those were solved on the first submission. (The total number of submissions is limited to 10 per problem.)
In comparison, the solve rate of traditional Transformer models is low, only in the single digits.
Even Codeforces founder Mirzayanov was very surprised by this result.
After all, programming competitions test the ability to invent algorithms, which has always been the weakness of AI and the strength of humans.
I can definitely say that the results from AlphaCode exceeded my expectations. I was skeptical at first because even in simple competition problems one not only needs to implement the algorithm but also invent it (which is the hardest part). AlphaCode has made itself a formidable opponent for many humans. I can't wait to see what the future holds!
——Mike Mirzayanov, founder of Codeforces
So, can AlphaCode steal programmers’ jobs?
Of course not.
AlphaCode can only complete simple programming tasks. If the tasks become more complex and the problems more "unforeseeable", AlphaCode, which only translates instructions into code, will be helpless.
After all, from a certain perspective, a score of 1238 is equivalent to the level of a middle school student who is just learning to program. At this level, it is not a threat to real programming experts.
But there is no doubt that the development of this type of coding platform will have a huge impact on programmer productivity.
Even the entire programming culture may be changed: perhaps, in the future, humans will only be responsible for formulating problems, and the tasks of generating and executing code can be handed over to machine learning.
What’s so difficult about programming competitions?
We know that although machine learning has made great progress in generating and understanding text, most AI systems are still limited to simple mathematics and programming problems.
What they mostly do is retrieve and copy existing solutions (anyone who has played with ChatGPT recently will understand this).
So, why is it so difficult for AI to learn to generate the correct program?
1. To generate code that solves a specified task, you need to search the space of all possible character sequences. This space is massive, and only a tiny part of it corresponds to valid, correct programs.
2. A single-character edit may completely change the behavior of a program or even make it crash, and each task has many distinct valid solutions.
For extremely difficult programming competitions, the AI needs to understand complex natural language descriptions; it needs to reason about problems it has never seen before rather than simply memorize code snippets; and it needs to master various algorithms and data structures and precisely write code that may be hundreds of lines long.
In addition, the code it generates is evaluated by running it on an exhaustive set of hidden tests, which check execution speed and edge-case correctness.
(A) Problem 1553D, with a medium difficulty score of 1500; (B) Problem solution generated by AlphaCode
Take Problem 1553D as an example. Participants are given two strings of lowercase letters, s and t, and must decide whether t can be obtained from s using a limited set of inputs.
Contestants cannot simply type new letters: they type s character by character, and at any step may press "Backspace" instead of typing the character, which deletes the most recently typed letter. The full problem statement is as follows:
In this regard, the solution given by AlphaCode is as follows:
Moreover, AlphaCode's "problem-solving process" is no longer a black box: the visualization also shows which part of the code is being generated and highlights where the attention is focused.
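For readers who want to try the problem themselves, here is a minimal sketch of a common greedy approach to Problem 1553D. It is an independent illustration and not the program AlphaCode generated. It assumes the Codeforces rules: s is typed left to right, and pressing Backspace instead of a character skips that character and deletes the previously typed one (or nothing if the text is empty). The function name `can_obtain` is hypothetical.

```python
def can_obtain(s: str, t: str) -> bool:
    """Return True if t can be produced by typing s with some Backspace presses."""
    i, j = len(s) - 1, len(t) - 1
    # Scan both strings from the end, matching characters greedily.
    while i >= 0 and j >= 0:
        if s[i] == t[j]:
            # Keep s[i]: it becomes the next matched character of t.
            i -= 1
            j -= 1
        else:
            # Press Backspace instead of typing s[i]: this skips s[i]
            # and also deletes the character typed just before it.
            i -= 2
    # All of t matched; any remaining prefix of s can be erased by
    # pressing Backspace for every remaining character.
    return j < 0

print("YES" if can_obtain("ababa", "ba") else "NO")
print("YES" if can_obtain("ababa", "bb") else "NO")
```

Scanning from the end works because each Backspace removes exactly two characters of s from consideration (the skipped one and the one before it), so the alignment of the remaining suffix is never affected.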
AlphaCode’s Learning System
When participating in programming competitions, the main challenges AlphaCode faces are:
(i) it has to search a huge program space, (ii) only about 13,000 example tasks are available for training, and (iii) the number of submissions per problem is limited.
To deal with these problems, AlphaCode's learning system is built in three stages: pre-training, fine-tuning, and sampling and evaluation, as shown in the figure above.
Pre-training
In the pre-training stage, the model is pre-trained on 715GB of snapshots of human-written code collected from GitHub, using a cross-entropy next-token prediction loss. During pre-training, each code file is randomly split into two parts: the first part is used as the input to the encoder, and the model is trained to generate the second part with the decoder.
This pre-training learns a strong prior for code, enabling the subsequent task-specific fine-tuning to be performed on a much smaller dataset.
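The random split described above can be sketched as follows. This is only an illustration: it splits at the character level for simplicity, whereas a real pipeline would presumably work on tokens, and the function name `split_for_pretraining` is hypothetical.

```python
import random

def split_for_pretraining(source: str, rng: random.Random):
    """Split one code file at a random pivot into (encoder_input, decoder_target)."""
    if len(source) > 1:
        # Pick a pivot so that both parts are non-empty.
        pivot = rng.randint(1, len(source) - 1)
    else:
        pivot = len(source)
    return source[:pivot], source[pivot:]

# Hypothetical file contents used only for the demonstration.
code_file = "def add(a, b):\n    return a + b\n"
rng = random.Random(0)  # fixed seed so the demo is repeatable
encoder_input, decoder_target = split_for_pretraining(code_file, rng)

print(repr(encoder_input))
print(repr(decoder_target))
```

The first part plays the role of the context the encoder sees, and the second part is what the decoder is trained to predict token by token.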
Fine-tuning
In the fine-tuning phase, the model is fine-tuned and evaluated on a 2.6GB dataset of competitive programming problems. The dataset was created by DeepMind and released publicly under the name CodeContests.
The CodeContests data set includes questions and test cases. The training set contains 13,328 questions, with an average of 922.4 submitted answers per question. The validation set and test set contain 117 and 165 questions respectively.
During fine-tuning, the natural language problem statement is encoded as a program comment so that it looks more similar to the files seen during pre-training (which can include extended natural language comments), and the same next-token prediction loss is used.
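As a rough illustration of encoding a problem statement as a program comment, the sketch below prefixes every line of the statement with a comment marker. The helper name `statement_as_comment` and the choice of the Python-style `# ` prefix are assumptions made for this example; the exact formatting AlphaCode uses is not described here.

```python
def statement_as_comment(statement: str, prefix: str = "# ") -> str:
    """Turn a natural language problem statement into comment lines."""
    lines = statement.strip().splitlines()
    # rstrip() avoids trailing spaces on blank statement lines.
    return "\n".join((prefix + line).rstrip() for line in lines)

# Hypothetical, shortened problem statement.
statement = "You are given two strings s and t.\n\nPrint YES or NO."
commented = statement_as_comment(statement)
print(commented)
```

Formatted this way, the statement resembles the comment blocks that often appear at the top of source files, which is the kind of text the model has already seen during pre-training.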
Sampling
To select the 10 best samples for submission, filtering and clustering are used. The samples are first executed on the example tests included in the problem statement, and samples that fail these tests are removed.
This filters out nearly 99% of the model's samples. The remaining candidates are then clustered: they are executed on inputs generated by a separate Transformer model, and programs that produce the same outputs on the generated inputs are grouped together.
Then, select one sample from each of the 10 largest clusters for submission. Intuitively, correct programs behave identically and form large clusters, while incorrect programs fail in diverse ways.
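The filter-then-cluster selection described above can be sketched in a few lines. This is a simplified illustration, not DeepMind's implementation: candidate programs are stood in for by plain Python functions, the "generated inputs" are a hand-written list rather than the output of a separate model, and all names (`safe_run`, `select_submissions`) are hypothetical.

```python
from collections import defaultdict

def safe_run(program, value):
    """Run a candidate on one input; treat any exception as a failed run."""
    try:
        return program(value)
    except Exception:
        return None

def select_submissions(candidates, example_tests, generated_inputs, max_submissions=10):
    # 1) Filtering: keep only candidates that pass every example test.
    survivors = [
        p for p in candidates
        if all(safe_run(p, x) == expected for x, expected in example_tests)
    ]
    # 2) Clustering: group survivors by their outputs on the generated inputs.
    clusters = defaultdict(list)
    for p in survivors:
        signature = tuple(repr(safe_run(p, x)) for x in generated_inputs)
        clusters[signature].append(p)
    # 3) Selection: take one program from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:max_submissions]]

# Toy task: "double the input". Three correct candidates, two wrong ones.
candidates = [
    lambda x: x * 2,
    lambda x: x + x,
    lambda x: x << 1,
    lambda x: x ** 2,   # passes the example test (2 -> 4) but is wrong in general
    lambda x: x + 1,    # fails the example test
]
picks = select_submissions(candidates, example_tests=[(2, 4)], generated_inputs=[3, 5])
print(len(picks), picks[0](7))
```

In this toy run the three correct candidates agree on every generated input and form the largest cluster, so one of them is picked first, while the candidate that only happens to pass the example test ends up alone in a small cluster.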
Evaluation
The figure above shows, using the 10@k metric (up to 10 submissions chosen from k samples), how model performance changes with more samples and more compute. From the evaluation of the sampling results, the researchers drew the following four conclusions:
1. The solve rate scales log-linearly with the number of samples;
2. Better models have a steeper slope on the scaling curve;
3. The solve rate scales log-linearly with the amount of compute;
4. Sample selection is crucial for scaling the solve rate.
Purely “data-driven”
There is no doubt that the introduction of AlphaCode represents a substantial step in the development of machine learning models.
Interestingly, AlphaCode does not contain explicit built-in knowledge about the structure of computer code.
Instead, it relies on a purely "data-driven" approach to writing code, which is to learn the structure of computer programs by simply observing large amounts of existing code.
Article address: https://www.science.org/doi/10.1126/science.add8258
Fundamentally, what makes AlphaCode better than other systems on competitive programming tasks comes down to two main attributes:
1. Training Data
2. Post-processing of candidate solutions
But computer code is a highly structured medium: programs must adhere to a defined syntax and must produce explicit pre- and post-conditions in different parts of the solution.
The method AlphaCode uses to generate code is exactly the same as for generating any other text: one token at a time, with the correctness of the program checked only after the entire program has been written.
Given appropriate data and model complexity, AlphaCode can generate coherent structures. However, the final recipe for this sequential generation procedure is buried deep within the parameters of the LLM and remains elusive.
However, regardless of whether AlphaCode can really "understand" programming problems, it does reach the average human level in coding competitions.
"Solving programming competition problems is very difficult and requires humans to have both good coding skills and problem-solving creativity. I'm impressed that AlphaCode has been able to make progress in this field, and excited to see how the model uses its statement understanding to generate code and guide its stochastic exploration to create solutions."
—Petr Mitrichev, Google software engineer and world-class competitive programmer
AlphaCode placed in the top 54% in programming competitions, demonstrating the potential of deep learning models on tasks that require critical thinking.
These models elegantly leverage modern machine learning to express solutions to problems as code, harkening back to AI’s symbolic reasoning roots from decades ago.
And this is just the beginning.
In the future, more powerful AIs that can solve problems will be born. Perhaps this day is not far away.