Shock the scientific community! Microsoft's 154-page research floods the screen: GPT-4's capabilities are close to humans, and "Skynet" is emerging?
Will GPT-4 evolve into general artificial intelligence?
Meta chief artificial intelligence scientist and Turing Award winner Yann LeCun expressed doubts about this.
In his view, large models require too much data and computing power yet learn inefficiently; only by learning a "world model" can we find the road to AGI.
However, the 154-page paper recently published by Microsoft seems to be a slap in the face.
In this paper, titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4", Microsoft argues that although it is not complete, GPT-4 can already be regarded as an early version of artificial general intelligence.
Paper address: https://arxiv.org/pdf/2303.12712.pdf
Given the breadth and depth of GPT-4’s capabilities, we believe it should reasonably be considered an early (but still incomplete) version of an artificial general intelligence (AGI) system.
The main goal of this article is to explore the capabilities and limitations of GPT-4. We believe that the intelligence of GPT-4 marks a true paradigm shift in computer science and other fields.
Such an AGI agent can think and reason like a human and covers a wide range of cognitive skills and abilities.
The paper points out that AGI involves the abilities of reasoning, planning, problem solving, abstract thinking, understanding complex ideas, rapid learning, and learning from experience.
In terms of parameter scale, Semafor reported that GPT-4 has 1 trillion parameters, roughly six times as many as GPT-3 (175 billion parameters).
Netizens drew an analogy between GPT's parameter scale and the number of neurons in animal brains:
GPT-3 is similar in scale to a hedgehog brain (175 billion parameters). If GPT-4 has 1 trillion parameters, we are approaching the scale of a squirrel brain. At this rate of development, it may take only a few years to reach and surpass the scale of the human brain (170 trillion parameters).
From this point of view, GPT-4 is not far away from becoming "Skynet".
This paper also revealed a number of interesting details.
Not long after it was released, a netizen revealed on Twitter that hidden information had been discovered in its LaTeX source code.
In the unabridged version of the paper, GPT-4 was actually listed as a hidden third author, under the internal codename DV-3; this credit was later removed.
Interestingly, even Microsoft's researchers are not clear about GPT-4's technical details. The paper also removed toxic content that GPT-4 had generated without any prompting.
GPT-4 is beginning to take shape as AGI
The research object of this paper is an early version of GPT-4. While it was still in its early stages of development, Microsoft researchers conducted various experiments and evaluations on it.
In the researchers' opinion, this early version of GPT-4 is already representative of a new generation of LLMs, and it exhibits more general intelligence than previous AI models.
Through testing, Microsoft researchers confirmed that GPT-4 is not only proficient in language but also performs excellently on diverse and difficult tasks in mathematics, programming, vision, medicine, law, psychology, and more, without requiring any special prompting.
Surprisingly, in all these tasks, GPT-4's performance is close to human level, and often exceeds previous models, such as ChatGPT.
Therefore, given the breadth and depth of its capabilities, the researchers believe GPT-4 can be considered an early version of artificial general intelligence (AGI).
So, what are the challenges on its way towards deeper and more comprehensive AGI? Researchers believe that it may be necessary to seek a new paradigm that goes beyond "predicting the next word."
The following evaluations of GPT-4's capabilities are the arguments Microsoft researchers give for considering GPT-4 an early version of AGI.
Since GPT-4's release, most people's impression of its multimodal capabilities has come from the video of Greg Brockman's demonstration at the time.
In the second section of the paper, Microsoft first introduces these multimodal capabilities.
GPT-4 not only demonstrates high proficiency in diverse fields such as literature, medicine, law, mathematics, physical sciences, and programming; it can also unify skills and concepts across these areas and understand complex ideas.
Comprehensive ability
The researchers provided the following four examples to demonstrate GPT-4's performance in terms of comprehensive ability.
In the first example, to test GPT-4's ability to combine art and programming, the researchers asked it to generate JavaScript code that produces random images in the style of the painter Kandinsky.
The process by which GPT-4 implemented this code is shown below:
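As a rough illustration of what such a program might look like, here is a minimal Python sketch of a random geometric-composition generator (an illustrative stand-in, not GPT-4's actual output, which was in JavaScript):

```python
# Illustrative sketch (not GPT-4's output): scatter random circles, lines and triangles
# in bright colors, loosely evoking a Kandinsky-style composition, and save it as SVG.
import random

def random_kandinsky_svg(width=400, height=400, n_shapes=30, seed=None):
    rng = random.Random(seed)
    palette = ["#e63946", "#f1c453", "#2a9d8f", "#264653", "#e9c46a", "#457b9d"]
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<rect width="{width}" height="{height}" fill="#f4f1de"/>',
    ]
    for _ in range(n_shapes):
        color = rng.choice(palette)
        kind = rng.choice(["circle", "line", "triangle"])
        if kind == "circle":
            parts.append(
                f'<circle cx="{rng.randint(0, width)}" cy="{rng.randint(0, height)}" '
                f'r="{rng.randint(5, 60)}" fill="{color}" fill-opacity="0.8"/>'
            )
        elif kind == "line":
            parts.append(
                f'<line x1="{rng.randint(0, width)}" y1="{rng.randint(0, height)}" '
                f'x2="{rng.randint(0, width)}" y2="{rng.randint(0, height)}" '
                f'stroke="{color}" stroke-width="{rng.randint(1, 6)}"/>'
            )
        else:
            pts = " ".join(
                f"{rng.randint(0, width)},{rng.randint(0, height)}" for _ in range(3)
            )
            parts.append(f'<polygon points="{pts}" fill="{color}" fill-opacity="0.7"/>')
    parts.append("</svg>")
    return "\n".join(parts)

# Write one random composition to disk so it can be opened in a browser.
with open("kandinsky.svg", "w") as f:
    f.write(random_kandinsky_svg(seed=42))
```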
In terms of combining literature and mathematics, GPT-4 can prove that there are infinitely many prime numbers in the literary style of Shakespeare.
Additionally, the study tested GPT-4's ability to combine historical and physical knowledge by asking it to write a letter supporting Electron's bid for U.S. President, written as if by Mahatma Gandhi to his wife.
The researchers also prompted GPT-4 to generate Python code for a program that takes as input a vector of a patient's age, gender, weight, height, and blood test results, and indicates whether the patient is at increased risk of diabetes.
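As a rough illustration of the kind of program being described, here is a hand-written Python sketch (the input fields and thresholds are illustrative assumptions; this is not GPT-4's actual output and not medical guidance):

```python
# Hand-written sketch (assumed inputs and thresholds; not GPT-4's output, not medical advice):
# flag a patient as being at increased risk of diabetes using simple rule-of-thumb checks.
def diabetes_risk(age, sex, weight_kg, height_m, fasting_glucose_mg_dl):
    bmi = weight_kg / (height_m ** 2)
    risk_points = 0
    if age >= 45:
        risk_points += 1
    if bmi >= 30:
        risk_points += 1
    if fasting_glucose_mg_dl >= 100:  # illustrative threshold for impaired fasting glucose
        risk_points += 2
    return "increased risk" if risk_points >= 2 else "typical risk"

# Example: a 50-year-old, 95 kg, 1.75 m patient with a fasting glucose of 110 mg/dL.
print(diabetes_risk(50, "M", 95, 1.75, 110))
```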
These examples show that GPT-4 not only learns general principles and patterns across different fields and styles, but can also combine them in creative ways.
Visual
When GPT-4 is prompted to generate images of objects such as cats, trucks, or letters using Scalable Vector Graphics (SVG), the code the model produces usually compiles into a fairly detailed and recognizable image, such as the following:
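As a hand-written illustration of what such SVG code can look like (an assumed example, not code generated by GPT-4):

```python
# Hand-written illustration (not GPT-4 output): an SVG "cat" assembled from basic shapes.
cat_svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <circle cx="100" cy="110" r="55" fill="#c9a27e"/>              <!-- head -->
  <polygon points="55,75 75,35 90,70" fill="#c9a27e"/>           <!-- left ear -->
  <polygon points="145,75 125,35 110,70" fill="#c9a27e"/>        <!-- right ear -->
  <circle cx="80" cy="100" r="7" fill="#222"/>                   <!-- left eye -->
  <circle cx="120" cy="100" r="7" fill="#222"/>                  <!-- right eye -->
  <polygon points="95,120 105,120 100,130" fill="#e76f51"/>      <!-- nose -->
  <path d="M100 130 Q90 142 80 138 M100 130 Q110 142 120 138"
        stroke="#222" fill="none"/>                              <!-- mouth -->
</svg>"""

with open("cat.svg", "w") as f:
    f.write(cat_svg)
```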
However, many people may think that GPT-4 simply copied the code from the training data, which contains similar images.
In fact, GPT-4 did not simply copy code from similar examples in the training data; it was able to handle genuine visual tasks despite being trained only on text.
In the example below, the model is prompted to draw a person by combining the shapes of the letters Y, O, and H.
During the generation process, draw-line and draw-circle commands were used to create the letters O, H, and Y, and GPT-4 then managed to place them within what appears to be a reasonably humanoid figure.
Although GPT-4 was never trained to recognize letter shapes, it could still infer that the letter Y might look like a torso with arms pointing upward.
In the second demonstration, GPT-4 was prompted to correct the proportions of the torso and arms and to center the head. Finally, the model was asked to add a shirt and pants.
It seems that GPT-4 has vaguely learned from relevant training data that letters are associated with certain shapes, and the results are quite good.
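To make the setup concrete, here is a minimal Python sketch in the same spirit (hypothetical draw_line and draw_circle helpers built on matplotlib; composing O as the head, Y as the torso and arms, and H as the legs mirrors the description above, and is not GPT-4's actual output):

```python
# Illustrative sketch (not GPT-4's output): compose a stick figure from letter-like
# shapes using simple draw_line / draw_circle helpers, as in the experiment described.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3, 5))

def draw_line(x1, y1, x2, y2):
    ax.plot([x1, x2], [y1, y2], color="black", linewidth=3)

def draw_circle(x, y, r):
    ax.add_patch(plt.Circle((x, y), r, fill=False, linewidth=3))

# "O" as the head
draw_circle(0, 4.0, 0.5)
# "Y" as the torso with raised arms: a stem plus two diagonal strokes
draw_line(0, 1.5, 0, 3.5)        # torso
draw_line(0, 3.3, -0.8, 4.0)     # left arm
draw_line(0, 3.3, 0.8, 4.0)      # right arm
# "H" as the legs: two verticals joined by a short bar
draw_line(-0.4, 0, -0.4, 1.5)    # left leg
draw_line(0.4, 0, 0.4, 1.5)      # right leg
draw_line(-0.4, 1.5, 0.4, 1.5)   # hip bar

ax.set_xlim(-2, 2)
ax.set_ylim(-0.5, 5)
ax.set_aspect("equal")
ax.axis("off")
plt.savefig("letter_person.png")
```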
To further test GPT-4's ability to generate and manipulate images, the researchers examined how well it followed detailed instructions to create and edit graphics. This task requires not only generative skills but also interpretive, combinatorial, and spatial abilities.
The first command is to let GPT-4 generate a 2D image. The prompt is:
"A frog hops into a bank and asks the teller, 'Do you have any free lily pads?' The teller responds, 'No, but we do offer low interest loans for pond upgrades.'"
Through multiple attempts, GPT-4 generated an image that matched the description every time. Then, GPT-4 was asked to add more details to improve the graphics quality. GPT-4 added realistic objects such as banks, windows, and cars.
The second example attempts to use JavaScript to generate a 3D model, with GPT-4 again completing many tasks according to the instructions.
In addition, GPT-4 can be combined with the capabilities of Stable Diffusion for sketch generation.
The picture below is a screenshot of 3D city modeling. The input prompt describes a river flowing from left to right, a desert with pyramids beside the river, and four buttons at the bottom of the screen colored green, blue, brown, and red. The generated result is as follows:
Music
The researchers asked GPT-4 to generate and modify tunes encoded in ABC notation, as follows:
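For readers unfamiliar with the format, here is a small hand-written example of ABC notation (illustrative only, not the tune GPT-4 produced): header lines give the index (X), title (T), meter (M), default note length (L), and key (K), and the remaining lines encode the melody, with "|" marking bar lines.

```python
# Hand-written illustration of ABC notation (not GPT-4's tune).
example_tune = """X:1
T:Illustrative Tune
M:4/4
L:1/8
K:C
C D E F G A B c | c B A G F E D C |
E G c G E G c G | d c B A G2 z2 |"""
print(example_tune)
```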
In exploring how much skill GPT-4 had acquired from its training, the researchers found that it could generate valid melodies in ABC notation and, to a certain extent, interpret and manipulate their structure.
However, the researchers were unable to get GPT-4 to produce any non-trivial form of harmony, nor could it handle famous melodies such as "Ode to Joy" and "Für Elise".
Programming ability
Additionally, the researchers demonstrated that GPT-4 can code at a very high level, whether writing code from instructions or understanding existing code.

In terms of writing code from instructions, the researchers gave the example of asking GPT-4 to write Python functions, then used the software-engineering interview platform LeetCode to judge online whether the generated code was correct.

There was online discussion claiming that GPT-4's accuracy on LeetCode was only 20%; Yi Zhang, one of the paper's authors, refuted this.

In addition, GPT-4 was asked to visualize the LeetCode accuracy data from the table above as a chart; the result is shown in the figure.

GPT-4 can not only handle ordinary programming work but is also competent at complex 3D game development. The researchers asked GPT-4 to write a 3D game in HTML using JavaScript, and it generated a game that met all the requirements zero-shot.

Deep learning programming requires not only knowledge of mathematics and statistics but also familiarity with frameworks and libraries such as PyTorch, TensorFlow, and Keras. The researchers asked GPT-4 and ChatGPT to write a custom optimizer module, providing a natural-language description that included a series of important operations, such as applying SVD.

Beyond writing code from instructions, GPT-4 also shows a strong ability to understand code. The researchers had GPT-4 and ChatGPT read a C/C++ program and predict its output; their respective performance is shown below. The areas highlighted in yellow are GPT-4's insightful observations, while the red marks indicate where ChatGPT went wrong.

Through these coding-ability tests, the researchers found that GPT-4 can handle a wide variety of coding tasks, from coding challenges to practical applications, from low-level assembly to high-level frameworks, and from simple data structures to complex programs. Moreover, GPT-4 can reason about code execution, simulate the effects of instructions, and explain the results in natural language; it can even execute pseudocode.

Mathematical ability

In terms of mathematical ability, GPT-4 has made a qualitative leap over previous large language models; even against the specially fine-tuned Minerva, its performance is significantly better. However, it is still far from expert level.

For example: a rabbit population is multiplied by a factor of a every year, and on the last day of each year, b rabbits are adopted by humans. Suppose there are x rabbits on the first day of the first year, and it is known that after 3 years the number of rabbits becomes 27x - 26. What are the values of a and b?

To solve this problem, one first needs the correct expression for the annual change in the rabbit population, then a system of equations derived from this recursive relationship, and finally the answer (each year the population p becomes a·p - b, so after three years it is a³x - b(a² + a + 1) = 27x - 26, giving a = 3 and b = 2).

Here, GPT-4 successfully arrived at a solution and presented a sound argument. In contrast, ChatGPT failed to give correct reasoning and answers across several independent attempts.

Advanced Mathematics

Next, let's go straight to a hard one. The following question comes from the 2022 International Mathematical Olympiad (IMO), in simplified form. Unlike an undergraduate calculus exam, it does not conform to a structured template.
Solving this problem requires a more creative approach, as there is no clear strategy for beginning the proof. For example, the decision to divide the argument into two cases (g(x) > x^2 and g(x) ≤ x^2) is itself not obvious. Despite this, GPT-4 still gave a correct proof.

The second discussion, on algorithms and graph theory, is comparable to a graduate-level interview. Here, GPT-4 was able to reason about an abstract graph construction related to a constraint satisfaction problem and draw correct conclusions about the SAT problem from it (to the best of the authors' knowledge, this construction does not appear in the mathematical literature).

This conversation reflected GPT-4's deep understanding of the undergraduate-level mathematical concepts discussed, as well as a considerable degree of creativity. Although GPT-4 wrote 2^(n/2) as 2^(n-1) in one answer, this looks more like what is commonly called a clerical error, because it later provided a correct generalization of the formula.

In addition, the researchers compared the performance of GPT-4, ChatGPT, and Minerva on two benchmarks, GSM8K and MATH. GPT-4 outperformed Minerva on each dataset, with accuracy exceeding 80% on both test sets.

Looking more closely at why GPT-4 makes mistakes: 68% of its errors are calculation errors, not errors in the solution approach.

Interacting with the world

Another key manifestation of intelligence is interactivity. Interactivity matters because it enables an agent to acquire and apply knowledge, solve problems, adapt to changing situations, and achieve goals beyond its own capabilities. The researchers therefore studied GPT-4's interactivity along two dimensions: tool use and embodied interaction. For example, GPT-4 can use search engines or external tools such as APIs when answering questions like the following.

Interacting with humans

In the paper, the researchers found that GPT-4 can build mental models of humans. The study designed a series of tests to assess the theory-of-mind abilities of GPT-4, ChatGPT, and text-davinci-003. In understanding beliefs, for example, GPT-4 successfully passed the Sally-Anne false-belief test from psychology.

There is also a test of GPT-4's ability to infer others' emotional states in complex situations:

- Why is Tom making a sad face?
- What does Adam think is causing Tom's sad expression?

Through multiple rounds of testing, the researchers found that when it is necessary to reason about others' mental states and propose solutions consistent with real-life social scenarios, GPT-4 outperforms ChatGPT and text-davinci-003.

Limitations

The "predict the next word" paradigm used by GPT-4 has obvious limitations: the model lacks planning, working memory, the ability to backtrack, and reasoning ability. Because the model relies on a local, greedy process of generating the next word, it never develops a deep understanding of the overall task or output. GPT-4 is therefore good at generating smooth, coherent text, but not at solving complex or creative problems that cannot be handled sequentially.

For example, take four random numbers in the range 0 to 9 and combine them with multiplication and addition. On this problem, which even an elementary school student can solve, GPT-4's accuracy is only 58%. When the numbers are between 10 and 19 or between 20 and 39, accuracy drops to 16% and 12% respectively, and when they are in the range 99 to 199, accuracy drops to 0.

However, if GPT-4 is allowed to "take its time" to answer, accuracy improves easily.
For example, ask the model to write out the intermediate steps with a prompt such as: "116 * 114 + 178 * 157 = ? Let's think step by step and write down all the intermediate steps before arriving at the final solution."

With this kind of prompt, accuracy reaches 100% when the numbers are in the range 1-40, and 90% when they are in the range 1-200.

Marcus issued a rebuttal

Interestingly, soon after Microsoft's paper was published, Marcus immediately wrote a blog post calling Microsoft's view "absolutely ridiculous", and quoted a line from the Bible: "Pride goes before destruction, and a haughty spirit before a fall. (Proverbs 16:18)"

How can GPT-4 be considered an early AGI? By that standard, a calculator would count too, and Eliza and Siri even more so. The definition is so vague that it is easy to exploit.

In Marcus' view, GPT-4 has nothing to do with AGI. It is the same as before: its shortcomings remain unresolved, hallucinations still occur, and the unreliability of its answers has not been fixed; even the authors themselves admit that its ability to plan complex tasks is still insufficient.

What worries him is that the two papers from OpenAI and Microsoft describe models that are not disclosed at all, with no training set or architecture revealed; they rely on press-release-style papers to proclaim their scientific character. So the "some form of AGI" claimed in the paper cannot be verified: the scientific community has no access to the training data, and the training data appears to have been contaminated.

To make matters worse, OpenAI has begun incorporating user experiments into the training corpus itself. By muddying the waters in this way, the scientific community cannot judge a key capability of GPT-4: whether the model can generalize to new test cases.

If OpenAI did not put a scientific hat on itself here, Marcus might not be so critical. He admits that GPT-4 is very powerful, but the risks are also well known. If OpenAI lacks transparency and refuses to make its models public, it might as well be shut down.

Strong author lineup

Behind this 154-page paper is a strong lineup of Microsoft authors, including Sébastien Bubeck, principal researcher at Microsoft Research Redmond and winner of the 2015 Sloan Research Award; Ronen Eldan, winner of the 2023 New Horizons in Mathematics Prize; Yin Tat Lee, winner of the 2020 Sloan Research Award; and Li Yuanzhi, a new winner of the 2023 Sloan Research Award.

It is worth mentioning that the Microsoft team's original title for the paper was not "Sparks of Artificial General Intelligence: Early experiments with GPT-4". The LaTeX source leaked in the unabridged version shows that the original title was "First Contact with AGI".