Early experiments with GPT-4, the spark for general artificial intelligence
Recently, Microsoft released a 154-page paper titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4."
The main point of the article is that although GPT-4 is not complete yet, it can already be regarded as an early version of general artificial intelligence.
Since the full text is nearly 70,000 words, this article condenses and interprets the content of the paper. If you are interested, you can read the original text: https://arxiv.org/pdf/2303.12712.pdf
Microsoft's scientists believe that GPT-4's intelligence is very close to human level and far exceeds that of previous models such as GPT-3.5, which ChatGPT previously used. They regard GPT-4 as an early, but still incomplete, version of an artificial general intelligence (AGI) system.
In 1994, 52 psychologists gave a definition of intelligence: intelligence is a general mental ability that includes reasoning, planning, problem solving, abstract thinking, understanding complex ideas, rapid learning, and learning from experience.
The AGI in this Microsoft paper refers to systems that reach or exceed human levels under this definition of intelligence.
So how can GPT-4 be tested to support this claim? There are many evaluation benchmarks for large language models, for example Super-NaturalInstructions and BIG-bench. However, a key aspect of GPT-4's intelligence is its versatility, that is, its ability to seemingly understand and link any topic and field, which goes beyond the scope of the classic natural language processing tasks that such benchmarks cover. To get past this limitation, the research team proposed an assessment method that is closer to traditional psychology than to machine learning: using human creativity and curiosity to generate a variety of novel and difficult tasks and questions, and having GPT-4 answer them.
The research team divided different problems into four major categories of abilities, namely natural language, programming and mathematics, planning and problem solving, and human psychology and common sense.
Next, some wonderful examples will be used to illustrate the capabilities of GPT-4 in various aspects.
Example 1: The ability of interdisciplinary combination
These tasks often require retrieving and integrating knowledge or skills from multiple disciplines and fields to generate text or code. Take Figure 2.2 of the paper as an example: a comparison of GPT-4 and ChatGPT (GPT-3.5) on an interdisciplinary task, proving that there are infinitely many prime numbers in Shakespearean style.
Both GPT-4 and ChatGPT (GPT-3.5) answered this question. GPT-4 was then asked to evaluate which answer is better. In other words, GPT-4 rated its own answer and the ChatGPT (GPT-3.5) answer:
I think both students wrote the proof that there are infinitely many prime numbers in a Shakespearean way, and both used the basic idea of the proof: assume that there are only a finite number of prime numbers, and then construct a larger prime number to contradict this assumption. They also used appropriate language and vocabulary to match Shakespeare's style. However, I think GPT-4 makes better use of rhyme and meter to make the dialogue more poetic and dramatic. ChatGPT (GPT-3.5) does not use rhyme, and its rhythm is inconsistent. I also think GPT-4 did a better job of creating two opposing factions, one confident and the other skeptical, whereas ChatGPT (GPT-3.5) used Romeo and Juliet, both of whom are on the same page and in love. Therefore, I would give GPT-4 a higher rating, giving it an A, and ChatGPT (GPT-3.5) a B.
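The proof idea mentioned in the evaluation can be made concrete with a small sketch. This is the classic Euclid-style argument, not code from the paper, and the function names are made up for illustration. One subtlety: the product of the listed primes plus one is not necessarily prime itself, but its smallest prime factor is always a prime missing from the list, which is enough for the contradiction.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found, so n itself is prime


def prime_outside(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in it.

    Every listed prime leaves remainder 1 when dividing prod(primes) + 1,
    so none of them can be a factor of that number.
    """
    candidate = prod(primes) + 1
    return smallest_prime_factor(candidate)


primes = [2, 3, 5, 7, 11, 13]
print(prime_outside(primes))  # 30031 = 59 * 509, so this prints 59
```

Whatever finite list is supplied, the function returns a prime outside it, so no finite list can contain all primes.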
Example 2: Programming ability
In the test, to avoid the possibility that GPT-4 had seen the same or similar code during pre-training, the research team deliberately used 100 new LeetCode problems released after GPT-4's pre-training as the benchmark. pass@k in the reported results represents the probability that at least one of k attempts succeeds. We can see that when k equals 5, GPT-4 has completely surpassed human performance. (In fact, at k=1 it has already surpassed human performance on the medium and hard programming problems.)
It turns out that GPT-4 is a master programmer, and AGI models may completely change the way we program in the future.
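To make the pass@k metric concrete, here is a minimal sketch of one common way to estimate it. It assumes that n solutions are sampled per problem and that c of them pass all tests, and it uses the combinatorial estimator popularized in code-generation evaluations. The article does not say how the paper computed its numbers, so this is an illustration rather than the authors' method, and the function name is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples generated for the problem (assumed setup)
    c: number of those samples that passed all tests
    k: number of attempts allowed
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets with no correct sample;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 10 samples of which 3 are correct:
print(pass_at_k(10, 3, 1))  # one attempt
print(pass_at_k(10, 3, 5))  # five attempts, noticeably higher
```

A benchmark score is then the average of this value over all problems, which is why pass@5 is always at least as high as pass@1.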
The paper is very long, and the examples are far more than those mentioned above. I have only selected two here. Those who are interested can read the original text of the paper.
The paper finally points out that, on the road to more general artificial intelligence, large language models need to be further improved in several areas. These include hallucinations and confidence calibration, long-term memory, continual learning, personalization, planning and conceptual leaps (sometimes described as flashes of insight), transparency, interpretability and consistency, cognitive fallacies and irrational thinking, and sensitivity to prompts, that is, robustness.