
DeepMind finds that telling large models to "take a deep breath and take one step at a time" is an extremely effective prompt

WBOY · 2023-09-13 16:41
This article introduces OPRO, a simple and effective method that uses a large language model as an optimizer: the optimization task is described in natural language, and the prompts it discovers outperform prompts designed by humans.
Optimization is crucial in all fields.

Many optimization algorithms start from an initialization and then iteratively update the solution to optimize the objective function. Such algorithms often need to be customized for individual tasks to address the specific challenges posed by the decision space, especially in derivative-free optimization.

In the study introduced here, the researchers took a different approach: they used a large language model (LLM) as the optimizer, and the prompts it produced performed better than human-designed prompts across a variety of tasks.

This research comes from Google DeepMind, which proposes OPRO (Optimization by PROmpting), a simple and effective optimization method in which the optimization task is described in natural language. For example, the prompt given to the LLM can be "Take a deep breath and solve this problem step by step," or "Let's combine our numerical commands and clear thinking to decipher the answer quickly and accurately," and so on.

In each optimization step, the LLM generates new solutions from a prompt that contains previously generated solutions and their values; the new solutions are then evaluated and added to the prompt for the next optimization step.

The study first applies OPRO to linear regression and the traveling salesman problem (a well-known NP-hard problem), and then moves on to prompt optimization, where the goal is to find instructions that maximize task accuracy.

The paper conducts a comprehensive evaluation of multiple LLMs, including text-bison and PaLM 2-L in the PaLM-2 model family, as well as gpt-3.5-turbo and gpt-4 in the GPT model family. Prompts are optimized on GSM8K and Big-Bench Hard. The results show that the best prompt found by OPRO outperforms manually designed prompts by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.


Paper address: https://arxiv.org/pdf/2309.03409.pdf

Chengrun Yang, first author of the paper and a research scientist at Google DeepMind, said: "To perform prompt optimization, we start from basic instructions such as 'Let's start solving the problem,' or even from an empty string. The instructions generated by OPRO gradually improve LLM performance, and the upward performance curve shown below looks just like what you see in traditional optimization!"

[Figure: prompt-optimization performance curve rising over optimization steps]

"Every LLM is optimized by OPRO even if it starts from the same instruction. , the final optimization instructions of different LLMs also show different styles, are better than instructions written by humans, and can be transferred to similar tasks."

[Table: top instructions found by OPRO for different optimizer LLMs]

We can also see from the above table that the instruction styles ultimately found by the LLM optimizer differ considerably. The instructions of PaLM 2-L-IT and text-bison are concise, while those of GPT are long and detailed. Although some of the top instructions contain the "step-by-step" phrase, OPRO also finds other semantic expressions that achieve comparable or better accuracy.

However, some researchers have pointed out that although the prompt "take a deep breath and take it step by step" is very effective on Google's PaLM 2 (80.2% accuracy), there is no guarantee that it works on all models and in all situations, so it should not be blindly applied everywhere.


OPRO: LLM as optimizer

Figure 2 shows the overall framework of OPRO. At each optimization step, the LLM generates candidate solutions to the optimization task based on the optimization-problem description and the previously evaluated solutions contained in the meta-prompt (bottom right of Figure 2).

Next, the new solutions are evaluated and added to the meta-prompt for the subsequent optimization process.

The optimization process terminates when the LLM can no longer propose new solutions with better optimization scores, or when the maximum number of optimization steps is reached.

[Figure 2: the overall framework of OPRO]
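This loop can be sketched in a few lines of Python. The sketch below is only a rough illustration of the procedure just described, not the paper's implementation; `call_optimizer_llm` and `evaluate_prompt` are hypothetical stand-ins for an LLM API call and for a scorer that measures a candidate prompt's training accuracy.

```python
# Rough sketch of the OPRO loop (illustrative, not the paper's code).
def opro_loop(task_description, call_optimizer_llm, evaluate_prompt,
              max_steps=200, candidates_per_step=8, history_size=20):
    history = []  # (prompt, score) pairs that feed the meta-prompt

    for step in range(max_steps):
        # Meta-prompt = problem description + best solutions found so far.
        top = sorted(history, key=lambda pair: pair[1])[-history_size:]
        meta_prompt = task_description + "\n\nPrevious instructions and their scores:\n"
        meta_prompt += "\n".join(f"text: {p}\nscore: {s}" for p, s in top)
        meta_prompt += "\n\nWrite a new instruction that achieves a higher score."

        # The optimizer LLM proposes several new candidate instructions.
        candidates = [call_optimizer_llm(meta_prompt) for _ in range(candidates_per_step)]

        # Each candidate is evaluated (e.g., training accuracy on the task)
        # and appended to the history used by the next meta-prompt.
        for prompt in candidates:
            history.append((prompt, evaluate_prompt(prompt)))

    return max(history, key=lambda pair: pair[1])  # best (prompt, score) found
```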

Figure 3 shows an example. The meta-prompt contains two core parts: the first is the previously generated prompts together with their corresponding training accuracies; the second is the description of the optimization problem, including several examples randomly selected from the training set that exemplify the task of interest.

[Figure 3: an example meta-prompt for prompt optimization]
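For illustration, a hypothetical meta-prompt following this two-part structure might look like the sketch below; the wording, scores, and example question are made up for this sketch rather than taken from the paper.

```python
# Illustrative meta-prompt: earlier instructions with scores, then task examples.
example_meta_prompt = """\
Your task is to generate a new instruction <INS>.

Below are some previous instructions with their training accuracy on the task:
text: Let's figure it out!
score: 61
text: Let's solve the problem.
score: 63
text: Let's think step by step.
score: 72

Below are some examples of the task:
Q: <INS> Alice has 3 apples and buys 2 more. How many apples does she have now?
A: 5

Generate a new instruction that is different from the ones above and achieves
a training accuracy as high as possible.
"""
print(example_meta_prompt)
```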

The paper first demonstrates the potential of LLMs as optimizers for "mathematical optimization". Results on a linear regression problem are shown in Table 2:

[Table 2: linear regression results]
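To make this setup concrete, the sketch below shows how candidate (w, b) pairs proposed by an optimizer LLM could be scored in a one-dimensional linear regression task. The synthetic data and objective here are illustrative assumptions rather than the paper's exact configuration.

```python
import random

# Synthetic 1-D linear regression data (illustrative values only).
random.seed(0)
w_true, b_true = 3.0, 7.0
xs = [random.uniform(-5, 5) for _ in range(50)]
ys = [w_true * x + b_true + random.gauss(0, 1) for x in xs]

def objective(w, b):
    """Sum of squared errors that the optimizer LLM is asked to minimize."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

# Each (w, b) pair the LLM proposes is scored like this; the value is fed
# back to the LLM through the meta-prompt at the next optimization step.
print(objective(2.0, 5.0), objective(3.0, 7.0))
```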

Next, the paper explores applying OPRO to the traveling salesman problem (TSP): given a set of n nodes and their coordinates, the task is to find the shortest route that starts from a starting node, traverses all the nodes, and finally returns to the starting node.
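In this setting, each candidate solution proposed by the LLM is an ordering of the nodes, and its quality is simply the total length of the corresponding tour. A minimal scoring sketch (the coordinates and function name are illustrative assumptions):

```python
import math

def tour_length(order, coords):
    """Total length of a round trip that visits every node once, in the
    given order, and returns to the starting node."""
    total = 0.0
    for i in range(len(order)):
        x1, y1 = coords[order[i]]
        x2, y2 = coords[order[(i + 1) % len(order)]]  # wrap back to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

# Example: 4 nodes with illustrative coordinates; the optimizer LLM would
# propose orderings, and shorter tours get better scores.
coords = {0: (0, 0), 1: (0, 3), 2: (4, 3), 3: (4, 0)}
print(tour_length([0, 1, 2, 3], coords))  # 14.0 (the optimal rectangle tour)
print(tour_length([0, 2, 1, 3], coords))  # 18.0 (a longer, crossing tour)
```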


Experiment

In the experiments, the paper uses pre-trained PaLM 2-L, instruction-tuned PaLM 2-L (PaLM 2-L-IT), text-bison, gpt-3.5-turbo, and gpt-4 as the optimizer LLMs, and pre-trained PaLM 2-L and text-bison as the scorer LLMs.

The evaluation benchmark GSM8K consists of grade-school math problems, with 7,473 training samples and 1,319 test samples; the Big-Bench Hard (BBH) benchmark covers a wide range of topics beyond arithmetic reasoning, including symbolic manipulation and commonsense reasoning.

GSM8K results

Figure 1(a) shows the prompt-optimization curve with pre-trained PaLM 2-L as the scorer and PaLM 2-L-IT as the optimizer. The curve shows an overall upward trend, with several leaps occurring throughout the optimization process:

[Figure 1(a): prompt-optimization curve on GSM8K]

Next, the paper shows the results of generating Q_begin instructions with the text-bison scorer and the PaLM 2-L-IT optimizer. Starting from an empty instruction, whose training accuracy is 57.1, the training accuracy then begins to rise. The optimization curve in Figure 4(a) shows a similar upward trend, with some leaps in training accuracy along the way:

[Figure 4(a): prompt-optimization curve with the text-bison scorer]

BBH results

Figure 5 shows, for all 23 BBH tasks, the per-task accuracy difference between the instructions found by OPRO and the "Let's think step by step" instruction. The instructions found by OPRO have a clear advantage on almost all tasks: they outperform "Let's think step by step" by more than 5% on 19/23 tasks with the PaLM 2-L scorer and on 15/23 tasks with the text-bison scorer.

[Figure 5: per-task accuracy differences on the 23 BBH tasks]

Similar to GSM8K, this paper observes that the optimization curves of almost all BBH tasks show an upward trend, as shown in Figure 6.

[Figure 6: optimization curves on BBH tasks]


Statement: this article is reproduced from jiqizhixin.com.