Meta's Official Prompt Engineering Guide: Using Llama 2 More Efficiently - Artificial Intelligence - PHP Chinese Website


WBOY

Jan 30, 2024 10:51 PM


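As large language model (LLM) technology matures, prompt engineering is becoming increasingly important. Several organizations, including Microsoft and OpenAI, have published LLM prompt engineering guides.

Recently, Meta released an interactive prompt engineering guide written specifically for its open-source Llama 2 models. The guide covers prompt engineering techniques and best practices for using Llama 2.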


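The core content of the guide follows.

Llama models

In 2023, Meta released the Llama and Llama 2 models. Smaller models are cheaper to deploy and run, while larger models are more capable.

The Llama 2 family comes in the following parameter sizes: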

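llama-2-7b, llama-2-13b, llama-2-70b: base pretrained models with 7, 13, and 70 billion parameters
llama-2-7b-chat, llama-2-13b-chat, llama-2-70b-chat: chat fine-tuned versions at the same three sizes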

Code Llama is a code-focused LLM built on Llama 2. It is also available in several parameter sizes and fine-tuned variants:

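codellama-7b, codellama-13b, codellama-34b: code fine-tuned base models with 7, 13, and 34 billion parameters
codellama-7b-python, codellama-13b-python, codellama-34b-python: further fine-tuned on Python code
codellama-7b-instruct, codellama-13b-instruct, codellama-34b-instruct: fine-tuned to follow natural-language instructions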

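Deploying LLMs

LLMs can be deployed and accessed in several ways, including:

Self-hosting: run inference on your own hardware, for example running Llama 2 on a MacBook Pro with llama.cpp. Advantage: self-hosting is best when you have privacy or security requirements, or when you have enough GPUs.

Cloud hosting: rely on a cloud provider to deploy an instance that hosts a specific model, for example running Llama 2 on AWS, Azure, or GCP. Advantage: cloud hosting is the best way to customize models and their runtime.

Hosted API: call the LLM directly through an API. Many companies offer Llama 2 inference APIs, including AWS Bedrock, Replicate, Anyscale, and Together. Advantage: hosted APIs are the simplest option overall.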

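Hosted APIs

Hosted APIs usually expose two main endpoints:

1. completion: generates a response to a given prompt.

2. chat_completion: generates the next message in a list of messages, providing more explicit instructions and context for use cases such as chatbots.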

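Tokens

LLMs process input and output in chunks called tokens, and each model has its own tokenization scheme. Take the following sentence:

Our destiny is written in the stars.

Llama 2 tokenizes it as ["our", "dest", "iny", "is", "written", "in", "the", "stars"]. Tokens matter most when you consider API pricing and internal behavior (such as hyperparameters). Each model also has a maximum context length that the prompt cannot exceed: 4096 tokens for Llama 2 and 100K tokens for Code Llama.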
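To make the context limit concrete, here is a minimal sketch of a token-budget check. It tests whether a prompt plus the tokens reserved for the reply (like `max_new_tokens` in the setup code) fits in Llama 2's 4096-token window, and it trims the oldest chat messages until they fit. The helper names are hypothetical. The whitespace word counter is only a stand-in assumption: real counts come from the model's own tokenizer and will differ.

```python
from typing import Callable, List

LLAMA2_CONTEXT = 4096  # Llama 2 maximum context length, as stated in the guide

def whitespace_count(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words.
    Real Llama 2 token counts come from its own tokenizer and will differ."""
    return len(text.split())

def fits_context(prompt: str, max_new_tokens: int,
                 count: Callable[[str], int] = whitespace_count,
                 context: int = LLAMA2_CONTEXT) -> bool:
    """True if the prompt plus the tokens reserved for the reply fit in the window."""
    return count(prompt) + max_new_tokens <= context

def trim_history(messages: List[str], max_new_tokens: int,
                 count: Callable[[str], int] = whitespace_count,
                 context: int = LLAMA2_CONTEXT) -> List[str]:
    """Drop the oldest messages until the rest fit alongside the reply budget.
    Ignores any extra tokens a chat template would add."""
    kept = list(messages)
    while kept and sum(count(m) for m in kept) + max_new_tokens > context:
        kept.pop(0)  # discard the oldest message first
    return kept

print(fits_context("Our destiny is written in the stars.", max_new_tokens=1000))  # True
```

To use a real tokenizer, pass a different `count` function. The trimming logic stays the same.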

Notebook setup

As an example, we use Replicate to call Llama 2 chat, and we use LangChain to set up the chat completion API easily.

First, install the prerequisites:

pip install langchain replicate

from typing import Dict, List
from langchain.llms import Replicate
from langchain.memory import ChatMessageHistory
from langchain.schema.messages import get_buffer_string
import os

# Get a free API key from https://replicate.com/account/api-tokens
os.environ["REPLICATE_API_TOKEN"] = "YOUR_KEY_HERE"

LLAMA2_70B_CHAT = "meta/llama-2-70b-chat:2d19859030ff705a87c746f7e96eea03aefb71f166725aee39692f1476566d48"
LLAMA2_13B_CHAT = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"

# We'll default to the smaller 13B model for speed; change to LLAMA2_70B_CHAT for more advanced (but slower) generations
DEFAULT_MODEL = LLAMA2_13B_CHAT

def completion(
    prompt: str,
    model: str = DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:
    llm = Replicate(
        model=model,
        model_kwargs={"temperature": temperature, "top_p": top_p, "max_new_tokens": 1000},
    )
    return llm(prompt)

def chat_completion(
    messages: List[Dict],
    model=DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:
    history = ChatMessageHistory()
    for message in messages:
        if message["role"] == "user":
            history.add_user_message(message["content"])
        elif message["role"] == "assistant":
            history.add_ai_message(message["content"])
        else:
            raise Exception("Unknown role")
    return completion(
        get_buffer_string(
            history.messages,
            human_prefix="USER",
            ai_prefix="ASSISTANT",
        ),
        model,
        temperature,
        top_p,
    )

def assistant(content: str):
    return {"role": "assistant", "content": content}

def user(content: str):
    return {"role": "user", "content": content}

def complete_and_print(prompt: str, model: str = DEFAULT_MODEL):
    print(f'==============\n{prompt}\n==============')
    response = completion(prompt, model)
    print(response, end='\n\n')

Completion API

complete_and_print("The typical color of the sky is:")

complete_and_print("which model version are you?")

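Chat Completion models add structure to interactions with an LLM: instead of a single piece of text, an array of structured message objects is sent to the LLM. This message list gives the LLM some "context" or "history" to continue from.

Typically, each message contains a role and content:

Messages with the system role are used by developers to give core instructions to the LLM.

Messages with the user role are typically provided by a human.

Messages with the assistant role are typically generated by the LLM.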

response = chat_completion(messages=[
    user("My favorite color is blue."),
    assistant("That's great to hear!"),
    user("What is my favorite color?"),
])
print(response)
# "Sure, I can help you with that! Your favorite color is blue."
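The `chat_completion` helper in the setup code flattens messages into plain `USER:`/`ASSISTANT:` lines. Llama 2 chat models, however, were fine-tuned on a specific template built from `[INST]` and `<<SYS>>` markers, and many hosted APIs apply that template for you. The sketch below renders system/user/assistant messages into the template. The marker strings follow Meta's reference code as we understand it, so check them against the official `llama` repository before relying on this. Treat `llama2_chat_prompt` as a hypothetical name.

```python
from typing import Dict, List

# Marker strings as used in Meta's Llama 2 reference code (verify against the official repo).
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = "<s>", "</s>"

def llama2_chat_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: render role/content messages into a Llama 2 chat prompt string."""
    msgs = list(messages)
    # An optional leading system message is folded into the first user turn.
    if msgs and msgs[0]["role"] == "system":
        if len(msgs) < 2 or msgs[1]["role"] != "user":
            raise ValueError("a system message must be followed by a user message")
        merged = B_SYS + msgs[0]["content"] + E_SYS + msgs[1]["content"]
        msgs = [{"role": "user", "content": merged}] + msgs[2:]
    # Remaining turns must alternate user, assistant, user, ... and end with a user turn.
    for i, m in enumerate(msgs):
        expected = "user" if i % 2 == 0 else "assistant"
        if m["role"] != expected:
            raise ValueError(f"message {i} should have role {expected!r}")
    if not msgs or msgs[-1]["role"] != "user":
        raise ValueError("the conversation must end with a user message")
    parts = []
    # Each completed user/assistant exchange is wrapped in its own <s> ... </s>.
    for user_msg, assistant_msg in zip(msgs[0::2], msgs[1::2]):
        parts.append(f"{BOS}{B_INST} {user_msg['content'].strip()} {E_INST} "
                     f"{assistant_msg['content'].strip()} {EOS}")
    # The final user message is left open for the model to answer.
    parts.append(f"{BOS}{B_INST} {msgs[-1]['content'].strip()} {E_INST}")
    return "".join(parts)

print(llama2_chat_prompt([
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is my favorite color?"},
]))
```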

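LLM hyperparameters

LLM APIs usually accept parameters that control how creative or deterministic the output is. At each step, the LLM produces a list of candidate tokens with their probabilities. The least likely tokens are "cut" from the list (based on top_p), and a token is then picked at random from the remaining candidates (based on temperature). In other words, top_p controls the breadth of the vocabulary in a generation, and temperature controls the randomness within that vocabulary. A temperature of 0 produces almost deterministic results.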

def print_tuned_completion(temperature: float, top_p: float):
    response = completion("Write a haiku about llamas", temperature=temperature, top_p=top_p)
    print(f'[temperature: {temperature} | top_p: {top_p}]\n{response.strip()}\n')

print_tuned_completion(0.01, 0.01)
print_tuned_completion(0.01, 0.01)
# These two generations are highly likely to be the same

print_tuned_completion(1.0, 1.0)
print_tuned_completion(1.0, 1.0)
# These two generations are highly likely to be different
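A toy sampler over a hand-written score table can illustrate the mechanism described above. This is a minimal sketch, not how any particular server is implemented. `sample_next_token` and the scores are made up. Temperature 0 is treated as greedy decoding. Temperature scaling is applied before the top-p cut, which is one common ordering, but individual serving stacks may not follow it.

```python
import math
import random
from typing import Dict, Optional

def sample_next_token(logits: Dict[str, float], temperature: float, top_p: float,
                      rng: Optional[random.Random] = None) -> str:
    """Toy next-token sampler: temperature scaling, then a top-p (nucleus) cut, then a random draw."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always return the most likely token.
        return max(logits, key=logits.get)
    # Divide logits by the temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # subtract peak for stability
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)
    # Top-p: keep the smallest run of most likely tokens whose probabilities sum to at least top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, probs = zip(*kept)
    # random.choices normalizes the weights, so the survivors are renormalized implicitly.
    return rng.choices(tokens, weights=probs, k=1)[0]

toy_logits = {"llama": 2.0, "alpaca": 1.0, "camel": 0.5}  # made-up scores for illustration
rng = random.Random(0)
print([sample_next_token(toy_logits, 0.01, 0.01, rng) for _ in range(5)])  # only "llama" can be chosen
print([sample_next_token(toy_logits, 1.0, 1.0, rng) for _ in range(5)])    # can include any of the three
```

With `top_p=0.01`, only the single most likely token survives the cut. This is why the low-setting calls in the example above tend to repeat.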

Prompting techniques

Detailed, explicit instructions produce better results than open-ended prompts:

complete_and_print(prompt="Describe quantum physics in one short sentence of no more than 12 words")
# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously.

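We can give explicit instructions by stating rules and restrictions.

Stylization, for example:

Explain this to me as if it were a topic on a children's educational TV show for elementary school students;
I'm a software engineer using large language models for summarization. Summarize the following text in 250 words;
Work through the case step by step like a private detective, then give your answer.

Formatting, for example:

Use bullet points;
Return the result as a JSON object;
Use fewer technical terms, suitable for communication at work.

Restrictions, for example:

Only use academic papers;
Never give sources older than 2020;
If you don't know the answer, say that you don't know.

Here are examples of giving explicit instructions: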

complete_and_print("Explain the latest advances in large language models to me.")
# More likely to cite sources from 2017

complete_and_print("Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.")
# Gives more specific advances and only cites sources from 2020
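The style, formatting, and restriction instructions listed above can be kept as separate pieces and joined into one prompt. `build_prompt` below is a hypothetical convenience helper, not part of the guide or of LangChain. Its layout (an optional style line, then the task, then bulleted rules) is one reasonable choice rather than a required format.

```python
from typing import Iterable, Optional

def build_prompt(task: str, style: Optional[str] = None,
                 formatting: Iterable[str] = (),
                 restrictions: Iterable[str] = ()) -> str:
    """Hypothetical helper: join a style line, the task, and bulleted rules into one prompt."""
    lines = []
    if style:
        lines.append(style)  # e.g. a persona or audience description
    lines.append(task)
    rules = [*formatting, *restrictions]
    if rules:
        # One rule per bullet keeps each constraint separate and easy to edit.
        lines.append("Follow these rules:")
        lines.extend(f"- {rule}" for rule in rules)
    return "\n".join(lines)

prompt = build_prompt(
    "Explain the latest advances in large language models to me.",
    restrictions=["Always cite your sources.", "Never cite sources older than 2020."],
)
print(prompt)
```

The result is a plain string, so it can be passed straight to `complete_and_print`.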

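Zero-shot prompting

Some large language models (such as Llama 2) can follow instructions and produce responses without having seen any examples of the task. Prompting without examples is called "zero-shot prompting". For example: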

complete_and_print("Text: This was the best movie I've ever seen! \n The sentiment of the text is:")
# Returns positive sentiment

complete_and_print("Text: The director was trying too hard. \n The sentiment of the text is:")
# Returns negative sentiment

Few-shot prompting

Adding concrete examples of the desired output often produces more accurate and consistent results. This approach is called "few-shot prompting". For example:

def sentiment(text):
    response = chat_completion(messages=[
        user("You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative."),
        user("I liked it"),
        assistant("70% positive 30% neutral 0% negative"),
        user("It could be better"),
        assistant("0% positive 50% neutral 50% negative"),
        user("It's fine"),
        assistant("25% positive 50% neutral 25% negative"),
        user(text),
    ])
    return response

def print_sentiment(text):
    print(f'INPUT: {text}')
    print(sentiment(text))

print_sentiment("I thought it was okay")
# More likely to return a balanced mix of positive, neutral, and negative
print_sentiment("I loved it!")
# More likely to return 100% positive
print_sentiment("Terrible service 0/10")
# More likely to return 100% negative
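The few-shot message list in `sentiment` can be generated from data instead of written out by hand. The hypothetical helper below reproduces the same layout: the instruction as its own user turn, then one user/assistant pair per example, then the real input. That layout works with `chat_completion`, which flattens everything into `USER:`/`ASSISTANT:` text. A template that requires strictly alternating turns would need the instruction merged into the first example instead.

```python
from typing import Dict, List, Sequence, Tuple

def few_shot_messages(instruction: str,
                      examples: Sequence[Tuple[str, str]],
                      query: str) -> List[Dict[str, str]]:
    """Hypothetical helper: instruction, then (input, expected output) example turns, then the real query."""
    messages = [{"role": "user", "content": instruction}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages

EXAMPLES = [
    ("I liked it", "70% positive 30% neutral 0% negative"),
    ("It could be better", "0% positive 50% neutral 50% negative"),
    ("It's fine", "25% positive 50% neutral 25% negative"),
]
msgs = few_shot_messages(
    "You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative.",
    EXAMPLES,
    "I thought it was okay",
)
print([m["role"] for m in msgs])
# ['user', 'user', 'assistant', 'user', 'assistant', 'user', 'assistant', 'user']
```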

Role Prompting

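Llama 2 often gives more consistent responses when it is assigned a role. The role gives the LLM background on the kind of answer that is wanted.

For example, to get Llama 2 to give a more focused, technical answer on the pros and cons of using PyTorch: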

complete_and_print("Explain the pros and cons of using PyTorch.")
# More likely to explain the pros and cons of PyTorch in general terms, covering areas like documentation and the PyTorch community, and mentioning a steep learning curve

complete_and_print("Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.")
# Often results in more technical benefits and drawbacks that provide more technical details on how model layers

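Chain of thought

Simply adding a phrase that encourages step-by-step thinking can significantly improve a large language model's ability to perform complex reasoning (Wei et al. (2022)). This approach is called CoT, or chain-of-thought prompting: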

complete_and_print("Who lived longer Elvis Presley or Mozart?")
# Often gives incorrect answer of "Mozart"

complete_and_print("Who lived longer Elvis Presley or Mozart? Let's think through this carefully, step by step.")
# Gives the correct answer "Elvis"
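In practice, a chain-of-thought reply mixes reasoning with the answer, so it helps to ask for a marked final line and parse it out. The sketch below uses an assumed convention that is not part of the guide. `with_cot` appends the same step-by-step cue used above plus a request for an `Answer:` line. `final_answer` extracts that line and returns `None` when the model ignores the request. Both names are hypothetical.

```python
import re
from typing import Optional

COT_CUE = "Let's think through this carefully, step by step."

def with_cot(question: str) -> str:
    """Append the step-by-step cue, plus a request for a marked final answer (an assumed convention)."""
    return f"{question} {COT_CUE} Finish with a final line of the form 'Answer: <answer>'."

def final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if the model did not include one."""
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None

print(with_cot("Who lived longer Elvis Presley or Mozart?"))
```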

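Self-Consistency

LLMs are probabilistic, so even with chain of thought, a single generation may produce an incorrect result. Self-consistency improves accuracy by selecting the most frequent answer across multiple generations, at the cost of more compute: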

import re
from statistics import mode

def gen_answer():
    response = completion(
        "John found that the average of 15 numbers is 40. "
        "If 10 is added to each number then the mean of the numbers is? "
        "Report the answer surrounded by three backticks, for example: ```123```",
        model=LLAMA2_70B_CHAT,
    )
    match = re.search(r'```(\d+)```', response)
    if match is None:
        return None
    return match.group(1)

answers = [gen_answer() for i in range(5)]

print(
    f"Answers: {answers}\n",
    f"Final answer: {mode(answers)}",
)

# Sample runs of Llama-2-70B (all correct):
# [50, 50, 750, 50, 50]  -> 50
# [130, 10, 750, 50, 50] -> 50
# [50, None, 10, 50, 50] -> 50
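The code above has one subtle behavior: `statistics.mode` treats `None` (a reply where the regex found no answer) as an ordinary candidate. If most samples fail to parse, the "final answer" is therefore `None`. A minimal alternative sketch, assuming answers are compared as exact strings, drops unparsed samples before voting. It also reports what fraction of all samples agreed, which can serve as a rough confidence signal. `majority_vote` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

def majority_vote(answers: Sequence[Optional[str]]) -> Tuple[Optional[str], float]:
    """Return the most common non-None answer and the fraction of all samples that agree with it."""
    valid = [a for a in answers if a is not None]
    if not valid:
        return None, 0.0
    # most_common(1) returns [(answer, count)]; ties go to the answer encountered first.
    answer, count = Counter(valid).most_common(1)[0]
    return answer, count / len(answers)

print(majority_vote(["50", None, "10", "50", "50"]))  # ('50', 0.6)
```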

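Retrieval-augmented generation

Sometimes we want an application to use factual knowledge. Common facts can be extracted from large models out of the box (that is, using only the model weights):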

complete_and_print("What is the capital of California?", model=LLAMA2_70B_CHAT)
# Gives the correct answer "Sacramento"

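However, LLMs often cannot reliably retrieve more specific facts or private information. The model either says it doesn't know or hallucinates an incorrect answer: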

complete_and_print("What was the temperature in Menlo Park on December 12th, 2023?")
# "I'm just an AI, I don't have access to real-time weather data or historical weather records."

complete_and_print("What time is my dinner reservation on Saturday and what should I wear?")
# "I'm not able to access your personal information [..] I can provide some general guidance"

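Retrieval-augmented generation (RAG) means including information retrieved from an external database in the prompt (Lewis et al. (2020)). RAG is an effective way to bring facts into LLM applications. It is also cheaper than fine-tuning, which can be costly and can degrade the base model's capabilities.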

MENLO_PARK_TEMPS = {
    "2023-12-11": "52 degrees Fahrenheit",
    "2023-12-12": "51 degrees Fahrenheit",
    "2023-12-13": "51 degrees Fahrenheit",
}

def prompt_with_rag(retrieved_info, question):
    complete_and_print(
        f"Given the following information: '{retrieved_info}', respond to: '{question}'"
    )

def ask_for_temperature(day):
    temp_on_day = MENLO_PARK_TEMPS.get(day) or "unknown temperature"
    prompt_with_rag(
        f"The temperature in Menlo Park was {temp_on_day} on {day}",  # Retrieved fact
        f"What is the temperature in Menlo Park on {day}?",  # User question
    )

ask_for_temperature("2023-12-12")
# "Sure! The temperature in Menlo Park on 2023-12-12 was 51 degrees Fahrenheit."

ask_for_temperature("2023-07-18")
# "I'm not able to provide the temperature in Menlo Park on 2023-07-18 as the information provided states that the temperature was unknown."
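The example above skips the retrieval step by looking the date up in a dictionary. A minimal sketch of a generic version follows. It uses plain word overlap as a deliberately crude substitute for the embedding- or search-based retrieval a real RAG system would use. `retrieve` ranks documents by the number of words they share with the question, and `rag_prompt` places the top matches ahead of the question in the same "Given the following information" style. The function names and the three documents are made up for illustration.

```python
import re
from typing import List, Set

def words(text: str) -> Set[str]:
    """Lowercased alphanumeric words; a crude stand-in for embeddings or a search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Return up to k documents sharing the most words with the question (zero-overlap ones are dropped)."""
    q = words(question)
    scored = [(len(q & words(doc)), i, doc) for i, doc in enumerate(documents)]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best overlap first, ties keep input order
    return [doc for score, _, doc in scored[:k] if score > 0]

def rag_prompt(question: str, documents: List[str], k: int = 2) -> str:
    """Put the retrieved passages ahead of the question."""
    found = retrieve(question, documents, k)
    context = "\n".join(f"- {doc}" for doc in found) or "- (no relevant information found)"
    return f"Given the following information:\n{context}\nRespond to: {question}"

# Hypothetical documents, for illustration only.
DOCS = [
    "The temperature in Menlo Park was 51 degrees Fahrenheit on 2023-12-12.",
    "Dinner reservations are at 7pm on Saturday.",
    "Llamas are domesticated South American camelids.",
]
print(rag_prompt("What was the temperature in Menlo Park on 2023-12-12?", DOCS, k=1))
```

Replacing `words` and `retrieve` with a vector-similarity search leaves `rag_prompt` unchanged.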

Program-aided language models

LLMs are inherently bad at performing calculations, for example:

complete_and_print("""
Calculate the answer to the following math problem:

((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
""")
# Gives incorrect answers like 92448, 92648, 95463

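Gao et al. (2022) introduced the concept of "Program-aided Language Models" (PAL). Although LLMs are bad at arithmetic, they are very good at generating code. PAL handles computational tasks by instructing the LLM to write code that solves them.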

complete_and_print(
    """
# Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
""",
    model="meta/codellama-34b:67942fd0f55b66da802218a19a8f0e1d73095473674061a6ea19f2dc8c053152",
)

# The following code was generated by Code Llama 34B:
num1 = (-5 + 93 * 4 - 0)
num2 = (4**4 + -7 + 0 * 5)
answer = num1 * num2
print(answer)  # 91383
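With PAL, the generated program still has to be executed, and running arbitrary model output with `exec()` is risky. For the pure-arithmetic case, one hedged option is to parse the expression with Python's `ast` module and evaluate only numeric literals and `+ - * / **`. The sketch below uses a hypothetical helper, `safe_eval`, and deliberately rejects `^`. In Python, `^` is bitwise XOR, not exponentiation. A generated program that kept `4^4` from the prompt instead of rewriting it as `4**4` would therefore run without error and print a wrong result.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expr: str):
    """Evaluate a model-generated arithmetic expression without calling eval() or exec()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, and ^ (BitXor) all end up here.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("(-5 + 93 * 4 - 0) * (4**4 + -7 + 0 * 5)"))  # 91383
```

This sketch does not cap exponent sizes. An input such as `9**9**9` could still tie up the process, so a production version would need a limit on that as well.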


This article is reprinted from 51CTO.COM. In case of infringement, please contact admin@php.cn for removal.
