使用 Anthropic 的 Claude Sonnet 產生報告-Python教學-PHP中文網

首頁

後端開發

Python教學

使用 Anthropic 的 Claude Sonnet 產生報告

Susan Sarandon

Jan 18, 2025 am 06:15 AM

利用Anthropic的Claude 3.5 Sonnet產生報表：兩種方法的比較

Using Anthropic

大家好！我是Raphael，巴西房地產公司Pilar的共同創辦人兼CTO。 Pilar為房地產經紀人和經紀公司提供軟體和服務，採用低成功費模式。我們不收取高昂的前期費用，而是從每次成功的交易中收取少量佣金，使我們的成功直接與客戶的成功掛鉤。我們由20位技術人員組成的團隊不斷創新，最新產品是Pilar Homes，一個全新的房地產入口網站，旨在為購屋者和房地產經紀人提供最佳體驗。

在這篇文章中，我將分享我們使用人工智慧產生報告的經驗，特別是Anthropic的Claude 3.5 Sonnet，並比較兩種不同的方法。

我們處理任務的理念將在未來的文章中詳細介紹（敬請關注！），但簡而言之，這些任務最終以Jira工單的形式出現在「技術服務台」看板上。產生報告就是這樣一項任務，大多數任務需要工程師花費大約30分鐘來解決，複雜報告很少超過幾個小時。但情況正在改變。我們最初只與一兩個合作夥伴合作的精品品牌正在擴張，成為更大的經紀公司，我們也與業內老牌公司簽訂了更多合約。雖然增加工程師的工作時間可以解決日益增長的報告需求，但我看到了探索人工智慧代理並在現實環境中學習架構模式的機會。

方法一：讓AI全權處理並達到max_tokens限制

在我們的初始方法中，我們將工具暴露給Claude的3.5 Sonnet模型，使其能夠執行資料庫查詢、將檢索到的文檔轉換為CSV並將其結果寫入.csv檔案。

以下是我們的結構，很大程度上受到了上面部落格文章的啟發：

<code># 每个collection对象描述一个MongoDB集合及其字段
# 这有助于Claude理解我们的数据模式
COLLECTIONS = [
    {
        'name': 'companies',
        'description': 'Companies are the real estate brokerages. If the user provides a code to filter the data, it will be a company code. The _id may be retrieved by querying the company with the given code. Company codes are not used to join data.',
        'fields': {
            '_id': 'The ObjectId is the MongoDB id that uniquely identifies a company document. Its JSON representation is \"{"$oid": "the id"}\"',
            'code': 'The company code is a short and human friendly string that uniquely identifies the company. Never use it for joining data.',
            'name': 'A string representing the company name',
        }
    },
    # 此处之后描述了更多集合，但思路相同...
]

# 这是client.messages.create的“system”参数
ROLE_PROMPT = "You are an engineer responsible for generating reports in CSV based on a user's description of the report content"

# 这是“user”消息
task_prompt = f"{report_description}.\nAvailable collections: {COLLECTIONS}\nCompany codes: {company_codes}\n.Always demand a company code from the user to filter the data -- the user may use the terms imobiliária, marca, brand or company to reference a company. If the user wants a field that does not exist in a collection, don't add it to the report and don't ask the user for the field."
</code>

report_description只是一個透過argparse讀取的命令列參數，company_codes是從資料庫中檢索到的，並將其暴露給模型，以便它知道哪些公司存在以及用戶輸入中什麼是公司代碼。範例：（MO - Mosaic Homes，NV - Nova Real Estate，等等）。

模型可用的工具包括：find和docs2csv。

<code>def find(collection: str, query: str, fields: list[str]) -> Cursor:
    """Find documents in a collection filtering by "query" and retrieving fields via projection"""
    return db.get_collection(collection).find(query, projection={field: 1 for field in fields})

def docs2csv(documents: list[dict]) -> list[str]:
    """
    Convert a dictionary to a CSV string.
    """
    print(f"Converting {len(documents)} documents to CSV")
    with open('report.csv', mode='w', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=documents[0].keys())
        writer.writeheader()
        writer.writerows(documents)
    return "report.csv"</code>

Claude能夠呼叫find函數對我們的資料庫執行結構良好的查詢和投影，並使用docs2csv工具產生小型CSV報告（少於500行）。但是，較大的報告會觸發max_tokens錯誤。

在分析了我們的令牌使用模式後，我們意識到大部分令牌消耗都來自透過模型處理單一記錄。這促使我們探索另一種方法：讓Claude產生處理程式碼，而不是直接處理資料。

方法二：Python程式碼產生作為解決方法

雖然解決max_tokens限制在技術上並不困難，但它需要我們重新思考解決問題的方法。

解決方案？讓Claude產生將在我們的CPU上運行的Python程式碼，而不是透過AI處理每個文件。

我必須修改角色和任務提示並刪除工具。

以下是報告產生程式碼的要點。

產生報告的命令是：

<code># 每个collection对象描述一个MongoDB集合及其字段
# 这有助于Claude理解我们的数据模式
COLLECTIONS = [
    {
        'name': 'companies',
        'description': 'Companies are the real estate brokerages. If the user provides a code to filter the data, it will be a company code. The _id may be retrieved by querying the company with the given code. Company codes are not used to join data.',
        'fields': {
            '_id': 'The ObjectId is the MongoDB id that uniquely identifies a company document. Its JSON representation is \"{"$oid": "the id"}\"',
            'code': 'The company code is a short and human friendly string that uniquely identifies the company. Never use it for joining data.',
            'name': 'A string representing the company name',
        }
    },
    # 此处之后描述了更多集合，但思路相同...
]

# 这是client.messages.create的“system”参数
ROLE_PROMPT = "You are an engineer responsible for generating reports in CSV based on a user's description of the report content"

# 这是“user”消息
task_prompt = f"{report_description}.\nAvailable collections: {COLLECTIONS}\nCompany codes: {company_codes}\n.Always demand a company code from the user to filter the data -- the user may use the terms imobiliária, marca, brand or company to reference a company. If the user wants a field that does not exist in a collection, don't add it to the report and don't ask the user for the field."
</code>

Claude產生的Python內容（運作良好）：

<code>def find(collection: str, query: str, fields: list[str]) -> Cursor:
    """Find documents in a collection filtering by "query" and retrieving fields via projection"""
    return db.get_collection(collection).find(query, projection={field: 1 for field in fields})

def docs2csv(documents: list[dict]) -> list[str]:
    """
    Convert a dictionary to a CSV string.
    """
    print(f"Converting {len(documents)} documents to CSV")
    with open('report.csv', mode='w', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=documents[0].keys())
        writer.writeheader()
        writer.writerows(documents)
    return "report.csv"</code>

結論

我們與Claude 3.5 Sonnet的歷程表明，人工智慧可以顯著提高營運效率，但成功的關鍵在於選擇正確的架構。程式碼產生方法被證明比直接的AI處理更強大，同時保持了自動化的優勢。

除了正確建立報告外，程式碼產生方法還允許工程師審查AI的工作，這是一件非常好的事情。

為了完全自動化流程，消除人工參與並處理更大數量的報告，跨多個代理實例分配工作——每個實例處理更少的令牌——將是該系統的自然演變。對於這類分散式AI系統中的架構挑戰，我強烈推薦Phil Calçado關於建構AI產品的最新文章。

此實現的主要經驗教訓：

直接AI處理適用於較小的資料集
程式碼產生提供更好的可擴充性和可維護性
人工審查增加了可靠性

參考文獻

Anthropic 文件
Thomas Taylor 使用 Python SDK 的帶工具的 Anthropic Claude
Phil Calçado 所寫的建構 AI 產品－第一部分：後端架構

以上是使用 Anthropic 的 Claude Sonnet 產生報告的詳細內容。更多資訊請關注PHP中文網其他相關文章！

陳述

本文內容由網友自願投稿，版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容，請聯絡admin@php.cn

列表和陣列之間的選擇如何影響涉及大型數據集的Python應用程序的整體性能？May 03, 2025 am 12:11 AM

ForhandlinglargedatasetsinPython,useNumPyarraysforbetterperformance.1)NumPyarraysarememory-efficientandfasterfornumericaloperations.2)Avoidunnecessarytypeconversions.3)Leveragevectorizationforreducedtimecomplexity.4)Managememoryusagewithefficientdata

說明如何將內存分配給Python中的列表與數組。May 03, 2025 am 12:10 AM

Inpython，ListSusedynamicMemoryAllocationWithOver-Asalose，而alenumpyArraySallaySallocateFixedMemory.1）listssallocatemoremoremoremorythanneededinentientary上，respizeTized.2）numpyarsallaysallaysallocateAllocateAllocateAlcocateExactMemoryForements，OfferingPrediCtableSageButlessemageButlesseflextlessibility。

您如何在Python數組中指定元素的數據類型？May 03, 2025 am 12:06 AM

Inpython，YouCansspecthedatatAtatatPeyFelemereModeRernSpant.1）Usenpynernrump.1）Usenpynyp.dloatp.dloatp.ploatm64，formor professisconsiscontrolatatypes。

什麼是Numpy，為什麼對於Python中的數值計算很重要？May 03, 2025 am 12:03 AM

NumPyisessentialfornumericalcomputinginPythonduetoitsspeed,memoryefficiency,andcomprehensivemathematicalfunctions.1)It'sfastbecauseitperformsoperationsinC.2)NumPyarraysaremorememory-efficientthanPythonlists.3)Itoffersawiderangeofmathematicaloperation

討論'連續內存分配”的概念及其對數組的重要性。May 03, 2025 am 12:01 AM

Contiguousmemoryallocationiscrucialforarraysbecauseitallowsforefficientandfastelementaccess.1)Itenablesconstanttimeaccess,O(1),duetodirectaddresscalculation.2)Itimprovescacheefficiencybyallowingmultipleelementfetchespercacheline.3)Itsimplifiesmemorym

您如何切成python列表？May 02, 2025 am 12:14 AM

SlicingaPythonlistisdoneusingthesyntaxlist[start:stop:step].Here'showitworks:1)Startistheindexofthefirstelementtoinclude.2)Stopistheindexofthefirstelementtoexclude.3)Stepistheincrementbetweenelements.It'susefulforextractingportionsoflistsandcanuseneg

在Numpy陣列上可以執行哪些常見操作？May 02, 2025 am 12:09 AM

numpyallowsforvariousoperationsonArrays：1）basicarithmeticlikeaddition，減法，乘法和division; 2）evationAperationssuchasmatrixmultiplication; 3）element-wiseOperations wiseOperationswithOutexpliitloops; 4）

Python的數據分析中如何使用陣列？May 02, 2025 am 12:09 AM

Arresinpython，尤其是Throughnumpyandpandas，weessentialFordataAnalysis，offeringSpeedAndeffied.1）NumpyArseNable efflaysenable efficefliceHandlingAtaSetSetSetSetSetSetSetSetSetSetSetsetSetSetSetSetsopplexoperationslikemovingaverages.2）

See all articles