Anthropic&#s Claude Sonnet を使用したレポートの生成-Python チュートリアル-php.cn

ホームページ

バックエンド開発

Python チュートリアル

Anthropic&#s Claude Sonnet を使用したレポートの生成

Susan Sarandon

Jan 18, 2025 am 06:15 AM

Anthropic の Claude 3.5 Sonnet を使用したレポート生成: 2 つの方法の比較

Using Anthropic

皆さん、こんにちは！私はブラジルの不動産会社 Pilar の共同創設者兼 CTO の Raphael です。 Pilar は、低成功報酬モデルを使用して、不動産業者や仲介会社にソフトウェアとサービスを提供しています。高額な前払い料金を請求する代わりに、成功した取引ごとに少額の手数料を受け取り、私たちの成功をクライアントの成功に直接結びつけます。 20 人の技術者からなる当社のチームは常に革新を続けており、最新の製品は、住宅購入者と不動産業者に最高のエクスペリエンスを提供するように設計された新しい不動産ポータルである Pilar Homes です。

この投稿では、人工知能を使用してレポートを生成した経験、特に Anthropic の Claude 3.5 Sonnet を使用した経験を共有し、2 つの異なる方法を比較します。

タスクの処理に関する私たちの哲学については今後の記事で詳しく説明します (お楽しみに!) が、要するに、これらのタスクは最終的に Jira チケットとして「Tech Help Desk」ボードに置かれます。レポートの生成もそのようなタスクの 1 つで、ほとんどのタスクはエンジニアが解決するのに約 30 分かかりますが、複雑なレポートに数時間以上かかることはほとんどありません。しかし、状況は変わりつつあります。私たちが 1 社または 2 社のパートナーと始めたブティックブランドは、より大きな代理店に拡大しており、業界の確立されたプレーヤーとさらに多くの契約を結んでいます。エンジニアの勤務時間を増やすことで、増大するレポート作成ニーズに対応できる一方で、AI エージェントを探索し、現実世界の環境でアーキテクチャパターンを学習する機会があると考えました。

方法 1: AI に完全に処理させ、max_tokens 制限に到達させます

最初のアプローチでは、このツールを Claude の 3.5 Sonnet モデルに公開し、データベースクエリの実行、取得したドキュメントの CSV への変換、およびその結果の .csv ファイルへの書き込みを可能にしました。

これが私たちの構造です。上記のブログ投稿から大きく影響を受けています。

<code># 每个collection对象描述一个MongoDB集合及其字段
# 这有助于Claude理解我们的数据模式
COLLECTIONS = [
    {
        'name': 'companies',
        'description': 'Companies are the real estate brokerages. If the user provides a code to filter the data, it will be a company code. The _id may be retrieved by querying the company with the given code. Company codes are not used to join data.',
        'fields': {
            '_id': 'The ObjectId is the MongoDB id that uniquely identifies a company document. Its JSON representation is \"{"$oid": "the id"}\"',
            'code': 'The company code is a short and human friendly string that uniquely identifies the company. Never use it for joining data.',
            'name': 'A string representing the company name',
        }
    },
    # 此处之后描述了更多集合，但思路相同...
]

# 这是client.messages.create的“system”参数
ROLE_PROMPT = "You are an engineer responsible for generating reports in CSV based on a user's description of the report content"

# 这是“user”消息
task_prompt = f"{report_description}.\nAvailable collections: {COLLECTIONS}\nCompany codes: {company_codes}\n.Always demand a company code from the user to filter the data -- the user may use the terms imobiliária, marca, brand or company to reference a company. If the user wants a field that does not exist in a collection, don't add it to the report and don't ask the user for the field."
</code>

report_description は argparse 経由で読み取られる単なるコマンドライン引数であり、company_codes はデータベースから取得されてモデルに公開されるため、どの企業が存在し、どのような企業コードがユーザー入力に含まれているかがわかります。例: (ミズーリ州 - モザイクホームズ、ネバダ州 - ノバリアルエステートなど)。

モデルで使用できるツールには、find および docs2csv が含まれます。

<code>def find(collection: str, query: str, fields: list[str]) -> Cursor:
    """Find documents in a collection filtering by "query" and retrieving fields via projection"""
    return db.get_collection(collection).find(query, projection={field: 1 for field in fields})

def docs2csv(documents: list[dict]) -> list[str]:
    """
    Convert a dictionary to a CSV string.
    """
    print(f"Converting {len(documents)} documents to CSV")
    with open('report.csv', mode='w', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=documents[0].keys())
        writer.writeheader()
        writer.writerows(documents)
    return "report.csv"</code>

クロードは、find 関数を呼び出してデータベースに対して適切に構造化されたクエリと予測を実行し、docs2csv ツールを使用して小さな CSV レポート (500 行未満) を生成することができました。ただし、レポートが大きくなると max_tokens エラーが発生します。

トークンの使用パターンを分析した結果、トークン消費の大部分はモデルを介した個々のレコードの処理から来ていることがわかりました。このため、私たちは別のアプローチを検討するようになりました。それは、データを直接処理する代わりに、クロードに処理コードを生成させるというものでした。

方法 2: ソリューションとしての Python コード生成

max_tokens 制限を解決することは技術的には難しくありませんが、問題を解決するアプローチを再考する必要があります。

解決策? AI を介して各ドキュメントを処理する代わりに、Claude に CPU で実行される Python コードを生成させます。

キャラクターとクエストのプロンプトを変更し、ツールを削除する必要がありました。

以下はレポート生成コードの要点です。

レポートを生成するコマンドは次のとおりです:

<code># 每个collection对象描述一个MongoDB集合及其字段
# 这有助于Claude理解我们的数据模式
COLLECTIONS = [
    {
        'name': 'companies',
        'description': 'Companies are the real estate brokerages. If the user provides a code to filter the data, it will be a company code. The _id may be retrieved by querying the company with the given code. Company codes are not used to join data.',
        'fields': {
            '_id': 'The ObjectId is the MongoDB id that uniquely identifies a company document. Its JSON representation is \"{"$oid": "the id"}\"',
            'code': 'The company code is a short and human friendly string that uniquely identifies the company. Never use it for joining data.',
            'name': 'A string representing the company name',
        }
    },
    # 此处之后描述了更多集合，但思路相同...
]

# 这是client.messages.create的“system”参数
ROLE_PROMPT = "You are an engineer responsible for generating reports in CSV based on a user's description of the report content"

# 这是“user”消息
task_prompt = f"{report_description}.\nAvailable collections: {COLLECTIONS}\nCompany codes: {company_codes}\n.Always demand a company code from the user to filter the data -- the user may use the terms imobiliária, marca, brand or company to reference a company. If the user wants a field that does not exist in a collection, don't add it to the report and don't ask the user for the field."
</code>

Claude が生成した Python コンテンツ (正常に動作):

<code>def find(collection: str, query: str, fields: list[str]) -> Cursor:
    """Find documents in a collection filtering by "query" and retrieving fields via projection"""
    return db.get_collection(collection).find(query, projection={field: 1 for field in fields})

def docs2csv(documents: list[dict]) -> list[str]:
    """
    Convert a dictionary to a CSV string.
    """
    print(f"Converting {len(documents)} documents to CSV")
    with open('report.csv', mode='w', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=documents[0].keys())
        writer.writeheader()
        writer.writerows(documents)
    return "report.csv"</code>

結論

Claude 3.5 Sonnet との取り組みから、AI は業務効率を大幅に向上させることができますが、成功の鍵は適切なアーキテクチャを選択することにあることがわかりました。コード生成アプローチは、自動化の利点を維持しながら、直接 AI 処理よりも強力であることが証明されました。

このコード生成方法では、レポートを正しく作成できるだけでなく、エンジニアが AI の動作をレビューすることもできます。これは非常に優れています。

プロセスを完全に自動化し、人間の関与を排除し、大量のレポートを処理するには、複数のエージェントインスタンス間で作業を分散し、それぞれが処理するトークンの数を減らすことが、システムの自然な進化です。このような分散 AI システムにおけるアーキテクチャ上の課題については、AI 製品の構築に関する Phil Calçado の最新記事を強くお勧めします。

この実装から学んだ主な教訓:

直接 AI 処理は小規模なデータセットに対して機能します
コード生成により、拡張性と保守性が向上します
人間によるレビューにより信頼性が向上します

参考文献

人類のドキュメント
Thomas Taylor の Anthropic Claude と Python SDK を使用したツール
AI 製品の構築 - パート 1: バックエンドアーキテクチャ by Phil Calçado

以上がAnthropic&#s Claude Sonnet を使用したレポートの生成の詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。

声明

この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。

リストと配列の選択は、大規模なデータセットを扱うPythonアプリケーションの全体的なパフォーマンスにどのように影響しますか？May 03, 2025 am 12:11 AM

forhandlinglaredataSetsinpython、usenumpyArrays forbetterperformance.1）numpyarraysarememory-effictientandfasterfornumericaloperations.2）nusinnnnedarytypeconversions.3）レバレッジベクトル化は、測定済みのマネージメーシェイメージーウェイズデイタイです

Pythonのリストと配列にメモリがどのように割り当てられるかを説明します。May 03, 2025 am 12:10 AM

inpython、listsusedynamicmemoryallocation with allocation、whilenumpyArraysalocatefixedmemory.1）listsallocatemorememorythanneededededinitivative.2）numpyArrayasallocateexactmemoryforements、rededicablebutlessflexibilityを提供します。

Pythonアレイ内の要素のデータ型をどのように指定しますか？May 03, 2025 am 12:06 AM

inpython、youcanspecthedatatypeyfelemeremodelernspant.1）usenpynernrump.1）usenpynerp.dloatp.ploatm64、フォーマーpreciscontrolatatypes。

Numpyとは何ですか、そしてなぜPythonの数値コンピューティングにとって重要なのですか？May 03, 2025 am 12:03 AM

numpyisessentialfornumericalcomputinginpythonduetoitsspeed、memory efficiency、andcomprehensivematicalfunctions.1）それは、performsoperations.2）numpyArraysaremoremory-efficientthanpythonlists.3）Itofderangeofmathematicaloperty

「隣接するメモリ割り当て」の概念と、配列にとってその重要性について説明します。May 03, 2025 am 12:01 AM

contiguousMemoryAllocationisucial forArraysは、ForeffienceAndfastelementAccess.1）iteenablesConstantTimeAccess、O（1）、DuetodirectAddresscalculation.2）itemprovesefficiencyByAllowingMultiblementFechesperCacheLine.3）itimplifieMememm

Pythonリストをどのようにスライスしますか？May 02, 2025 am 12:14 AM

slicingapythonlistisdoneusingtheyntaxlist [start：stop：step] .hore'showitworks：1）startisthe indexofthefirstelementtoinclude.2）spotisthe indexofthefirmenttoeexclude.3）staptistheincrementbetbetinelements

Numpyアレイで実行できる一般的な操作は何ですか？May 02, 2025 am 12:09 AM

numpyallows forvariousoperationsonarrays：1）basicarithmeticlikeaddition、減算、乗算、および分割; 2）AdvancedperationssuchasmatrixMultiplication;

Pythonを使用したデータ分析では、配列はどのように使用されていますか？May 02, 2025 am 12:09 AM

Arraysinpython、特にnumpyandpandas、aresentialfordataanalysis、offeringspeedandeficiency.1）numpyarraysenable numpyarraysenable handling forlaredatasents andcomplexoperationslikemoverages.2）Pandasextendsnumpy'scapabivitieswithdataframesfortruc

See all articles

ホットAIツール

Undresser.AI Undress

リアルなヌード写真を作成する AI 搭載アプリ

AI Clothes Remover

写真から衣服を削除するオンライン AI ツール。

Undress AI Tool

脱衣画像を無料で

Clothoff.io

AI衣類リムーバー

Video Face Swap

完全無料の AI 顔交換ツールを使用して、あらゆるビデオの顔を簡単に交換できます。

ホットツール

メモ帳++7.3.1

使いやすく無料のコードエディター

SublimeText3 Linux 新バージョン

SublimeText3 Linux 最新バージョン

VSCode Windows 64 ビットのダウンロード

Microsoft によって発売された無料で強力な IDE エディター

SAP NetWeaver Server Adapter for Eclipse

Eclipse を SAP NetWeaver アプリケーションサーバーと統合します。

mPDF

mPDF は、UTF-8 でエンコードされた HTML から PDF ファイルを生成できる PHP ライブラリです。オリジナルの作者である Ian Back は、Web サイトから「オンザフライ」で PDF ファイルを出力し、さまざまな言語を処理するために mPDF を作成しました。 HTML2FPDF などのオリジナルのスクリプトよりも遅く、Unicode フォントを使用すると生成されるファイルが大きくなりますが、CSS スタイルなどをサポートし、多くの機能強化が施されています。 RTL (アラビア語とヘブライ語) や CJK (中国語、日本語、韓国語) を含むほぼすべての言語をサポートします。ネストされたブロックレベル要素 (P、DIV など) をサポートします。