
Building a Codebase Explorer with Google's Gemini-2.0

William Shakespeare | Original | 2025-03-08 11:30:15 | 887 views

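Have you ever found it hard to understand a large, messy codebase? Or wondered how tools that analyze and explore code actually work? In this article, we will tackle these problems by building a powerful codebase exploration tool from scratch. Using static code analysis and the Gemini model, we will create an easy-to-use system that helps developers query, understand, and gain useful insights from their code. Ready to change the way you navigate code? Let's begin!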

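Learning Objectives

Complex software development using object-oriented programming.
How to parse and analyze a Python codebase using an AST (Abstract Syntax Tree).
How to integrate Google's Gemini LLM API with a Python application for code analysis.
A Typer-based command-line query system for codebase exploration.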

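This article was published as a part of the Data Science Blogathon.

Table of Contents

Architecture Overview
Starting the Hands-on Project
Setting Up the Project Environment
Testing the Application
Future Development
Conclusion
Frequently Asked Questions
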
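Architecture Overview

The tool consists of four main components:

Code Parser: The foundation of the system. It analyzes Python files and extracts their structure using Python's Abstract Syntax Tree (ast) module. It identifies classes, methods, functions, and imports, and builds a comprehensive map of the codebase.

Gemini Client: A wrapper around Google's Gemini API that handles communication with the LLM. It manages API authentication and provides a clean interface for sending queries and receiving responses.

Query Processor: The main engine of the tool. It formats the codebase context and the queries in a way that Gemini can understand and process effectively. It maintains a persistent index of the codebase structure and manages the interaction between the parser and the LLM.

CLI Interface: A user-friendly command-line interface built with Typer. It provides commands for indexing a codebase, querying the code structure, and analyzing stack traces.

Starting the Hands-on Project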

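This section guides you through the initial steps of building and implementing the project, so that you get a smooth start and an effective learning experience.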

Project Folder Structure

The project folder structure will look similar to this:

|--codebase_explorer/
|src/
├──| __init__.py
├──| indexer/
│   ├── __init__.py
│   └── code_parser.py
├──| query_engine/
│   ├── __init__.py
│   ├── query_processor.py
│   └── gemini_client.py
|
├── main.py
└── .env

Setting Up the Project Environment

Set up the project environment with the following steps:

#create a new conda env
conda create -n cb_explorer python=3.11
conda activate cb_explorer

Install all the necessary libraries:

pip install google-generativeai google-ai-generativelanguage
pip install python-dotenv typer llama-index

Implementing the Code

We will start by understanding and implementing the codebase parsing system. It has two important functions:

parse_codebase()
extract_definitions()

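Parsing the Codebase

extract_definitions() is a helper function for parse_codebase(). It takes the Abstract Syntax Tree (AST) of a Python file and initializes a dictionary with empty lists for classes, functions, and imports. ast.walk() then iterates over all nodes in the tree. The loop identifies class definitions, function definitions, and import statements, and records the name and line number of each class and function. All of these are appended to the definitions dictionary.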

import ast
import os
from typing import Dict, Any

def extract_definitions(tree: ast.AST) -> Dict[str, list]:
    """Extract class and function definitions from AST."""
    definitions = {
        "classes": [],
        "functions": [],
        "imports": []
    }
    
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            definitions["classes"].append({
                "name": node.name,
                "lineno": node.lineno
            })
        elif isinstance(node, ast.FunctionDef):
            definitions["functions"].append({
                "name": node.name,
                "lineno": node.lineno
            })
        elif isinstance(node, ast.Import):
            for name in node.names:
                definitions["imports"].append(name.name)
    return definitions
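
The helper above records only plain `import x` statements and synchronous functions. As a minimal sketch, and as an assumption that goes beyond the original code, the same ast.walk() loop can also pick up `from x import y` statements and `async def` functions. The function name extract_definitions_extended and the sample source are hypothetical.

```python
import ast
from typing import Dict, List


def extract_definitions_extended(tree: ast.AST) -> Dict[str, List]:
    """Variant of extract_definitions() that also handles
    `from ... import ...` and `async def` (an illustrative extension)."""
    definitions: Dict[str, List] = {"classes": [], "functions": [], "imports": []}

    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            definitions["classes"].append({"name": node.name, "lineno": node.lineno})
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            definitions["functions"].append({"name": node.name, "lineno": node.lineno})
        elif isinstance(node, ast.Import):
            for alias in node.names:
                definitions["imports"].append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as `from . import x`;
            # node.level counts the leading dots.
            module = node.module or ""
            prefix = "." * node.level + module
            for alias in node.names:
                separator = "." if module else ""
                definitions["imports"].append(f"{prefix}{separator}{alias.name}")
    return definitions


SAMPLE_SOURCE = '''
import os
from typing import Dict

class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

async def fetch():
    return 1
'''

result = extract_definitions_extended(ast.parse(SAMPLE_SOURCE))
print(result["classes"])
print(result["functions"])
print(result["imports"])
```

Note that ast.walk() makes no promise about visiting order, so the functions list is not guaranteed to follow source order; sort by lineno if order matters.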

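parse_codebase() scans a directory for Python files, reads their contents, and extracts their structure.

The function takes the directory path as a string and returns a dictionary of code structure, which stores the extracted data for each Python file.

It loops through all subdirectories and files in the given directory.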

import ast
import os
from typing import Dict, Any

def parse_codebase(directory: str) -> Dict[str, Any]:
    """Parse Python files in the directory and extract code structure."""
    code_structure = {}
    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith(".py"):
                file_path = os.path.join(root, file)
                with open(file_path, "r", encoding="utf-8") as f:
                    try:
                        content = f.read()
                        tree = ast.parse(content)
                        code_structure[file_path] = {
                            "definitions": extract_definitions(tree),
                            "content": content
                        }
                    except Exception as e:
                        print(f"Error parsing {file_path}: {e}")
    return code_structure

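os.walk() provides a recursive way to traverse the whole directory tree. Only files ending with the .py extension are processed.

Each file's content is parsed with Python's ast module into an Abstract Syntax Tree (AST) that represents the file's structure. The tree is then passed to extract_definitions(tree). If parsing fails, an error message is printed and processing continues with the other files.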
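
To see the skip-on-error behaviour in isolation, here is a small sketch that builds a temporary directory with one valid and one broken Python file. The helper name list_parsable_files and the file names are hypothetical; the sketch mirrors the os.walk() plus ast.parse() pattern rather than reproducing parse_codebase() itself.

```python
import ast
import os
import tempfile
from typing import List


def list_parsable_files(directory: str) -> List[str]:
    """Walk `directory` and return the .py files that ast.parse() accepts."""
    parsable = []
    for root, _, files in os.walk(directory):
        for file in files:
            if not file.endswith(".py"):
                continue  # ignore non-Python files
            file_path = os.path.join(root, file)
            try:
                with open(file_path, "r", encoding="utf-8") as f:
                    ast.parse(f.read())
                parsable.append(file_path)
            except (SyntaxError, UnicodeDecodeError, OSError) as e:
                # Report the problem and move on to the next file
                print(f"Error parsing {file_path}: {e}")
    return sorted(parsable)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg"))
    with open(os.path.join(tmp, "pkg", "good.py"), "w", encoding="utf-8") as f:
        f.write("def ok():\n    return 1\n")
    with open(os.path.join(tmp, "broken.py"), "w", encoding="utf-8") as f:
        f.write("def broken(:\n")  # deliberate syntax error
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("not python\n")

    found = [os.path.relpath(p, tmp) for p in list_parsable_files(tmp)]
    print(found)
```

The broken file produces one error line, and only the valid file in the nested pkg folder is returned.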

Query Processing Engine

Create two files named gemini_client.py and query_processor.py in the query_engine directory.

Gemini Client

This file uses GOOGLE_API_KEY to authenticate with Google's Gemini model API. In the root of the project, create a .env file and put your Gemini API key in it. You will need to obtain the key from Google first.

Here, we define a GeminiClient class to interact with Google's Gemini AI model. It authenticates the model using the GOOGLE_API_KEY value from the .env file. After configuring the model API, it provides a query method that generates a response to a given prompt. The complete class is listed further below, in the block that contains class GeminiClient.

Query Handling System

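In this section, we will implement the QueryProcessor class to manage the codebase context and enable querying with Gemini.

After loading the necessary libraries, load_dotenv() loads the environment variables from the .env file, which contains the GOOGLE_API_KEY for the Gemini API.

The GeminiEmbedding class initializes the embedding-001 model from Google's servers.
The QueryProcessor class is designed to handle the codebase context and interact with GeminiClient.
The load_context method loads the codebase context from a JSON file on disk, if it exists.
The save_context method saves the current codebase context to a JSON file for persistence.
The set_context method updates the codebase context and immediately saves it using save_context.
The format_context method converts the codebase data into a human-readable string for use in queries.
The query method is the most important one. It builds a prompt from the codebase context and the user's query, sends the prompt to the Gemini model through GeminiClient, and returns the response.

The complete file is listed further below, in the block that contains class QueryProcessor.

Command Line App Implementation (CLI)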

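Step 2: Initialize Typer and the Query Processor

Let's create a typer app object and a query_processor object from their classes.

Step 3: Indexing the Python Project Directory

Here, the index method is used as a command in the terminal. The function indexes the Python codebase in the specified directory for future querying and analysis.

It first checks whether the directory exists and then uses the parse_codebase function to extract the structure of the Python files in that directory.

After parsing, it saves the parsed codebase structure in query_processor. The whole process runs inside a try and except block so that exceptions raised during parsing are handled gracefully. This prepares the codebase for efficient querying with the Gemini model.

Step 4: Querying the Codebase

After indexing, we can query the codebase to understand it or to get information about any function in it.

The command first checks whether the codebase context is loaded and, if it is not, tries to load the context from the computer's hard disk. It then uses query_processor's query method to process the query.

Finally, it prints the response from the LLM to the terminal using the typer.echo() method.

Step 5: Run the Application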
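
Steps 2 to 5 can be sketched as follows. This is a reconstruction from the descriptions in this section, not the article's own listing: the command bodies, messages, and exit codes are assumptions, and parse_codebase and QueryProcessor are replaced by small stand-ins so the sketch runs without the project modules or a Gemini API key. In the real main.py, the stand-ins would be the imports from indexer.code_parser and query_engine.query_processor.

```python
import os
import tempfile
from typing import Any, Dict, Optional

import typer
from typer.testing import CliRunner


# Stand-ins (assumptions) so the sketch is self-contained.
def parse_codebase(directory: str) -> Dict[str, Any]:
    """Stand-in: map every .py file under `directory` to an empty record."""
    found: Dict[str, Any] = {}
    for root, _, files in os.walk(directory):
        for name in files:
            if name.endswith(".py"):
                found[os.path.join(root, name)] = {}
    return found


class QueryProcessor:
    """Stand-in exposing the same methods as the real QueryProcessor."""

    def __init__(self) -> None:
        self.codebase_context: Optional[Dict[str, Any]] = None

    def load_context(self) -> None:
        pass  # the real class reads ./indexes/codebase_index.json

    def set_context(self, context: Dict[str, Any]) -> None:
        self.codebase_context = context  # the real class also saves to disk

    def query(self, query: str) -> Optional[str]:
        return f"(stub answer for: {query})"  # the real class calls Gemini


# Step 2: initialize Typer and the query processor
app = typer.Typer()
query_processor = QueryProcessor()


# Step 3: index a Python project directory
@app.command()
def index(directory: str) -> None:
    """Index the Python codebase in DIRECTORY for later queries."""
    if not os.path.isdir(directory):
        typer.echo(f"Error: directory '{directory}' does not exist")
        raise typer.Exit(code=1)
    try:
        code_structure = parse_codebase(directory)
        query_processor.set_context(code_structure)
    except Exception as e:
        typer.echo(f"Error indexing codebase: {e}")
        raise typer.Exit(code=1)
    typer.echo(f"Successfully indexed {len(code_structure)} Python files")


# Step 4: query the indexed codebase
@app.command()
def query(query_text: str) -> None:
    """Ask a question about the indexed codebase."""
    if not query_processor.codebase_context:
        query_processor.load_context()
    response = query_processor.query(query_text)
    if not response:
        typer.echo("Error: failed to process query")
        raise typer.Exit(code=1)
    typer.echo(response)


# Step 5: in main.py the script would end by calling app() under
# `if __name__ == "__main__":`. Here the commands are exercised in-process
# with Typer's test runner instead.
runner = CliRunner()
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "sample.py"), "w", encoding="utf-8") as f:
        f.write("x = 1\n")
    index_result = runner.invoke(app, ["index", tmp])
query_result = runner.invoke(app, ["query", "What does sample.py define?"])
print(index_result.output, query_result.output)
```

On a real command line each invocation is a separate process, which is why the query command has to reload the saved index from disk before answering.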

Testing the Application

To test your hard work, follow the steps below:

Create a folder named indexes in your project root, where all the index files will be stored.
Create a codebase_index.json file and put it in the indexes folder created in the previous step.
Then create a project_test folder in the root, where we will store the Python files used for testing.
Create a find_palidrome.py file in the project_test folder to hold the Python code that will be indexed.
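
Any small Python file will do for this test. As a hypothetical stand-in (the names below are assumptions, not the article's own listing), a palindrome checker with one class and two functions gives the indexer something to find:

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def is_palindrome(text: str) -> bool:
    """Return True if the text reads the same forwards and backwards."""
    cleaned = normalize(text)
    return cleaned == cleaned[::-1]


class PalindromeFinder:
    """Collects the palindromic words found in a sentence."""

    def find(self, sentence: str) -> list:
        return [
            word
            for word in sentence.split()
            if len(word) > 1 and is_palindrome(word)
        ]


if __name__ == "__main__":
    print(is_palindrome("A man, a plan, a canal: Panama"))
    print(PalindromeFinder().find("level up your radar at noon"))
```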

Indexing the Project

Run the index command from the CLI against the project_test folder.

Output: the terminal should report that 1 Python file was indexed successfully, and the parsed structure is written as JSON to the index file.
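
Based on what parse_codebase() returns (each file path mapped to its definitions and its content), the saved index has roughly the shape printed by the sketch below. It rebuilds one entry in memory and serializes it with json.dumps(). The file path and the sample source are hypothetical, and the real file depends on what is in project_test.

```python
import ast
import json

SAMPLE_PATH = "project_test/find_palidrome.py"  # hypothetical path
SAMPLE_SOURCE = (
    "import re\n"
    "\n"
    "def is_palindrome(text):\n"
    "    return text == text[::-1]\n"
)

tree = ast.parse(SAMPLE_SOURCE)
nodes = list(ast.walk(tree))

# Same layout as the values stored by parse_codebase()
entry = {
    "definitions": {
        "classes": [
            {"name": n.name, "lineno": n.lineno}
            for n in nodes
            if isinstance(n, ast.ClassDef)
        ],
        "functions": [
            {"name": n.name, "lineno": n.lineno}
            for n in nodes
            if isinstance(n, ast.FunctionDef)
        ],
        "imports": [
            alias.name
            for n in nodes
            if isinstance(n, ast.Import)
            for alias in n.names
        ],
    },
    "content": SAMPLE_SOURCE,
}

# indent=2 matches the json.dump() call in QueryProcessor.save_context()
index_json = json.dumps({SAMPLE_PATH: entry}, indent=2)
print(index_json)
```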

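Querying the Project

Run the query command with a question about the indexed code. The answer from Gemini is printed in the terminal.

For reference, the complete query_engine/gemini_client.py described in the Gemini Client section: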

import os
from typing import Optional
from google import generativeai as genai
from dotenv import load_dotenv

load_dotenv()


class GeminiClient:
    def __init__(self):
        self.api_key = os.getenv("GOOGLE_API_KEY")
        if not self.api_key:
            raise ValueError("GOOGLE_API_KEY environment variable is not set")

        genai.configure(api_key=self.api_key)
        self.model = genai.GenerativeModel("gemini-1.5-flash")

    def query(self, prompt: str) -> Optional[str]:
        """Query Gemini with the given prompt."""
        try:
            response = self.model.generate_content(prompt)
            return response.text
        except Exception as e:
            print(f"Error querying Gemini: {e}")
            return None

The complete query_engine/query_processor.py described in the Query Handling System section:

import os
import json
from llama_index.embeddings.gemini import GeminiEmbedding


from dotenv import load_dotenv
from typing import Dict, Any, Optional
from .gemini_client import GeminiClient

load_dotenv()

gemini_api_key = os.getenv("GOOGLE_API_KEY")
model_name = "models/embedding-001"  # Gemini embedding model ID
embed_model = GeminiEmbedding(model_name=model_name, api_key=gemini_api_key)


class QueryProcessor:
    def __init__(self):
        self.gemini_client = GeminiClient()
        self.codebase_context: Optional[Dict[str, Any]] = None
        self.index_file = "./indexes/codebase_index.json"

    def load_context(self):
        """Load the codebase context from disk if it exists."""
        if os.path.exists(self.index_file):
            try:
                with open(self.index_file, "r", encoding="utf-8") as f:
                    self.codebase_context = json.load(f)
            except Exception as e:
                print(f"Error loading index: {e}")
                self.codebase_context = None

    def save_context(self):
        """Save the codebase context to disk."""
        if self.codebase_context:
            try:
                with open(self.index_file, "w", encoding="utf-8") as f:
                    json.dump(self.codebase_context, f, indent=2)
            except Exception as e:
                print(f"Error saving index: {e}")

    def set_context(self, context: Dict[str, Any]):
        """Set the codebase context for queries."""
        self.codebase_context = context
        self.save_context()

    def format_context(self) -> str:
        """Format the codebase context for Gemini."""
        if not self.codebase_context:
            return ""

        context_parts = []
        for file_path, details in self.codebase_context.items():
            defs = details["definitions"]
            context_parts.append(
                f"File: {file_path}\n"
                f"Classes: {[c['name'] for c in defs['classes']]}\n"
                f"Functions: {[f['name'] for f in defs['functions']]}\n"
                f"Imports: {defs['imports']}\n"
            )
        return "\n\n".join(context_parts)

    def query(self, query: str) -> Optional[str]:
        """Process a query about the codebase."""
        if not self.codebase_context:
            return (
                "Error: No codebase context available. Please index the codebase first."
            )

        prompt = f"""
        Given the following codebase structure:
        {self.format_context()}
        
        Query: {query}
        
        Please provide a detailed and accurate answer based on the codebase structure above.
        """
        return self.gemini_client.query(prompt)
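
To make the prompt construction concrete, the sketch below applies the same formatting as format_context() to a small hand-written context and prints the prompt that would be sent to Gemini. No API call is made. The sample context and the build_prompt helper name are assumptions for illustration, and the prompt is assembled without the leading indentation that the triple-quoted string in the class carries.

```python
from typing import Any, Dict


def format_context(codebase_context: Dict[str, Any]) -> str:
    """Same formatting logic as QueryProcessor.format_context()."""
    parts = []
    for file_path, details in codebase_context.items():
        defs = details["definitions"]
        parts.append(
            f"File: {file_path}\n"
            f"Classes: {[c['name'] for c in defs['classes']]}\n"
            f"Functions: {[f['name'] for f in defs['functions']]}\n"
            f"Imports: {defs['imports']}\n"
        )
    return "\n\n".join(parts)


def build_prompt(codebase_context: Dict[str, Any], query: str) -> str:
    """Combine the formatted context and the user's query into one prompt."""
    return (
        "Given the following codebase structure:\n"
        f"{format_context(codebase_context)}\n"
        f"Query: {query}\n\n"
        "Please provide a detailed and accurate answer "
        "based on the codebase structure above."
    )


# Hypothetical context with a single indexed file
sample_context = {
    "project_test/find_palidrome.py": {
        "definitions": {
            "classes": [],
            "functions": [{"name": "is_palindrome", "lineno": 3}],
            "imports": ["re"],
        },
        "content": "import re\n...",
    }
}

prompt = build_prompt(sample_context, "What does is_palindrome do?")
print(prompt)
```

Only names are sent in the context, not the file contents, so the model answers from the structure of the codebase rather than from the source text.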

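If everything is done correctly, you will get the answers in your terminal. You can try it with your own Python code files and tell me what your output is in the comment section. Thank you for staying with me.

The imports used at the top of main.py for the command-line application: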

import os
import json
import typer
from pathlib import Path
from typing import Optional
from indexer.code_parser import parse_codebase
from query_engine.query_processor import QueryProcessor

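Future Development

This is a prototype of the foundational system, and it can be extended with many interesting features, such as:

Integration with IDE plugins for seamless code exploration.
Real-time code analysis and improvement suggestions from the LLM.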

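Conclusion

The codebase explorer helps you understand a practical application of AI in software development tools. By combining traditional static analysis with modern AI capabilities, we have created a tool that makes codebase exploration more intuitive and efficient. This approach shows how AI can augment developer workflows without replacing existing tools, making complex codebases easier to understand and more accessible.

All the code used in this article is available here.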

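Key Takeaways

Structural code parsing is the most important technique for code analysis.

The codebase explorer simplifies code navigation, allowing developers to quickly understand and manage complex code structures.

The codebase explorer improves debugging efficiency by providing tools to analyze dependencies and identify issues faster.

Gemini can significantly enhance code understanding when combined with traditional static analysis.

CLI tools can provide a powerful interface for LLM-assisted code exploration.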

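Frequently Asked Questions

Q. Can the tool work offline?
A. Code parsing and index management work offline, but querying the codebase with the Gemini API requires an internet connection to communicate with the external servers. The tool could be integrated with Ollama, which would make it possible to query the codebase with an on-device LLM or SLM such as Llama3 or Phi-3.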

The media shown in this article is not owned by Analytics Vidhya and is used at the author's discretion.
