
Building a Codebase Explorer with Google's Gemini-2.0

Original by William Shakespeare, 2025-03-08 11:30:15

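Have you ever found it hard to understand a messy codebase? Or wondered how tools that analyze and explore code actually work? In this article, we'll tackle both questions by building a powerful codebase exploration tool from scratch. Using static code analysis and the Gemini model, we'll create an easy-to-use system that helps developers query, understand, and gain useful insights from their code. Ready to change the way you navigate code? Let's begin!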

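Learning Objectives

Develop complex software using object-oriented programming.
Parse and analyze a Python codebase using the AST (Abstract Syntax Tree).
Integrate Google's Gemini LLM API with a Python application for code analysis.
Build a Typer-based command-line query system for codebase exploration.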

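This article was published as a part of the Data Science Blogathon.

Table of Contents

Architecture Overview
Getting Started with the Hands-on Project
Setting Up the Project Environment
Testing the Application
Future Development
Conclusion
Frequently Asked Questions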
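Architecture Overview

The tool consists of four main components:

Code Parser: the foundation of our system. It analyzes Python files and extracts their structure using Python's Abstract Syntax Tree (ast) module. It identifies classes, methods, functions, and imports, and builds a comprehensive map of the codebase.

Gemini Client: a wrapper around Google's Gemini API that handles communication with the LLM. It manages API authentication and provides a clean interface for sending queries and receiving responses.

Query Processor: the main engine of the tool. It formats the codebase context and the query in a way Gemini can understand and process effectively. It also maintains a persistent index of the codebase structure and manages the interaction between the parser and the LLM.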
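CLI Interface: a user-friendly command-line interface built with Typer. It provides commands for indexing the codebase, querying the code structure, and analyzing stack traces.

Getting Started with the Hands-on Project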

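This section guides you through the initial steps of building and implementing the project, so that you get a smooth start and an effective learning experience.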
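Before going file by file, it may help to see how the four components hand data to each other. The following is a minimal, self-contained sketch in plain Python and is not the article's code. The names `index_source`, `build_prompt`, `LLMClient`, and `EchoClient` are hypothetical. `EchoClient` stands in for the Gemini client, so the sketch runs without an API key or network access.

```python
import ast
from typing import Dict, List, Protocol

# Hypothetical index shape for this sketch: {path: {"classes": [...], "functions": [...]}}
Index = Dict[str, Dict[str, List[str]]]


class LLMClient(Protocol):
    """Anything with query(prompt) -> str can play the Gemini client's role."""

    def query(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in for the Gemini client: reports how many lines the prompt has."""

    def query(self, prompt: str) -> str:
        return f"received a prompt with {len(prompt.splitlines())} lines"


def index_source(path: str, source: str) -> Index:
    """Code parser role: map a file path to the class and function names it defines."""
    tree = ast.parse(source)
    classes = [n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    functions = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    return {path: {"classes": classes, "functions": functions}}


def build_prompt(index: Index, question: str) -> str:
    """Query processor role: flatten the index into text and append the question."""
    parts = [
        f"File: {path}\nClasses: {d['classes']}\nFunctions: {d['functions']}"
        for path, d in index.items()
    ]
    return (
        "Given the following codebase structure:\n"
        + "\n\n".join(parts)
        + f"\n\nQuery: {question}"
    )


# CLI role, reduced to its two commands: index, then query.
index = index_source("demo.py", "class Stack:\n    def push(self, x):\n        pass\n")
prompt = build_prompt(index, "What does Stack do?")
client: LLMClient = EchoClient()
answer = client.query(prompt)
print(answer)
```

The typed `LLMClient` protocol is a design choice for the sketch. It shows that the query step only depends on a `query(prompt)` method, so the real Gemini client (or a local model) can be swapped in without touching the parser.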

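Project Folder Structure

The project folder structure will look like this:

codebase_explorer/
└── src/
    ├── __init__.py
    ├── indexer/
    │   ├── __init__.py
    │   └── code_parser.py
    ├── query_engine/
    │   ├── __init__.py
    │   ├── query_processor.py
    │   └── gemini_client.py
    ├── main.py
    └── .env

Setting Up the Project Environment

Set up the project environment in the following steps. First, create and activate a new conda environment:

#create a new conda env
conda create -n cb_explorer python=3.11
conda activate cb_explorer

Install all the necessary libraries:

pip install google-generativeai google-ai-generativelanguage
# llama-index-embeddings-gemini provides the llama_index.embeddings.gemini module used later
pip install python-dotenv typer llama-index llama-index-embeddings-gemini

Implementing the Code

We will start by understanding and implementing the codebase parsing system. It has two important functions:

parse_codebase()
extract_definitions()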

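Parsing the Codebase

extract_definitions() is a helper function for parse_codebase(). It takes the abstract syntax tree (AST) of a Python file and starts with a dictionary of empty lists for classes, functions, and imports. ast.walk() then iterates over every node in the tree. For each class and function definition it records the name and line number, and for each import statement it records the imported module names. Everything is appended to the definitions dictionary.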

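import ast
import os
from typing import Dict, Any

def extract_definitions(tree: ast.AST) -> Dict[str, list]:
    """Extract class, function, and import definitions from an AST."""
    definitions = {
        "classes": [],
        "functions": [],
        "imports": []
    }

    # ast.walk visits every node, so methods and nested functions are included
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            definitions["classes"].append({
                "name": node.name,
                "lineno": node.lineno
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # "async def" functions have their own node type
            definitions["functions"].append({
                "name": node.name,
                "lineno": node.lineno
            })
        elif isinstance(node, ast.Import):
            # "import a, b" -> "a", "b"
            for name in node.names:
                definitions["imports"].append(name.name)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from pkg import x" -> "pkg" (a relative "from . import x" has no module)
            definitions["imports"].append(node.module)
    return definitions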
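ast.walk() visits every node in the tree. This includes methods inside classes and functions nested inside other functions. `async def` functions and `from ... import ...` statements arrive as their own node types (ast.AsyncFunctionDef and ast.ImportFrom). A quick way to check what a definition extractor will see is to tally node types for a small sample module. The snippet below does that. The sample source is made up for illustration.

```python
import ast
from collections import Counter

# Made-up sample module used only for this illustration
SAMPLE = '''
import os
from pathlib import Path

class Greeter:
    def greet(self, name):
        def shout(text):
            return text.upper()
        return shout(f"hi {name}")

async def fetch():
    return 1
'''

tree = ast.parse(SAMPLE)

# Node types a definition extractor typically cares about
interesting = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef, ast.Import, ast.ImportFrom)

# Count how many of each interesting node type ast.walk yields
counts = Counter(type(node).__name__ for node in ast.walk(tree) if isinstance(node, interesting))

# Every function-like name ast.walk reaches, sorted for a stable display
names = sorted(
    node.name
    for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
)

print(counts)
print(names)
```

The method `greet` and the nested `shout` are both reported. A parser that only checks ast.FunctionDef and ast.Import would miss `fetch` and the `pathlib` import in this sample.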

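parse_codebase() scans a directory for Python files, reads their contents, and extracts their structure.

The function takes the directory path as a string and returns a dictionary describing the code structure, with one entry of extracted data per Python file.

It loops through all subdirectories and files under the given directory.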

import ast
import os
from typing import Dict, Any

def parse_codebase(directory: str) -> Dict[str, Any]:
    """Parse Python files in the directory and extract code structure."""
    code_structure = {}
    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith(".py"):
                file_path = os.path.join(root, file)
                with open(file_path, "r", encoding="utf-8") as f:
                    try:
                        content = f.read()
                        tree = ast.parse(content)
                        code_structure[file_path] = {
                            "definitions": extract_definitions(tree),
                            "content": content
                        }
                    except Exception as e:
                        print(f"Error parsing {file_path}: {e}")
    return code_structure

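os.walk() provides a recursive way to traverse the entire directory tree. The function processes only files that end with the .py extension.

It uses Python's ast module to parse each file's contents into an abstract syntax tree (AST) representing the file's structure, then passes the tree to extract_definitions(tree). If parsing fails, it prints an error message but continues processing the remaining files.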
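To see the skip-on-error behaviour in isolation, the sketch below builds a throwaway directory. The directory holds one valid file in a subfolder, one file with a syntax error, and one non-Python file. The sketch then walks the directory the same way parse_codebase() does. The helper name `summarize_directory` is hypothetical. To stay short, it records only top-level function names instead of the full definitions dictionary.

```python
import ast
import os
import tempfile
from typing import Dict, List


def summarize_directory(directory: str) -> Dict[str, List[str]]:
    """Walk a directory like parse_codebase(): parse each .py file, skip ones that fail."""
    result: Dict[str, List[str]] = {}
    for root, _, files in os.walk(directory):
        for file in files:
            if not file.endswith(".py"):
                continue  # only Python sources are indexed
            path = os.path.join(root, file)
            try:
                with open(path, "r", encoding="utf-8") as f:
                    tree = ast.parse(f.read())
            except (SyntaxError, UnicodeDecodeError) as e:
                print(f"Error parsing {path}: {e}")  # report and move on
                continue
            # Keys are relative to the scanned directory; values list top-level functions
            result[os.path.relpath(path, directory)] = [
                node.name for node in tree.body if isinstance(node, ast.FunctionDef)
            ]
    return result


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg"))
    with open(os.path.join(tmp, "pkg", "good.py"), "w", encoding="utf-8") as f:
        f.write("def add(a, b):\n    return a + b\n")
    with open(os.path.join(tmp, "broken.py"), "w", encoding="utf-8") as f:
        f.write("def oops(:\n")  # deliberate syntax error
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("not python")
    summary = summarize_directory(tmp)

print(summary)
```

Only `pkg/good.py` ends up in the result. `broken.py` produces an error message, and `notes.txt` is ignored because of its extension.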

Query Processing Engine

Create two files named gemini_client.py and query_processor.py in the query_engine directory.

Gemini Client

This file uses google-generativeai to authenticate with Google's Gemini model API. In the root of the project, create a .env file and put your Gemini API key in it as GOOGLE_API_KEY (you can get a key from Google AI Studio).

import os
from typing import Optional
from google import generativeai as genai
from dotenv import load_dotenv

load_dotenv()


class GeminiClient:
    def __init__(self):
        self.api_key = os.getenv("GOOGLE_API_KEY")
        if not self.api_key:
            raise ValueError("GOOGLE_API_KEY environment variable is not set")

        genai.configure(api_key=self.api_key)
        # The title refers to Gemini 2.0; pass "gemini-2.0-flash" here to use it instead
        self.model = genai.GenerativeModel("gemini-1.5-flash")

    def query(self, prompt: str) -> Optional[str]:
        """Query Gemini with the given prompt."""
        try:
            response = self.model.generate_content(prompt)
            return response.text
        except Exception as e:
            print(f"Error querying Gemini: {e}")
            return None

Here, we define a GeminiClient class to interact with Google's Gemini AI model. It authenticates with the model using the GOOGLE_API_KEY from the .env file. After configuring the model API, it provides a query method that generates a response to a given prompt. If the request fails, the method prints the error and returns None.

Query Processing System

In this section, we implement the QueryProcessor class to manage the codebase context and enable querying with Gemini.

import os
import json
from typing import Dict, Any, Optional

from dotenv import load_dotenv
from llama_index.embeddings.gemini import GeminiEmbedding

from .gemini_client import GeminiClient

load_dotenv()

gemini_api_key = os.getenv("GOOGLE_API_KEY")
model_name = "models/embedding-001"
# The embedding model is initialized here but not used by the query flow below
embed_model = GeminiEmbedding(model_name=model_name, api_key=gemini_api_key)


class QueryProcessor:
    def __init__(self):
        self.gemini_client = GeminiClient()
        self.codebase_context: Optional[Dict[str, Any]] = None
        self.index_file = "./indexes/codebase_index.json"

    def load_context(self):
        """Load the codebase context from disk if it exists."""
        if os.path.exists(self.index_file):
            try:
                with open(self.index_file, "r", encoding="utf-8") as f:
                    self.codebase_context = json.load(f)
            except Exception as e:
                print(f"Error loading index: {e}")
                self.codebase_context = None

    def save_context(self):
        """Save the codebase context to disk."""
        if self.codebase_context:
            try:
                # Create the ./indexes folder if it does not exist yet
                os.makedirs(os.path.dirname(self.index_file), exist_ok=True)
                with open(self.index_file, "w", encoding="utf-8") as f:
                    json.dump(self.codebase_context, f, indent=2)
            except Exception as e:
                print(f"Error saving index: {e}")

    def set_context(self, context: Dict[str, Any]):
        """Set the codebase context for queries."""
        self.codebase_context = context
        self.save_context()

    def format_context(self) -> str:
        """Format the codebase context for Gemini."""
        if not self.codebase_context:
            return ""

        context_parts = []
        for file_path, details in self.codebase_context.items():
            defs = details["definitions"]
            context_parts.append(
                f"File: {file_path}\n"
                f"Classes: {[c['name'] for c in defs['classes']]}\n"
                f"Functions: {[f['name'] for f in defs['functions']]}\n"
                f"Imports: {defs['imports']}\n"
            )
        return "\n\n".join(context_parts)

    def query(self, query: str) -> Optional[str]:
        """Process a query about the codebase."""
        if not self.codebase_context:
            return (
                "Error: No codebase context available. Please index the codebase first."
            )

        prompt = f"""
        Given the following codebase structure:
        {self.format_context()}
        
        Query: {query}
        
        Please provide a detailed and accurate answer based on the codebase structure above.
        """
        return self.gemini_client.query(prompt)

After loading the necessary libraries, load_dotenv() loads the Gemini API key from the .env file.

The GeminiEmbedding class initializes the embedding-001 model from Google's servers.

The QueryProcessor class is designed to handle the codebase context and interact with the GeminiClient. The load_context method loads the codebase information from a JSON file, if one exists.

The save_context method saves the current codebase context to a JSON file for persistence.

The set_context method updates the codebase context and immediately saves it using save_context. The format_context method converts the codebase data into a human-readable string for LLM queries.

Querying Gemini is the most important step. The query method builds a prompt from the codebase context and the user's query, sends it to the Gemini model through the GeminiClient, and returns the response.

Command Line App Implementation (CLI)

The command-line interface lives in main.py.

Step 1: Import the libraries

import os
import json
import typer
from pathlib import Path
from typing import Optional
from indexer.code_parser import parse_codebase
from query_engine.query_processor import QueryProcessor

Step 2: Initialize Typer and the query processor

Let's create a Typer app and a query processor object from the classes.

Step 3: Index the Python project directory

Here, the index method is exposed as a command in the terminal. It indexes the Python codebase in the specified directory for future querying and analysis.

It first checks whether the directory exists, then uses the parse_codebase function to extract the structure of the Python files in it.

After parsing, it saves the parsed codebase structure. The whole process runs inside a try/except block, so exceptions raised during parsing are handled gracefully. The codebase is then ready for efficient querying with the Gemini model.

Step 4: Query the codebase

After indexing, we can query the codebase to understand it or to get information about any function it contains.

First, the command checks whether the codebase context has been loaded and, if not, tries to load it from disk. It then uses the query_processor's query method to process the query.

Finally, it prints the response to the terminal using typer.echo().

Step 5: Run the application

Testing the Application

To test your hard work, follow these steps:

Create a folder named indexes in your project root. This is where all the index files will go.

Create a codebase_index.json file and place it in the indexes folder you just created.

Then create a project_test folder in the root to hold the Python files used for testing.

Create a find_palindrome.py file in the project_test folder and add the palindrome code to it.

Indexing the project

Output: you should see that 1 Python file was indexed successfully. codebase_index.json will now contain the parsed definitions and the source content of that file.

Querying the project

Output: Gemini's answer to your question is printed in the terminal.

If everything is set up correctly, you will see these outputs in your terminal. Try it with your own Python files and let me know in the comments section what output you get. Thank you for staying with me.

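Future Development

This is a prototype of the foundational system, and it can be extended with many interesting features, such as:

Integration with IDE plugins for seamless code exploration.
Real-time code analysis and improvement suggestions from the LLM.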

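Conclusion

The Codebase Explorer shows a practical application of AI in software development tools. By combining traditional static analysis with modern AI capabilities, we created a tool that makes exploring a codebase more intuitive and efficient. This approach shows how AI can augment developer workflows without replacing existing tools, bringing a new level of understanding and accessibility to complex codebases.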

All the code used in this article is available here.

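Key Takeaways

Structural code parsing is a foundational technique for code analysis.

The Codebase Explorer simplifies code navigation, allowing developers to quickly understand and manage complex code structures.

The Codebase Explorer improves debugging efficiency by providing tools to analyze dependencies and identify issues faster. Gemini can significantly enhance code understanding when combined with traditional static analysis.

CLI tools can provide a powerful interface for LLM-assisted code exploration.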

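Frequently Asked Questions

Q. Can this tool work offline?

A. Code parsing and index management work offline, but querying the codebase through the Gemini API requires an internet connection to communicate with Google's servers. We could integrate Ollama into the tool, which can run on-device LLMs or SLMs such as Llama 3 or Phi-3 to query the codebase.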
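As a rough sketch of that idea, the class below mirrors GeminiClient's query(prompt) interface. Instead of calling Gemini, it sends the prompt to a locally running Ollama server through Ollama's /api/generate REST endpoint, using only the standard library. It assumes Ollama is listening on its default port 11434 and that a model named llama3 has already been pulled. The class name `OllamaClient` and its `build_request` helper are hypothetical and are not part of the article's code.

```python
import json
import urllib.request
from typing import Optional


class OllamaClient:
    """Hypothetical drop-in for GeminiClient that queries a local Ollama model."""

    def __init__(self, model: str = "llama3", host: str = "http://localhost:11434"):
        self.model = model
        self.host = host.rstrip("/")

    def build_request(self, prompt: str) -> urllib.request.Request:
        # stream=False asks Ollama for one JSON object instead of a stream of chunks
        payload = {"model": self.model, "prompt": prompt, "stream": False}
        return urllib.request.Request(
            f"{self.host}/api/generate",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def query(self, prompt: str) -> Optional[str]:
        """Same contract as GeminiClient.query: return the text, or None on failure."""
        try:
            with urllib.request.urlopen(self.build_request(prompt), timeout=120) as resp:
                # The generated text is in the "response" field of the JSON body
                return json.loads(resp.read().decode("utf-8"))["response"]
        except Exception as e:  # connection refused, timeout, unexpected JSON, ...
            print(f"Error querying Ollama: {e}")
            return None
```

QueryProcessor only ever calls `self.gemini_client.query(prompt)`. To try this offline, you could assign an `OllamaClient()` instance to that attribute. The code parser and the JSON index would need no changes.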

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.
