
Developing an AI-Powered Smart Guide for Business Planning and Entrepreneurship

Original post by Wang Lin, 2025-02-25 18:36:11


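After the launch of ChatGPT and the Large Language Models (LLMs) that followed, their inherent limitations soon became evident and were seen as major drawbacks: hallucination, a knowledge cutoff date, and the inability to provide organization- or person-specific information. To address these issues, Retrieval-Augmented Generation (RAG) methods quickly gained traction. They integrate external data into LLMs and guide their behavior to answer questions from a given knowledge base.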

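Interestingly, researchers at Facebook AI Research (now Meta AI) published the first paper on RAG in 2020, but its potential was not fully realized until the advent of ChatGPT. Since then, there has been no stopping it. More advanced and complex RAG frameworks have been introduced. These not only improve the accuracy of the technology but also enable it to handle multimodal data, which expands its potential for a wide range of applications. I covered this topic in detail in the following articles, specifically discussing contextual multimodal RAG, multimodal AI search for business applications, and information extraction and matchmaking platforms.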

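Integrating Multimodal Data into a Large Language Model

Multimodal AI Search for Business Applications

AI-Powered Information Extraction and Matchmaking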
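RAG technology continues to expand, and new data access requirements keep emerging. A retriever-only RAG answers questions from a static knowledge base, but its capabilities can be extended by integrating other diverse knowledge sources and tools, for example: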
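Multiple databases (e.g., knowledge bases comprising vector databases and knowledge graphs)

Real-time web search to access recent information

External APIs to collect specific data, such as stock market trends or data from company-specific tools (e.g., Slack channels or email accounts)

Comparing and consolidating information from multiple sources.

This gave rise to the idea of an "agentic RAG", which can select the best course of action based on the query.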

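In this article, we will develop a specific agentic RAG application called the Smart Business Guide (SBG).

The first version of this tool is part of our ongoing project UPBEAT, funded by Interreg Central Baltic. The project focuses on using AI to upskill highly skilled immigrants in Finland and Estonia in entrepreneurship and business planning. SBG is one of the tools intended for use in the project's upskilling process. It focuses on providing precise and quick information from authentic sources to people who intend to start a business or who are already running one.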

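The SBG's agentic RAG comprises:

Business and entrepreneurship guides as a knowledge base, containing information about business planning, entrepreneurship, company registration, taxation, business ideas, rules and regulations, business opportunities, licenses and permits, business guidelines, and more.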

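What is special about this agentic RAG?

Option to select different open-source models (Llama, Mistral, Gemma) as well as proprietary models (GPT-4o, GPT-4o-mini) throughout the entire agentic workflow. The open-source models do not run locally, so they do not require a powerful, expensive computer. Instead, they run on Groq Cloud's platform with a free API. And yes, this makes it a cost-free agentic RAG. GPT models can also be selected with an OpenAI API key.

Options to enforce knowledge base search, web search, or hybrid search.

Grading of retrieved documents to improve response quality, with web search invoked intelligently based on the grading.

Option to select the response type: concise, moderate, or explanatory.

Specifically, the article is structured around the following topics:

Parsing data using LlamaParse

Developing an agentic workflow using LangGraph

Developing an advanced agentic RAG (hereinafter called the Smart Business Guide, or SBG) using free, open-source models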

The entire code of this application can be found on GitHub.

The code is structured in two files: agentic_rag.py implements the entire agentic workflow, and app.py implements the Streamlit graphical user interface.

Let's dive in.

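Constructing the Knowledge Base with LlamaParse and LangChain

The SBG's knowledge base comprises authentic business and entrepreneurship guides published by Finnish agencies. These guides are voluminous, and finding a required piece of information in them is not trivial. The aim is therefore to develop an agentic RAG that provides precise information from these guides and also augments it with web search and other trusted sources in Finland for up-to-date information.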

LlamaParse is a GenAI-native document parsing platform built with LLMs and for LLM use cases. I have explained the use of LlamaParse in the articles cited above. This time, I parsed the documents directly in LlamaCloud. LlamaParse offers 1,000 free credits per day, and how quickly they are consumed depends on the parsing mode. For text-only PDFs, the "Fast" mode (1 credit / 3 pages) works well; it skips OCR, image extraction, and table/heading identification. More advanced modes are available, and they cost more credits per page. I selected the "Premium" mode, which performs OCR, image extraction, and table/heading identification and is ideal for complex documents containing images.

I defined the following parsing instructions.

You are given a document containing text, tables, and images. Extract all the contents in their correct format. Extract each table in a correct format and include a detailed explanation of each table before its extracted format. 
If an image contains text, extract all the text in the correct format and include a detailed explanation of each image before its extracted text. 
Produce the output in markdown text. Extract each page separately in the form of an individual node. Assign the document name and page number to each extracted node in the format: [Creativity and Business, page 7]. 
Include the document name and page number at the start and end of each extracted page.

The parsed documents were downloaded from LlamaCloud in markdown format. The same parsing can also be done through the LlamaCloud API, as follows.

import os
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# LlamaCloud API key, read from an environment variable
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")

# Parsing instructions passed to LlamaParse
parsing_instructions = """You are given a document containing text, tables, and images. Extract all the contents in their correct format. Extract each table in a correct format and include a detailed explanation of each table before its extracted format.
If an image contains text, extract all the text in the correct format and include a detailed explanation of each image before its extracted text.
Produce the output in markdown text. Extract each page separately in the form of an individual node. Assign the document name and page number to each extracted node in the format: [Creativity and Business, page 7].
Include the document name and page number at the start and end of each extracted page.
"""

def save_to_markdown(output_path, content):
    """
    Save extracted content to a markdown file.

    Parameters:
    output_path (str): The path where the markdown file will be saved.
    content (list): The extracted content to be saved.
    """
    with open(output_path, "w", encoding="utf-8") as md_file:
        for document in content:
            # Write the text of each Document object, separated by a blank line
            md_file.write(document.text + "\n\n")

def extract_document(input_path):
    # Initialize the LlamaParse parser (premium mode, markdown output)
    parser = LlamaParse(
        result_type="markdown",
        parsing_instructions=parsing_instructions,
        premium_mode=True,
        api_key=LLAMA_CLOUD_API_KEY,
        verbose=True
    )

    # Send every PDF in the folder through LlamaParse
    file_extractor = {".pdf": parser}
    documents = SimpleDirectoryReader(
        input_path, file_extractor=file_extractor
    ).load_data()
    return documents

input_path = r"C:\Users\h02317\Downloads\docs"  # Replace with your document path
output_file = r"C:\Users\h02317\Downloads\extracted_document.md"  # Output markdown file name

# Extract the documents and save them as markdown
extracted_content = extract_document(input_path)
save_to_markdown(output_file, extracted_content)

Here is an example page from the guide Creativity and Business by Pikkala, A. et al. (2015) ("free to copy for non-commercial private or public use with attribution").

Here is the parsed output of this page. LlamaParse efficiently extracted information from all the structures on the page. The notebook shown on the page is in image format.

[Creativity and Business, page 8]

# How to use this book

1. The book is divided into six chapters and sub-sections dealing with different topics. You can read the book through one chapter and topic at a time, or you can use the checklist of the table of contents to select sections on topics in which you need more information and support.

2. Each section opens with a creative entrepreneur's thought on the topic.

3. The introduction gives a brief description of the topic.

4. Each section contains exercises that help you reflect on your own skills and business idea and develop your business idea further.

## What is your business idea

"I would like to launch
a touring theatre company."

Do you have an idea about a product or service you would like
to sell? Or do you have a bunch of ideas you have been mull-
ing over for some time? This section will help you get a better
understanding about your business idea and what competen-
cies you already have that could help you implement it, and
what types of competencies you still need to gain.

### EXTRA
Business idea development
in a nutshell

I found a great definition of what business idea development
is from the My Coach online service (Youtube 27 May 2014).
It divides the idea development process into three stages:
the thinking - stage, the (subconscious) talking - stage, and the
customer feedback stage. It is important that you talk about
your business idea, as it is very easy to become stuck on a
particular path and ignore everything else. You can bounce
your idea around with all sorts of people: with a local business
advisor; an experienced entrepreneur; or a friend. As you talk
about your business idea with others, your subconscious will
start working on the idea, and the feedback from others will
help steer the idea in the right direction.

### Recommended reading
Taivas + helvetti
(Terho Puustinen & Mika Mäkeläinen:
One on One Publishing Oy 2013)

### Keywords
treasure map; business idea; business idea development

## EXERCISE: Identifying your personal competencies

Write down the various things you have done in your life and think what kind of competencies each of these things has
given you. The idea is not just to write down your education,
training and work experience like in a CV; you should also
include hobbies, encounters with different types of people, and any life experiences that may have contributed to you
being here now with your business idea. The starting circle can be you at any age, from birth to adulthood, depending
on what types of experiences you have had time to accumulate. The final circle can be you at this moment.

PERSONAL CAREER PATH

SUPPLEMENTARY
PERSONAL DEVELOPMENT
(e.g. training courses;
literature; seminars)

Fill in the
"My Competencies"
section of the
Creative Business
Model Canvas:

5. Each section also includes an EXTRA box with interesting tidbits about the topic at hand.

6. For each topic, tips on further reading are given in the grey box.

7. The second grey box contains recommended keywords for searching more information about the topic online.

8. By completing each section of the one-page business plan or "Creative Business Model Canvas" (page 74),
by the end of the book you will have a complete business plan.

9. By writing down your business start-up costs (e.g. marketing or logistics) in the price tag box of each section,
by the time you get to the Finance and Administration section you will already know your start-up costs
and you can enter them in the receipt provided in the Finance and Administration section (page 57).

This book is based on Finnish practices. The authors and the publisher are not responsible for the applicability of factual information to other
countries. Readers are advised to check country-specific information on business structures, support organisations, taxation, legislation, etc.
Factual information about Finnish practices should also be checked in case of differing interpretations by authorities.

[Creativity and Business, page 8]

The parsed markdown documents are then split into chunks using LangChain's RecursiveCharacterTextSplitter, with CHUNK_SIZE = 3000 and CHUNK_OVERLAP = 200:

import os
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = 3000
CHUNK_OVERLAP = 200

def staticChunker(folder_path):
    docs = []
    print(f"Creating chunks. CHUNK_SIZE: {CHUNK_SIZE}, CHUNK_OVERLAP: {CHUNK_OVERLAP}")
    # A single splitter can be reused for every file
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)

    # Loop through all .md files in the folder
    for file_name in os.listdir(folder_path):
        if file_name.endswith(".md"):
            file_path = os.path.join(folder_path, file_name)
            print(f"Processing file: {file_path}")
            # Load documents from the Markdown file
            loader = UnstructuredMarkdownLoader(file_path)
            documents = loader.load()
            # Add file-specific metadata (optional)
            for doc in documents:
                doc.metadata["source_file"] = file_name
            # Split loaded documents into chunks
            chunked_docs = text_splitter.split_documents(documents)
            docs.extend(chunked_docs)
    return docs
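The effect of the chunk size and overlap can be seen in a minimal plain-Python sketch of fixed-size character chunking. This is a simplified stand-in, not LangChain's RecursiveCharacterTextSplitter. The real splitter first tries to break on separators such as paragraphs and newlines before falling back to characters. The function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 3000, chunk_overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters; each chunk
    starts with the last chunk_overlap characters of the previous one.
    (Simplified illustration, not RecursiveCharacterTextSplitter.)"""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    # A 7,000-character text yields windows of 3000, 3000 and 1400 characters
    print([len(c) for c in sliding_window_chunks("x" * 7000)])
```

The overlap repeats the tail of one chunk at the head of the next. A sentence cut at a chunk boundary therefore still appears intact in at least one chunk when it is shorter than the overlap.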

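Subsequently, a vector store is created from these chunks using an embedding model, such as the open-source all-MiniLM-L6-v2 model or OpenAI's text-embedding-3-large. The following function loads an existing Chroma vector store or, if none exists yet, creates and persists a new one:

import os
import streamlit as st
from langchain_community.vectorstores import Chroma

# DATA_FOLDER and collection_name are assumed to be configuration values defined
# elsewhere in agentic_rag.py; st.session_state.embed_model holds the selected embedding model.
def load_or_create_vs(persist_directory):
    # Check if the vector store directory exists
    if os.path.exists(persist_directory):
        print("Loading existing vector store...")
        # Load the existing vector store
        vectorstore = Chroma(
            persist_directory=persist_directory,
            embedding_function=st.session_state.embed_model,
            collection_name=collection_name
        )
    else:
        print("Vector store not found. Creating a new one...\n")
        docs = staticChunker(DATA_FOLDER)
        print("Computing embeddings...")
        # Create and persist a new Chroma vector store
        vectorstore = Chroma.from_documents(
            documents=docs,
            embedding=st.session_state.embed_model,
            persist_directory=persist_directory,
            collection_name=collection_name
        )
        print('Vector store created and persisted successfully!')

    return vectorstore

Creating an Agentic Workflow

An AI agent combines a workflow with decision-making logic. It intelligently answers questions or performs other complex tasks that need to be broken down into simpler subtasks.

I used LangGraph to design a workflow for our AI agent, laying out the sequence of actions or decisions in the form of a graph. Our agent has to decide whether to answer a question from the vector database (knowledge base), with a web search, with a hybrid search, or by using a tool.

I explained the process of creating an agentic workflow using LangGraph in the following article.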

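How to Develop a Free AI Agent with Automatic Internet Search

We need to create graph nodes that represent the steps of the workflow where decisions are made or actions are taken (e.g., web search or vector database search). The nodes are connected by edges, which define the flow of decisions and actions (e.g., what comes next after retrieval). The graph state keeps track of information as it moves through the graph, so that the agent uses the correct data at each step. The entry point of the workflow is a router function. It analyzes the user's query to determine which node to execute first. The entire workflow contains the following nodes: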

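retrieve: retrieves relevant chunks from the vector store.

grade_documents: grades the relevance of the retrieved chunks with respect to the user query.

route_after_grading: based on the grading, determines whether to generate a response with the retrieved documents or to proceed to web search.

websearch: fetches information from web sources using the Tavily search engine's API.

generate: generates a response to the user query using the provided context (information retrieved from the vector store and/or web search).

get_contact_tool: fetches contact information from predefined trusted URLs related to Finnish immigration services.

get_tax_info: fetches tax-related information from predefined trusted URLs.

get_registration_info: fetches details on the company registration process in Finland.

get_licensing_info: fetches information about the licenses and permits required to start a business in Finland.

hybrid_search: combines document retrieval and internet search results to provide a broader context for answering the query.

unrelated: handles questions unrelated to the workflow's focus.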

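These are the edges in the workflow:

retrieve → grade_documents: the retrieved documents are sent for grading.

grade_documents → websearch: taken if the retrieved documents are found to be irrelevant.

grade_documents → generate: taken if the retrieved documents are relevant; proceeds to response generation.

websearch → generate: passes the web search results for response generation.

get_contact_tool, get_tax_info, get_registration_info, get_licensing_info → generate: pass the specific information from trusted sources for response generation.

hybrid_search → generate: passes the combined results (vector store + web search) for response generation.

unrelated → generate: provides a fallback response for unrelated questions.

The graph state structure acts as a container that maintains the workflow's state. It includes the following elements: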

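question: the user query.

generation: the final generated response to the user query, populated after processing.

web_search_needed: a flag indicating whether a web search is needed, based on the relevance of the retrieved documents.

documents: a list of retrieved or processed documents relevant to the query.

answer_style: the desired style of the answer, such as "Concise", "Moderate", or "Explanatory".

The graph structure is defined from these elements. A router function then analyzes the user query to select the node to execute first. Questions unrelated to the workflow are routed to the handle_unrelated node, which provides a fallback response through the generate node.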
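For illustration, the graph state described above could be declared as a TypedDict, which is a common way to write LangGraph state schemas. The router below is a hypothetical keyword-based stand-in. In the SBG, a router LLM makes the routing decision, so the keyword lists and the name `route_question_by_keywords` are assumptions that keep the sketch self-contained.

```python
from typing import Dict, List, Tuple, TypedDict


class GraphState(TypedDict):
    """State passed between the workflow's nodes (fields as described above)."""
    question: str
    generation: str
    web_search_needed: bool
    documents: List[str]
    answer_style: str  # e.g. "Concise", "Moderate" or "Explanatory"


# Hypothetical keyword rules; the actual SBG uses a router LLM instead.
TOOL_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "get_tax_info": ("tax",),
    "get_contact_tool": ("contact", "phone", "email"),
    "get_registration_info": ("register", "registration"),
    "get_licensing_info": ("license", "licence", "permit"),
}
BUSINESS_KEYWORDS = ("business", "company", "entrepreneur", "startup", "market")


def route_question_by_keywords(state: GraphState, internet_search_enabled: bool = False) -> str:
    """Return the name of the first node to execute for the question in state."""
    question = state["question"].lower()
    # Questions that match a trusted-source tool go straight to that tool
    for node, keywords in TOOL_KEYWORDS.items():
        if any(word in question for word in keywords):
            return node
    # Off-topic questions get the fallback path
    if not any(word in question for word in BUSINESS_KEYWORDS):
        return "unrelated"
    # A user setting can force web search; otherwise search the knowledge base first
    return "websearch" if internet_search_enabled else "retrieve"
```

Keyword matching is brittle; for example, "tax" also matches "syntax". That brittleness is one reason to delegate this decision to an LLM, as the SBG does.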

The entire workflow is depicted in the following figure.

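Retrieve and Grade

The retrieve node invokes the retriever with the question to fetch relevant chunks of information from the vector store. These chunks ("documents") are sent to the grade_documents node, which grades their relevance. Based on the graded chunks ("filtered_docs"), the route_after_grading node decides whether to proceed to generation with the retrieved information or to invoke web search. The helper function initialize_grader_chain initializes the grader chain with a prompt that guides the grader LLM in assessing the relevance of each chunk. The grade_documents node analyzes each chunk and outputs "Yes" or "No", depending on whether the chunk is relevant to the question.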
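The control flow of grading and routing can be sketched without an LLM by passing the grader in as a plain function. In this sketch, `keyword_grader` is a hypothetical placeholder for the grader LLM chain: it answers "yes" if the chunk shares a longer word with the question. The function names mirror the nodes described above, but the bodies are simplified illustrations, not the article's implementation.

```python
from typing import Callable, Dict, List


def keyword_grader(question: str, chunk: str) -> str:
    """Hypothetical stand-in for the grader LLM: 'yes' if any question word
    longer than three characters also appears in the chunk, else 'no'."""
    words = {w.strip("?.,!").lower() for w in question.split() if len(w) > 3}
    chunk_lower = chunk.lower()
    return "yes" if any(w and w in chunk_lower for w in words) else "no"


def grade_documents(state: Dict, grader: Callable[[str, str], str] = keyword_grader) -> Dict:
    """Keep only the chunks graded as relevant and flag whether web search is needed."""
    filtered_docs: List[str] = [
        doc for doc in state["documents"]
        if grader(state["question"], doc).strip().lower() == "yes"
    ]
    return {**state, "documents": filtered_docs, "web_search_needed": not filtered_docs}


def route_after_grading(state: Dict) -> str:
    """Go to web search if no relevant chunk survived grading, otherwise generate."""
    return "websearch" if state["web_search_needed"] else "generate"
```

Because the grader is injected, the same two functions could be driven by an LLM-backed grader that returns "yes"/"no" strings.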

Web and Hybrid Search

The websearch node can be reached in three ways. The route_after_grading node calls it when no relevant chunks are found in the retrieved information. The route_question function calls it directly when the internet_search_enabled state flag is True (selected with a radio button in the user interface). It is also called when the router function decides to send the query to websearch to fetch recent and more relevant information.

A free Tavily API key can be obtained by creating an account on their website. The search results populate the state variable "documents", which is then passed to the generate node together with the state variable "question".

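Hybrid search combines the results of the retriever and Tavily search and populates the "documents" state variable. This variable is passed to the generate node together with the "question" state variable.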
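Merging the two result lists into the documents state variable can be sketched as follows. The sketch assumes that both searches have already returned plain-text results. The node described above calls the retriever and Tavily itself, whereas here the results are passed in. `merge_results` is a hypothetical helper that keeps knowledge-base chunks first, appends the web results, and drops exact duplicates.

```python
from typing import Dict, List


def merge_results(kb_docs: List[str], web_docs: List[str]) -> List[str]:
    """Combine knowledge-base and web results, preserving order and removing
    empty entries and exact duplicates (compared after stripping whitespace)."""
    seen = set()
    merged = []
    for doc in kb_docs + web_docs:
        key = doc.strip()
        if key and key not in seen:
            seen.add(key)
            merged.append(doc)
    return merged


def hybrid_search(state: Dict, kb_docs: List[str], web_docs: List[str]) -> Dict:
    """Populate the 'documents' state variable with the combined context."""
    return {**state, "documents": merge_results(kb_docs, web_docs)}
```

Putting knowledge-base chunks first is a design choice in this sketch: the curated guides come before broader web content in the context.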

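Invoking Tools

The tools used in this agentic workflow are scraping functions that fetch information from predefined trusted URLs. Tavily performs a broader internet search and brings in results from diverse sources. These tools, in contrast, use Python's Beautiful Soup web scraping library to extract information from trusted sources (predefined URLs). This ensures that information for certain queries comes from known, trusted sources. In addition, this information retrieval is completely free. The get_tax_info node works together with some helper functions, and the other tools (nodes) of this type work in the same way.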
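The core idea behind these tools is to pull keyword-matching text out of a known page. The sketch below illustrates it without dependencies: it swaps in Python's standard-library html.parser for Beautiful Soup, and it parses an HTML string instead of fetching a predefined URL. The `TextExtractor` class and `extract_relevant_paragraphs` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect the text of <p> and <li> elements, ignoring everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._in_text_tag = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "li"):
            self._in_text_tag = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("p", "li") and self._in_text_tag:
            # Normalize whitespace inside the paragraph
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_text_tag = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is kept as part of the paragraph
        if self._in_text_tag:
            self._buffer.append(data)


def extract_relevant_paragraphs(html: str, keywords: List[str]) -> List[str]:
    """Return the paragraphs of a page that mention any of the keywords."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lowered = [k.lower() for k in keywords]
    return [p for p in parser.paragraphs if any(k in p.lower() for k in lowered)]
```

In a real tool, the HTML would come from fetching each predefined trusted URL. The extracted paragraphs would then be placed in the documents state variable for the generate node.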


Generating the Response

The generate node creates the final response by invoking a chain with a predefined prompt (built with LangChain's PromptTemplate class). The rag_prompt receives the state variables and guides the entire behavior of response generation. It includes instructions on response style, conversational tone, formatting guidelines, citation rules, handling of hybrid context, and focusing only on the provided context.

The generate node first retrieves the state variables "question", "documents", and "answer_style", and formats the documents into a single string that serves as the context. It then invokes the generation chain with rag_prompt and the response generation LLM to produce the final answer, which populates the "generation" state variable. app.py uses this state variable to display the generated response in the Streamlit user interface.

With Groq's free API, it is possible to hit a model's rate limit or context window limit. For that case, I extended the generate node to dynamically switch models in a circular fashion through a list of model names. After the response is generated, it reverts to the current model.
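The context formatting and circular model switching can be sketched as below. This is a hypothetical illustration rather than the article's code. `llm_call` stands in for invoking the generation chain with a given model, and `RateLimitError` stands in for whatever exception the client raises on rate or context-window limits. The model names in the test are placeholders.

```python
from typing import Callable, Dict, List


class RateLimitError(Exception):
    """Placeholder for the error raised when a model hits its rate or context limit."""


def format_context(documents: List[str]) -> str:
    """Join the retrieved documents into a single context string."""
    return "\n\n".join(documents)


def generate_with_fallback(state: Dict, model_list: List[str], current_model: str,
                           llm_call: Callable[[str, str], str]) -> Dict:
    """Try current_model first, then the other models in circular order.
    current_model itself is never reassigned, so the next call starts from it again."""
    prompt = (f"Answer style: {state['answer_style']}\n"
              f"Context:\n{format_context(state['documents'])}\n"
              f"Question: {state['question']}")
    start = model_list.index(current_model)
    for offset in range(len(model_list)):
        model = model_list[(start + offset) % len(model_list)]
        try:
            return {**state, "generation": llm_call(model, prompt)}
        except RateLimitError:
            print(f"{model} hit a limit, switching to the next model...")
    raise RuntimeError("All models in model_list failed")
```

The modulo index wraps around the list, so starting from any model still gives every other model one attempt before the function gives up.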

Helper Functions

agentic_rag.py contains other helper functions for initializing the application, LLMs, embedding models, and session variables. The function initialize_app is called from app.py during app initialization. It is also triggered every time a model or state variable is changed via the Streamlit app, and it then reinitializes the components and saves the updated states. This function also keeps track of various session variables and prevents redundant initialization.

Further helper functions initialize the answering LLM, the embedding model, the router LLM, and the grading LLM. The list of model names, model_list, keeps track of the models during dynamic model switching in the generate node.

Building the Workflow


Now the graph state, the nodes, the conditional entry point (using route_question), and the edges are defined to establish the flow between nodes. Finally, the workflow is compiled into an executable app for use in the Streamlit interface. The conditional entry point uses the route_question function to select the first node in the workflow based on the query. The conditional edges (workflow.add_conditional_edges) specify whether to transition to the websearch node or to the generate node. The choice depends on the relevance of the chunks, as determined by the grade_documents node.
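A tiny, dependency-free graph runner shows how a conditional entry point and conditional edges drive the flow. It is not LangGraph. The `MiniGraph` class and its methods are hypothetical and only loosely modeled on the StateGraph pattern of nodes, edges, conditional edges, and invoke. Stub nodes stand in for retrieval, grading, web search, and generation, and the `kb_hits` field simulating retriever results is also an assumption.

```python
from typing import Callable, Dict, Optional

END = "__end__"


class MiniGraph:
    """Toy state machine: nodes transform a state dict, edges pick the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[Dict], Dict]] = {}
        self.edges: Dict[str, Callable[[Dict], str]] = {}
        self.entry: Optional[Callable[[Dict], str]] = None

    def add_node(self, name: str, fn: Callable[[Dict], Dict]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src] = lambda state: dst  # unconditional transition

    def add_conditional_edges(self, src: str, router: Callable[[Dict], str]) -> None:
        self.edges[src] = router  # the router inspects the state to choose the next node

    def set_conditional_entry_point(self, router: Callable[[Dict], str]) -> None:
        self.entry = router

    def invoke(self, state: Dict, max_steps: int = 20) -> Dict:
        if self.entry is None:
            raise ValueError("No entry point set")
        node = self.entry(state)
        for _ in range(max_steps):
            if node == END:
                return state
            state = self.nodes[node](state)
            node = self.edges[node](state)
        raise RuntimeError("Workflow did not reach END")


def build_demo_workflow() -> MiniGraph:
    g = MiniGraph()
    # Stub nodes that record the path taken through the graph
    g.add_node("retrieve", lambda s: {**s, "path": s["path"] + ["retrieve"], "documents": list(s["kb_hits"])})
    g.add_node("grade_documents", lambda s: {**s, "path": s["path"] + ["grade_documents"], "web_search_needed": not s["documents"]})
    g.add_node("websearch", lambda s: {**s, "path": s["path"] + ["websearch"], "documents": ["web result"]})
    g.add_node("generate", lambda s: {**s, "path": s["path"] + ["generate"], "generation": f"Answer from {len(s['documents'])} document(s)"})
    g.set_conditional_entry_point(lambda s: "websearch" if s.get("internet_search_enabled") else "retrieve")
    g.add_edge("retrieve", "grade_documents")
    g.add_conditional_edges("grade_documents", lambda s: "websearch" if s["web_search_needed"] else "generate")
    g.add_edge("websearch", "generate")
    g.add_edge("generate", END)
    return g
```

The max_steps guard is a simple safeguard in this sketch against accidentally wiring a cycle with no exit.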


Streamlit Interface

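The Streamlit app in app.py provides an interactive interface for asking questions and displaying responses. Dynamic settings control model selection, answer style, and query-specific tools. The initialize_app function, imported from agentic_rag.py, initializes all the session variables, including all LLMs, the embedding model, and the other options selected in the left sidebar. The print statements in agentic_rag.py are captured by redirecting sys.stdout to an io.StringIO buffer, and the captured output is then displayed with Streamlit's text_area component. Here is a snapshot of the Streamlit interface: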
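Capturing print output in this way needs only the standard library. The sketch uses contextlib.redirect_stdout, which temporarily replaces sys.stdout with an io.StringIO buffer. `run_and_capture` and `noisy_workflow` are hypothetical names used only for illustration.

```python
import io
from contextlib import redirect_stdout
from typing import Any, Callable, Tuple


def run_and_capture(fn: Callable[..., Any], *args, **kwargs) -> Tuple[Any, str]:
    """Call fn and return (its result, everything it printed to stdout)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = fn(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_workflow(question: str) -> str:
    """Stand-in for a workflow that logs its progress with print()."""
    print("Routing question...")
    print("Retrieving documents...")
    return f"Answer to: {question}"
```

In a Streamlit app, the captured log string could then be passed as the value of a text_area, for example st.text_area("Debug", log).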


The following image shows the response when "Concise" is selected as the answer style. The query router (route_