
IRIS-RAG-Gen: A Personalized ChatGPT RAG Application Powered by IRIS Vector Search

Patricia Arquette (original) · 2025-01-03 16:56


Hello community,

In this article, I will introduce my application iris-RAG-Gen.

Iris-RAG-Gen is a generative AI Retrieval-Augmented Generation (RAG) application that leverages the power of IRIS Vector Search to personalize ChatGPT, with the help of the Streamlit web framework, LangChain, and OpenAI. The application uses IRIS as its vector store.
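As a high-level illustration of the RAG idea (not the application's actual code, which uses IRIS Vector Search and OpenAI and is shown later), the flow is: retrieve the chunks most relevant to the question, then ground the LLM prompt on them. A toy sketch with a stub word-overlap retriever:

```python
# Minimal RAG-flow sketch with a stub retriever (illustrative only).

def retrieve(query, corpus, k=2):
    """Toy retriever: rank chunks by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda c: -len(q & set(c.lower().split())))
    return scored[:k]

def build_prompt(query, chunks):
    """Ground the question on the retrieved chunks."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "IRIS provides vector search over embedded documents.",
    "Streamlit builds simple data web apps in Python.",
    "LangChain chains LLM calls together.",
]
query = "What does IRIS provide?"
prompt = build_prompt(query, retrieve(query, corpus))
```

In the real application, the retriever is IRIS similarity search over OpenAI embeddings rather than word overlap, but the prompt-grounding step is the same shape.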

Application Features

  • Ingest documents (PDF or TXT) into IRIS
  • Chat with a selected ingested document
  • Delete ingested documents
  • OpenAI ChatGPT

Ingest Documents (PDF or TXT) into IRIS

Follow the steps below to ingest a document:

  • Enter your OpenAI API key
  • Select a document (PDF or TXT)
  • Enter a document description
  • Click the "Ingest Document" button


The Ingest Document feature inserts the document's details into the rag_documents table, and creates a separate rag_document<id> table (where <id> is the row's ID in rag_documents) to hold the vector data.
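The naming scheme can be exercised on its own with a small stand-alone sketch. SQLite is used here purely as a stand-in for IRIS (so an explicit auto-increment id column is declared, where IRIS supplies the row ID implicitly); the table and column names follow the article:

```python
import sqlite3

# Stand-in for IRIS: create the metadata table, insert a row, and derive
# the per-document collection name from the auto-generated row ID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rag_documents "
    "(id INTEGER PRIMARY KEY, description VARCHAR(255), docType VARCHAR(50))"
)

cur = conn.execute(
    "INSERT INTO rag_documents (description, docType) VALUES (?, ?)",
    ("My PDF", "application/pdf"),
)
collection_name = "rag_document" + str(cur.lastrowid)
print(collection_name)  # rag_document1
```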


The Python code below stores the selected document as vectors:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, TextLoader
from langchain_iris import IRISVector
from langchain_openai import OpenAIEmbeddings
from sqlalchemy import create_engine,text

class RagOpr:
    #Ingest document. Parameters contain file path, description and file type
    def ingestDoc(self, filePath, fileDesc, fileType):
        embeddings = OpenAIEmbeddings() 
        #Load the document based on the file type
        if fileType == "text/plain":
            loader = TextLoader(filePath)       
        elif fileType == "application/pdf":
            loader = PyPDFLoader(filePath)       
        
        #load data into documents
        documents = loader.load()        
        
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=0)
        #Split text into chunks
        texts = text_splitter.split_documents(documents)
        
        #Get collection name from the rag_documents table.
        COLLECTION_NAME = self.get_collection_name(fileDesc,fileType)
               
        # function to create collection_name table and store vector data in it.
        db = IRISVector.from_documents(
            embedding=embeddings,
            documents=texts,
            collection_name = COLLECTION_NAME,
            connection_string=self.CONNECTION_STRING,
        )

    #Get collection name
    def get_collection_name(self, fileDesc, fileType):
        # check if rag_documents table exists, if not then create it 
        with self.engine.connect() as conn:
            with conn.begin():     
                sql = text("""
                    SELECT *
                    FROM INFORMATION_SCHEMA.TABLES
                    WHERE TABLE_SCHEMA = 'SQLUser'
                    AND TABLE_NAME = 'rag_documents';
                    """)
                result = []
                try:
                    result = conn.execute(sql).fetchall()
                except Exception as err:
                    print("An exception occurred:", err)               
                    return ''
                #if table is not created, then create rag_documents table first
                if len(result) == 0:
                    sql = text("""
                        CREATE TABLE rag_documents (
                        description VARCHAR(255),
                        docType VARCHAR(50) )
                        """)
                    try:    
                        result = conn.execute(sql) 
                    except Exception as err:
                        print("An exception occurred:", err)                
                        return ''
        #Insert description value 
        with self.engine.connect() as conn:
            with conn.begin():     
                sql = text("""
                    INSERT INTO rag_documents 
                    (description,docType) 
                    VALUES (:desc,:ftype)
                    """)
                try:    
                    result = conn.execute(sql, {'desc':fileDesc,'ftype':fileType})
                except Exception as err:
                    print("An exception occurred:", err)                
                    return ''
                #select ID of last inserted record
                sql = text("""
                    SELECT LAST_IDENTITY()
                """)
                try:
                    result = conn.execute(sql).fetchall()
                except Exception as err:
                    print("An exception occurred:", err)
                    return ''
        return "rag_document"+str(result[0][0])
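The splitter call above chunks the document before embedding. As a simplified illustration of the idea (not LangChain's actual algorithm, which recursively splits on a hierarchy of separators), a fixed-size character chunker with configurable overlap looks like this:

```python
def chunk_text(text, chunk_size=400, chunk_overlap=0):
    """Split text into fixed-size character chunks with optional overlap.
    Simplified stand-in for RecursiveCharacterTextSplitter."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "x" * 1000
chunks = chunk_text(sample, chunk_size=400, chunk_overlap=0)
print([len(c) for c in chunks])  # [400, 400, 200]
```

With chunk_overlap=0, as in the article's configuration, consecutive chunks share no text; a non-zero overlap repeats the tail of each chunk at the head of the next so that context is not cut mid-thought.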

 

Run the SQL command below in the Management Portal to retrieve the vector data:

SELECT TOP 5
id, embedding, document, metadata
FROM SQLUser.rag_document2


 

Chat with a Selected Ingested Document

Select a document from the "Select Chat Option" section and type your question. The application reads the vector data and returns the relevant answer.

The Python code below reads the vector data and answers the question:

from langchain_iris import IRISVector
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationSummaryMemory


class RagOpr:
    def ragSearch(self, prompt, id):
        #Concat document id with rag_document to get the collection name
        COLLECTION_NAME = "rag_document"+str(id)
        embeddings = OpenAIEmbeddings() 
        #Get vector store reference
        db2 = IRISVector(
            embedding_function=embeddings,    
            collection_name=COLLECTION_NAME,
            connection_string=self.CONNECTION_STRING,
        )
        #Similarity search
        docs_with_score = db2.similarity_search_with_score(prompt)
        #Prepare the retrieved documents to pass to the LLM
        relevant_docs = ["".join(str(doc.page_content)) + " " for doc, _ in docs_with_score]
        #init LLM
        llm = ChatOpenAI(
            temperature=0,    
            model_name="gpt-3.5-turbo"
        )
        #manage and handle LangChain multi-turn conversations
        conversation_sum = ConversationChain(
            llm=llm,
            memory= ConversationSummaryMemory(llm=llm),
            verbose=False
        )
        #Create prompt
        template = f"""
        Prompt: {prompt}
        Relevant Documents: {relevant_docs}
        """
        #Return the answer
        resp = conversation_sum(template)
        return resp['response']
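The grounding step at the end — flattening the retrieved (document, score) pairs and concatenating them into the prompt — can be exercised on its own. The document contents and scores below are made-up placeholders:

```python
# Stand-in for the (doc, score) pairs returned by
# similarity_search_with_score; only page_content is needed here.
class Doc:
    def __init__(self, page_content):
        self.page_content = page_content

docs_with_score = [
    (Doc("IRIS stores embeddings."), 0.12),
    (Doc("Chunks are 400 chars."), 0.34),
]

# Same list comprehension as ragSearch uses to flatten the results.
relevant_docs = ["".join(str(doc.page_content)) + " " for doc, _ in docs_with_score]

prompt = "How are embeddings stored?"
template = f"""
Prompt: {prompt}
Relevant Documents: {relevant_docs}
"""
```

The resulting template string is what ConversationChain receives; the ConversationSummaryMemory then keeps a running summary of the exchange so that follow-up questions stay in context.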

    


For more details, please visit the iris-RAG-Gen Open Exchange application page.

Thanks

