Improving engineering efficiency: Retrieval-Augmented Generation (RAG)

王林
2023-10-14 20:17:01

With the advent of large language models such as GPT-3, the field of natural language processing (NLP) has seen major breakthroughs. These models can generate human-like text and are widely used in scenarios such as chatbots and translation.

However, in specialized and customized application scenarios, general-purpose large language models may lack the necessary professional knowledge. Fine-tuning these models on specialized corpora is often expensive and time-consuming. Retrieval-Augmented Generation (RAG) offers an alternative technical approach for professional applications.

Below we introduce how RAG works and walk through a practical example that uses a product manual as the professional corpus and GPT-3.5 Turbo as the question-answering model to verify its effectiveness.

Case: Develop a chatbot that can answer questions about a specific product, for which the enterprise has its own user manual.

RAG INTRODUCTION

RAG provides an effective solution for domain-specific question answering. It converts industry knowledge into vectors for storage and retrieval, combines the retrieved results with the user's question to form a prompt, and then uses a large model to generate an appropriate answer. Because the language model is given relevant retrieved material at query time, its answers to domain-specific questions become more relevant and accurate.
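To make the retrieval step concrete, here is a minimal sketch of ranking stored text by cosine similarity to a query vector. The three-dimensional vectors and the sample sentences are made up for illustration; a real system would obtain vectors from an embedding model and would normally delegate the search to a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" (hypothetical values, not produced by any real model).
store = [
    ("To reset the device, hold the power button for 10 seconds.", [0.9, 0.1, 0.0]),
    ("The warranty covers two years.", [0.0, 0.2, 0.9]),
    ("Connect the unit with the supplied cable.", [0.5, 0.8, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "How do I reset it?"

print(top_k(query_vec, store, k=2))
```

The texts returned by `top_k` are what gets merged into the prompt before the language model is called.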

The steps to create a chatbot program are as follows:

  1. Read the PDF (the user manual PDF file) and split it into chunks with a chunk_size of 1000 tokens.
  2. Create vectors (you can use the OpenAI Embeddings API to create them).
  3. Store the vectors in a local vector store. We will use ChromaDB as the vector database (it can also be replaced by Pinecone or other products).
  4. The user submits a prompt containing a query/question.
  5. Retrieve knowledge context data from the vector database based on the user's question. This context is combined with the prompt in the following steps to enrich it, which is often referred to as contextual enrichment.
  6. The prompt containing the user's question is passed to the LLM along with the retrieved context.
  7. The LLM answers based on this context.
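Steps 5 and 6 above (contextual enrichment) can be sketched as a small function that merges the retrieved chunks with the user's question. The prompt wording and the function name `build_prompt` are assumptions for illustration only; LangChain uses its own prompt templates internally.

```python
def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    # Number each chunk so the model (and a human reader) can tell them apart.
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "How do I reset the device?",
    ["Hold the power button for 10 seconds."],
)
print(prompt)
```

Instructing the model to rely only on the supplied context is one common way to keep answers grounded in the manual rather than in the model's general knowledge.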

Hands-on development

(1) Set up a Python virtual environment

Set up a virtual environment to sandbox our Python installation and avoid any version or dependency conflicts. Execute the following commands to create and activate a new Python virtual environment.

pip install virtualenv
python3 -m venv ./venv
source venv/bin/activate

(2) Generate OpenAI key

Using GPT requires an OpenAI key for access.

(3) Installation of dependent libraries

Install the dependencies the program requires, including the following libraries:

  • langchain: A framework for developing LLM applications.
  • chromadb: A vector database for persisting vector embeddings.
  • unstructured: Used to preprocess Word/PDF documents.
  • tiktoken: A tokenizer library.
  • pypdf: A library for reading and processing PDF documents.
  • openai: A client library for accessing the OpenAI API.

pip install langchain
pip install unstructured
pip install pypdf
pip install tiktoken
pip install chromadb
pip install openai

Create an environment variable to store the OpenAI key.

export OPENAI_API_KEY=<openai-key>
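Before running the rest of the script, it can help to fail early if the variable was not exported. The following is a small sketch; the helper name `get_openai_key` is hypothetical and not part of any library.

```python
import os

def get_openai_key(env=None):
    """Return the OpenAI key from the environment, or raise a clear error."""
    if env is None:
        env = os.environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the script."
        )
    return key

# Demo with an explicit mapping so the example runs without a real key.
print(get_openai_key({"OPENAI_API_KEY": "sk-example"}))
```

In the actual program you would call `get_openai_key()` with no argument so that it reads the process environment.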

(4) Convert the user manual PDF file into a vector and store it in ChromaDB

Import all the dependent libraries and functions that need to be used

import os
import openai
import tiktoken
import chromadb
from langchain.document_loaders import OnlinePDFLoader, UnstructuredPDFLoader, PyPDFLoader
from langchain.text_splitter import TokenTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain

Read the PDF and split the document into token-based chunks.

loader = PyPDFLoader("Clarett.pdf")
pdfData = loader.load()
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)
splitData = text_splitter.split_documents(pdfData)
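To illustrate what chunk_size and chunk_overlap control, here is a simplified sketch of fixed-size windowing over a token list. It uses whitespace-separated words as stand-in tokens, which is an assumption made so the example runs on its own; TokenTextSplitter works on tokenizer output instead, so its chunk boundaries will differ.

```python
def split_tokens(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks

tokens = "the quick brown fox jumps over the lazy dog".split()  # 9 stand-in tokens
print(split_tokens(tokens, chunk_size=4))
print(split_tokens(tokens, chunk_size=4, chunk_overlap=1))
```

With chunk_overlap=0, as in the article, neighbouring chunks share nothing; a small overlap can help when an answer straddles a chunk boundary, at the cost of storing some text twice.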

Create a Chroma collection and a local directory to store the Chroma data. Then create the vectors (embeddings) and store them in ChromaDB.

collection_name = "clarett_collection"
local_directory = "clarett_vect_embedding"
persist_directory = os.path.join(os.getcwd(), local_directory)
openai_key = os.environ.get('OPENAI_API_KEY')
embeddings = OpenAIEmbeddings(openai_api_key=openai_key)
vectDB = Chroma.from_documents(splitData, embeddings, collection_name=collection_name, persist_directory=persist_directory)
vectDB.persist()

After executing this code, you should see a folder that has been created to store the vectors.

After storing the vector embeddings in ChromaDB, you can use LangChain's ConversationalRetrievalChain API, together with a ConversationBufferMemory, to build a question-answering chain that keeps track of the chat history.

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chatQA = ConversationalRetrievalChain.from_llm(OpenAI(openai_api_key=openai_key, temperature=0, model_name="gpt-3.5-turbo"), vectDB.as_retriever(), memory=memory)

After initializing LangChain, we can use it for chat/Q&A. The code below reads questions from the user in a loop, passes each one to the LLM, and prints the reply; entering 'done' ends the session.

chat_history = []
qry = ""
while qry != 'done':
    qry = input('Question: ')
    # Skip the LLM call when the user types the stop word
    if qry != 'done':
        response = chatQA({"question": qry, "chat_history": chat_history})
        print(response["answer"])
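The same loop can be written so that it is testable without calling the model, by passing the question-answering function and the input/output functions in as parameters. This is a sketch only: `chat_loop` and its parameter names are hypothetical, and the stub below stands in for the real chain.

```python
def chat_loop(ask, read_line, write_line, stop_word="done"):
    """Read questions until stop_word is entered; return the (question, answer) turns."""
    turns = []
    while True:
        question = read_line("Question: ").strip()
        if question == stop_word:
            break
        if not question:
            continue  # ignore empty input instead of calling the model
        answer = ask(question)
        write_line(answer)
        turns.append((question, answer))
    return turns

# Scripted demo with a stand-in for the chain; no network call is made.
scripted = iter(["How do I reset it?", "", "done"])
printed = []
turns = chat_loop(
    ask=lambda q: f"(stub answer to: {q})",
    read_line=lambda prompt: next(scripted),
    write_line=printed.append,
)
print(turns)
```

To use it interactively with the chain built earlier, one would presumably pass `ask=lambda q: chatQA({"question": q})["answer"]`, `read_line=input`, and `write_line=print`.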

In short

RAG combines the strengths of language models such as GPT with those of information retrieval. By enriching the prompt with specific knowledge context, it lets the language model generate more accurate answers that are relevant to that context. RAG is often a more efficient and cost-effective approach than fine-tuning, and it provides customizable interactive solutions for industry and enterprise applications.

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it removed.