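Retrieval-augmented generation (RAG) can significantly enhance language models. Standard RAG improves response relevance, but it often falters in complex retrieval scenarios. This article examines the shortcomings of basic RAG and presents advanced methods for improving accuracy and efficiency.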
<code>main_document_text = """
Morning Routine (5:30 AM - 9:00 AM)
✅ Wake Up Early - Aim for 6-8 hours of sleep to feel well-rested.
✅ Hydrate First - Drink a glass of water to rehydrate your body.
✅ Morning Stretch or Light Exercise - Do 5-10 minutes of stretching or a short workout to activate your body.
✅ Mindfulness or Meditation - Spend 5-10 minutes practicing mindfulness or deep breathing.
✅ Healthy Breakfast - Eat a balanced meal with protein, healthy fats, and fiber.
✅ Plan Your Day - Set goals, review your schedule, and prioritize tasks.
...
"""</code>
How can I improve my health and productivity?
Here are the helper functions for evaluating basic RAG:
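<code># **Imports**
import os
import json
import openai
import numpy as np
from scipy.spatial.distance import cosine
from google.colab import userdata

# Set up OpenAI API key
os.environ["OPENAI_API_KEY"] = userdata.get('AiTeam')

# Create the client used by the helper functions; it reads OPENAI_API_KEY from the environment
client = openai.OpenAI()</code>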
<code>def query_chatgpt(prompt, model="gpt-4o", response_format=openai.NOT_GIVEN):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # Adjust for more or less creativity
            response_format=response_format
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"</code>
<code>def get_embedding(text, model="text-embedding-3-large"):  # alternative: "text-embedding-ada-002"
    """Fetches the embedding for a given text using OpenAI's API."""
    response = client.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding</code>
We test basic RAG with predefined queries to evaluate its ability to retrieve the most relevant documents based on semantic similarity. This highlights its limitations.
<code>def compute_similarity_metrics(embed1, embed2):
    """Computes different similarity/distance metrics between two embeddings."""
    cosine_sim = 1 - cosine(embed1, embed2)  # Cosine similarity
    return cosine_sim</code>
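To make the metric concrete, here is a minimal sketch of cosine similarity written out by hand rather than through SciPy. The three-dimensional vectors are made up for illustration; real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not produced by any model)
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])   # parallel vectors
orthogonal = cosine_similarity([1, 0, 0], [0, 1, 0])       # unrelated vectors
opposite = cosine_similarity([1, 0, 0], [-1, 0, 0])        # opposed vectors

print(same_direction, orthogonal, opposite)
```

The score depends only on the angle between the vectors, not their length, which is why it is a common choice for comparing embeddings.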
<code>def fetch_similar_docs(query, docs, threshold=.55, top=1):
    query_em = get_embedding(query)
    data = []
    for d in docs:
        # Compute the similarity between the document and the query
        similarity_results = compute_similarity_metrics(d["embedding"], query_em)
        if similarity_results >= threshold:
            data.append({"id": d["id"], "ref_doc": d.get("ref_doc", ""), "score": similarity_results})
    # Sort by score, highest first
    sorted_data = sorted(data, key=lambda x: x["score"], reverse=True)
    # Keep at most `top` results
    sorted_data = sorted_data[:min(top, len(sorted_data))]
    return sorted_data</code>
Advanced techniques for enhancing RAG
Three key enhancements are implemented:
<code>"""# **Testing Vanilla RAG**"""

query = "what should I do to stay healthy and productive?"
r = fetch_similar_docs(query, docs)
print("query = ", query)
print("documents = ", r)

query = "what are the best practices to stay healthy and productive ?"
r = fetch_similar_docs(query, docs)
print("query = ", query)
print("documents = ", r)</code>
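The test above relies on a `docs` list that is not shown. Judging from `fetch_similar_docs`, each entry is a dict with an `id`, an `embedding`, and an optional `ref_doc`. The sketch below builds such a list and runs the same threshold-then-top-k retrieval offline. Note the assumptions: `toy_embedding`, `VOCAB`, and the three documents are hypothetical stand-ins, and a word-count vector is far cruder than a real embedding model, so the mismatch it shows is exaggerated.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Hypothetical vocabulary for the toy embedding
VOCAB = ["sleep", "water", "exercise", "breakfast", "plan"]

def toy_embedding(text):
    """Word-count vector over VOCAB; a stand-in for a real embedding call."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]

# Same shape as the entries read by fetch_similar_docs
docs = [
    {"id": "doc_sleep", "embedding": toy_embedding("sleep sleep water")},
    {"id": "doc_exercise", "embedding": toy_embedding("exercise exercise water")},
    {"id": "doc_plan", "embedding": toy_embedding("plan breakfast")},
]

def fetch_similar(query, docs, threshold=0.55, top=1):
    """Score every document, drop those below the threshold, keep the best `top`."""
    query_em = toy_embedding(query)
    hits = [
        {"id": d["id"], "ref_doc": d.get("ref_doc", ""),
         "score": cosine_similarity(d["embedding"], query_em)}
        for d in docs
    ]
    hits = [h for h in hits if h["score"] >= threshold]
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:top]

specific = fetch_similar("water and sleep", docs, top=2)
broad = fetch_similar("stay healthy and productive", docs, top=2)
print("specific query:", specific)
print("broad query:", broad)
```

The specific query shares vocabulary with one document and clears the threshold, while the broad query shares none and retrieves nothing. That is the kind of gap between a high-level question and detail-level text that the enhancements aim to close.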
2. Creating an overview
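One way an overview could be stored, sketched under assumptions: a short summary of the document is embedded as its own index entry, and its `ref_doc` field (the field `fetch_similar_docs` already returns) points back to the full document. The names `build_index`, `resolve_hits`, `summarize`, and `embed` are hypothetical; in practice `summarize` might wrap `query_chatgpt` and `embed` might wrap `get_embedding`.

```python
def build_index(doc_id, text, summarize, embed):
    """Index the full text plus a short overview that points back to it."""
    overview = summarize(text)
    return [
        {"id": doc_id, "embedding": embed(text)},
        {"id": f"{doc_id}_overview", "ref_doc": doc_id, "embedding": embed(overview)},
    ]

def resolve_hits(hits):
    """Map retrieved entries to unique source document ids, preserving order."""
    resolved = []
    for hit in hits:
        source = hit.get("ref_doc") or hit["id"]
        if source not in resolved:
            resolved.append(source)
    return resolved

# Stubs so the sketch runs offline (assumptions, not real model calls)
def stub_summarize(text):
    return "A daily routine for staying healthy and productive."

def stub_embed(text):
    return [float(len(text))]

index = build_index("main_document_text", "Morning Routine ...", stub_summarize, stub_embed)
sources = resolve_hits(index)
print(sources)
```

Because the overview uses high-level wording, a broad query has a better chance of matching it, and `ref_doc` then leads back to the detailed text.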
<code>def generate_faq(text):
    prompt = f'''
    given the following text: """{text}"""
    Ask relevant simple atomic questions ONLY (don't answer them) to cover all subjects covered by the text.
    Return the result as a json list example [q1, q2, q3...]
    '''
    return query_chatgpt(prompt, response_format={ "type": "json_object" })</code>
3. Query decomposition
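A minimal sketch of query decomposition, assuming the idea is to split a broad query into simpler sub-queries, retrieve for each, and merge the hits. `decompose_and_retrieve`, `split_on_and`, `toy_retrieve`, and the canned scores are hypothetical; a real pipeline might ask the language model for the sub-queries and call `fetch_similar_docs` for each one.

```python
def decompose_and_retrieve(query, decompose, retrieve):
    """Retrieve for each sub-query and merge hits, keeping each document's best score."""
    best = {}
    for sub_query in decompose(query):
        for hit in retrieve(sub_query):
            doc_id = hit["id"]
            if doc_id not in best or hit["score"] > best[doc_id]["score"]:
                best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)

def split_on_and(query):
    """Naive stand-in for an LLM-based decomposer: split on the word 'and'."""
    return [part.strip() for part in query.split(" and ") if part.strip()]

# Canned retrieval results (made-up scores for illustration)
TOY_HITS = {
    "how do I sleep better": [{"id": "doc_sleep", "score": 0.9}],
    "how do I plan my day": [
        {"id": "doc_plan", "score": 0.8},
        {"id": "doc_sleep", "score": 0.6},
    ],
}

def toy_retrieve(sub_query):
    return TOY_HITS.get(sub_query, [])

results = decompose_and_retrieve(
    "how do I sleep better and how do I plan my day", split_on_and, toy_retrieve
)
print(results)
```

Keeping the maximum score per document is one merging choice among several; summing or averaging scores across sub-queries would rank documents that match many sub-queries higher.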
Example FAQ output:
Cost-benefit analysis
Conclusion