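In today's digital age, information is abundant, but finding the right data can be a challenge. A meta search engine aggregates results from multiple search engines to give a more comprehensive view of the available information. In this article, we walk step by step through building a simple meta search engine in Python, including error handling, rate limiting, and privacy features.
A meta search engine does not maintain its own database of indexed pages. Instead, it sends the user's query to several search engines, collects the results, and presents them in a unified format. This lets users reach a wider range of information without searching each engine separately.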
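The fan-out-and-merge idea can be sketched without any network access. In this minimal illustration, fake_engine_a, fake_engine_b, and meta_search are hypothetical names, and the two "engines" are stand-in functions rather than real search backends:

```python
from typing import Callable, Dict, List

Result = Dict[str, str]
Engine = Callable[[str], List[Result]]


def fake_engine_a(query: str) -> List[Result]:
    # Hypothetical stand-in for a real search backend.
    return [{"title": f"A: {query}", "url": "https://a.example/1"}]


def fake_engine_b(query: str) -> List[Result]:
    # Second hypothetical backend.
    return [{"title": f"B: {query}", "url": "https://b.example/1"}]


def meta_search(query: str, engines: Dict[str, Engine]) -> List[Result]:
    """Send one query to every engine and merge the answers into one list."""
    merged: List[Result] = []
    for name, engine in engines.items():
        for hit in engine(query):
            # Tag each hit with its source so the unified list stays traceable.
            merged.append({"source": name, **hit})
    return merged


results = meta_search("python", {"A": fake_engine_a, "B": fake_engine_b})
for hit in results:
    print(hit["source"], hit["title"], hit["url"])
```

Every engine returns the same title/url shape, which is what lets the caller treat the merged list uniformly.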
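To follow this tutorial, you need Python 3, the requests library, and a Bing Search API key.
First, make sure the necessary libraries are installed. We use requests to make HTTP requests and json to handle JSON data. The json module ships with Python's standard library, so only requests needs installing.
You can install requests with pip:

    pip install requests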
Create a new Python file named meta_search_engine.py and start by defining the search engines to query. This example uses DuckDuckGo and Bing.

    import requests
    import json
    import os
    import time

    # Define your search engines; "{}" is the placeholder for the query
    SEARCH_ENGINES = {
        "DuckDuckGo": "https://api.duckduckgo.com/?q={}&format=json",
        "Bing": "https://api.bing.microsoft.com/v7.0/search?q={}&count=10",
    }

    BING_API_KEY = "YOUR_BING_API_KEY"  # Replace with your Bing API Key
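The URL templates contain a {} placeholder for the query. A query with spaces or characters such as & should be percent-encoded before it is substituted, otherwise it can be misread as extra URL parameters. A minimal sketch using the standard library's urllib.parse.quote_plus (build_url and DDG_TEMPLATE are hypothetical names introduced here for illustration):

```python
from urllib.parse import quote_plus

DDG_TEMPLATE = "https://api.duckduckgo.com/?q={}&format=json"


def build_url(template: str, query: str) -> str:
    """Fill the template with a percent-encoded query string."""
    return template.format(quote_plus(query))


url = build_url(DDG_TEMPLATE, "cats & dogs")
print(url)  # https://api.duckduckgo.com/?q=cats+%26+dogs&format=json
```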
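Next, create a function that queries the search engines and retrieves the results. We also add error handling so that network problems are handled gracefully: a failure in one engine is reported and the other engine is still queried.

    from urllib.parse import quote_plus

    def search(query):
        """Query each engine and return a combined list of {title, url} dicts."""
        results = []
        encoded_query = quote_plus(query)  # escape spaces, '&', and other special characters

        # Query DuckDuckGo
        ddg_url = SEARCH_ENGINES["DuckDuckGo"].format(encoded_query)
        try:
            response = requests.get(ddg_url, timeout=10)
            response.raise_for_status()  # Raise an error for bad responses
            data = response.json()
            for item in data.get("RelatedTopics", []):
                if 'Text' in item and 'FirstURL' in item:
                    results.append({'title': item['Text'], 'url': item['FirstURL']})
        except (requests.exceptions.RequestException, ValueError) as e:
            # ValueError covers a response body that is not valid JSON
            print(f"Error querying DuckDuckGo: {e}")

        # Query Bing
        bing_url = SEARCH_ENGINES["Bing"].format(encoded_query)
        headers = {"Ocp-Apim-Subscription-Key": BING_API_KEY}
        try:
            response = requests.get(bing_url, headers=headers, timeout=10)
            response.raise_for_status()  # Raise an error for bad responses
            data = response.json()
            for item in data.get("webPages", {}).get("value", []):
                results.append({'title': item['name'], 'url': item['url']})
        except (requests.exceptions.RequestException, ValueError) as e:
            print(f"Error querying Bing: {e}")

        return results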
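The DuckDuckGo loop keeps only top-level entries that carry both Text and FirstURL. In DuckDuckGo responses, some RelatedTopics entries are category groups that hold their own Topics list, and the loop skips those. The sketch below descends into such groups. It assumes that nested shape, flatten_topics is a hypothetical helper, and the sample data is made up for illustration:

```python
from typing import Any, Dict, List


def flatten_topics(topics: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Collect title/url pairs, descending into nested "Topics" groups."""
    results: List[Dict[str, str]] = []
    for item in topics:
        if "Text" in item and "FirstURL" in item:
            results.append({"title": item["Text"], "url": item["FirstURL"]})
        elif "Topics" in item:
            # Assumed shape: a category group holding more topic entries.
            results.extend(flatten_topics(item["Topics"]))
    return results


# Made-up sample mimicking the assumed response structure.
sample = [
    {"Text": "Python (language)", "FirstURL": "https://example.com/python"},
    {"Name": "Animals", "Topics": [
        {"Text": "Python (snake)", "FirstURL": "https://example.com/snake"},
    ]},
]
flat = flatten_topics(sample)
print(len(flat))
```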
To avoid hitting API rate limits, we implement a simple rate limiter with time.sleep() that pauses for a fixed interval before every search.

    # Rate limit settings
    RATE_LIMIT = 1  # seconds between requests

    def rate_limited_search(query):
        time.sleep(RATE_LIMIT)  # Wait before making the next request
        return search(query)
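A fixed sleep delays even the very first request. One possible refinement, shown here as a sketch rather than as part of the tutorial's code, records when the previous call happened and sleeps only for the time still remaining. RateLimiter is a hypothetical class name, and the 0.2 second interval is chosen only to keep the demonstration quick:

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls, sleeping only when needed."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self) -> float:
        """Block until the interval has elapsed; return the seconds slept."""
        now = time.monotonic()
        slept = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_call = time.monotonic()
        return slept


limiter = RateLimiter(min_interval=0.2)
first = limiter.wait()   # no previous call, so no delay
second = limiter.wait()  # called right after the first, so it sleeps
print(first, second)
```

time.monotonic() is used instead of time.time() because it is not affected by system clock adjustments.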
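To enhance user privacy, we avoid logging user queries and add a caching mechanism that temporarily stores results. Note that the cache file itself keeps queries as plain-text keys, so delete cache.json when it is no longer needed.

    CACHE_FILE = 'cache.json'

    def load_cache():
        """Return the cached query-to-results mapping, or an empty dict."""
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE, 'r', encoding='utf-8') as f:
                return json.load(f)
        return {}

    def save_cache(cache):
        """Write the whole cache mapping back to disk."""
        with open(CACHE_FILE, 'w', encoding='utf-8') as f:
            json.dump(cache, f)

    def search_with_cache(query):
        cache = load_cache()
        if query in cache:
            print("Returning cached results.")
            return cache[query]
        results = rate_limited_search(query)
        cache[query] = results  # add to the existing entries instead of overwriting the file
        save_cache(cache)
        return results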
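The cache above keys entries by the raw query and never expires them. If "temporary" storage is the goal, one option is to key entries by a hash of the query and attach a timestamp, so stale entries are dropped on lookup. The sketch below works on a plain dict; cache_key, cache_put, cache_get, and the 300-second CACHE_TTL are hypothetical choices, and hashing only obscures the query text (easily guessed queries can still be matched against the hash):

```python
import hashlib
import time
from typing import Any, Dict, List, Optional

CACHE_TTL = 300  # seconds; hypothetical expiry window


def cache_key(query: str) -> str:
    """Hash the normalized query so the raw text is never stored."""
    return hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()


def cache_put(cache: Dict[str, Any], query: str, results: List[dict],
              now: Optional[float] = None) -> None:
    """Store results together with the time they were saved."""
    stamp = time.time() if now is None else now
    cache[cache_key(query)] = {"stored_at": stamp, "results": results}


def cache_get(cache: Dict[str, Any], query: str,
              now: Optional[float] = None) -> Optional[List[dict]]:
    """Return cached results, or None if missing or older than CACHE_TTL."""
    key = cache_key(query)
    entry = cache.get(key)
    if entry is None:
        return None
    current = time.time() if now is None else now
    if current - entry["stored_at"] > CACHE_TTL:
        del cache[key]  # drop the expired entry
        return None
    return entry["results"]


cache: Dict[str, Any] = {}
hits = [{"title": "t", "url": "https://example.com"}]
cache_put(cache, "Python Tips", hits, now=1000.0)
fresh = cache_get(cache, "python tips", now=1100.0)    # within the TTL
expired = cache_get(cache, "python tips", now=2000.0)  # past the TTL
print(fresh, expired)
```

The now parameter exists only so the expiry logic can be exercised without waiting; normal callers would omit it.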
To keep the results unique, we implement a function that removes duplicates based on their URL, keeping the first occurrence of each.

    def remove_duplicates(results):
        seen = set()
        unique_results = []
        for result in results:
            if result['url'] not in seen:
                seen.add(result['url'])
                unique_results.append(result)
        return unique_results
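Exact string comparison treats https://www.python.org/ and https://www.python.org as different pages. A possible extension, sketched here under the assumption that scheme and host case, fragments, and trailing slashes do not matter, normalizes each URL before comparing. normalize_url and remove_near_duplicates are hypothetical names:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def remove_near_duplicates(results):
    """Keep the first result for each normalized URL."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


sample = [
    {"title": "Python", "url": "https://www.python.org/"},
    {"title": "Python again", "url": "HTTPS://WWW.PYTHON.ORG"},
    {"title": "Docs", "url": "https://docs.python.org/3/"},
]
unique = remove_near_duplicates(sample)
print([r["title"] for r in unique])  # ['Python', 'Docs']
```

Some servers do distinguish paths with and without a trailing slash, so this normalization is a trade-off rather than a universally safe rule.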
Create a function that displays the search results in a user-friendly format.

    def display_results(results):
        for idx, result in enumerate(results, start=1):
            print(f"{idx}. {result['title']}\n   {result['url']}\n")
Finally, tie everything together in a main function that runs the meta search engine.

    def main():
        query = input("Enter your search query: ")
        results = search_with_cache(query)
        unique_results = remove_duplicates(results)
        display_results(unique_results)

    if __name__ == "__main__":
        main()
Here is the complete code for the meta search engine:

    import json
    import os
    import time
    from urllib.parse import quote_plus

    import requests

    # Define your search engines; "{}" is the placeholder for the query
    SEARCH_ENGINES = {
        "DuckDuckGo": "https://api.duckduckgo.com/?q={}&format=json",
        "Bing": "https://api.bing.microsoft.com/v7.0/search?q={}&count=10",
    }

    BING_API_KEY = "YOUR_BING_API_KEY"  # Replace with your Bing API Key

    # Rate limit settings
    RATE_LIMIT = 1  # seconds between requests

    CACHE_FILE = 'cache.json'

    def search(query):
        """Query each engine and return a combined list of {title, url} dicts."""
        results = []
        encoded_query = quote_plus(query)  # escape spaces, '&', and other special characters

        # Query DuckDuckGo
        ddg_url = SEARCH_ENGINES["DuckDuckGo"].format(encoded_query)
        try:
            response = requests.get(ddg_url, timeout=10)
            response.raise_for_status()
            data = response.json()
            for item in data.get("RelatedTopics", []):
                if 'Text' in item and 'FirstURL' in item:
                    results.append({'title': item['Text'], 'url': item['FirstURL']})
        except (requests.exceptions.RequestException, ValueError) as e:
            # ValueError covers a response body that is not valid JSON
            print(f"Error querying DuckDuckGo: {e}")

        # Query Bing
        bing_url = SEARCH_ENGINES["Bing"].format(encoded_query)
        headers = {"Ocp-Apim-Subscription-Key": BING_API_KEY}
        try:
            response = requests.get(bing_url, headers=headers, timeout=10)
            response.raise_for_status()
            data = response.json()
            for item in data.get("webPages", {}).get("value", []):
                results.append({'title': item['name'], 'url': item['url']})
        except (requests.exceptions.RequestException, ValueError) as e:
            print(f"Error querying Bing: {e}")

        return results

    def rate_limited_search(query):
        time.sleep(RATE_LIMIT)  # Wait before making the next request
        return search(query)

    def load_cache():
        """Return the cached query-to-results mapping, or an empty dict."""
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE, 'r', encoding='utf-8') as f:
                return json.load(f)
        return {}

    def save_cache(cache):
        """Write the whole cache mapping back to disk."""
        with open(CACHE_FILE, 'w', encoding='utf-8') as f:
            json.dump(cache, f)

    def search_with_cache(query):
        cache = load_cache()
        if query in cache:
            print("Returning cached results.")
            return cache[query]
        results = rate_limited_search(query)
        cache[query] = results  # add to the existing entries instead of overwriting the file
        save_cache(cache)
        return results

    def remove_duplicates(results):
        seen = set()
        unique_results = []
        for result in results:
            if result['url'] not in seen:
                seen.add(result['url'])
                unique_results.append(result)
        return unique_results

    def display_results(results):
        for idx, result in enumerate(results, start=1):
            print(f"{idx}. {result['title']}\n   {result['url']}\n")

    def main():
        query = input("Enter your search query: ")
        results = search_with_cache(query)
        unique_results = remove_duplicates(results)
        display_results(unique_results)

    if __name__ == "__main__":
        main()
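Congratulations! You have built a simple yet capable meta search engine in Python. The project shows how to aggregate search results from multiple sources and highlights the importance of error handling, rate limiting, and user privacy. You can extend it by adding more search engines, building a web interface, or integrating machine learning to improve result ranking. Happy coding!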