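Hi, I wanted to share an example of the cursor pagination pattern (also called the cursor paging pattern), because when I searched for one I only found examples that move forward, not backward, and none that showed how to handle the data at the start and at the end.
You can see the repository here, but I will try to explain everything in this post.
I use Python Poetry as my package manager, so this example assumes you already have it. The first step is to install the dependencies with poetry install. You can also install them with pip: pip install pymongo loguru.
You also need a Mongo database. You can download MongoDB Community Edition here and configure it with this guide.
Now that the dependencies are installed and the database is ready, we can add data to it. For that, we can use this: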
from datetime import datetime

from pymongo import MongoClient

# Data to add
sample_posts = [
    {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)},
    {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)},
    {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)},
    {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)},
    {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)},
    {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)},
    {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)},
    {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)},
    {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)},
    {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)},
    {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)},
]

# Creating connection
token = "mongodb://localhost:27017"
client = MongoClient(token)
cursor_db = client.cursor_db.content
cursor_db.insert_many(sample_posts)
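This creates a connection to the local database and its content collection, then adds the values of sample_posts to it. Now that we have data to search, we can start querying. Let's begin by reading the data from the start toward the end.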
# Import libraries
from bson.objectid import ObjectId
from datetime import datetime
from loguru import logger
from pymongo import MongoClient

# Use token to connect to local database
token = "mongodb://localhost:27017"
client = MongoClient(token)
# Access cursor_db collection (it will be created if it does not exist)
cursor_db = client.cursor_db.content
default_page_size = 5


def fetch_next_page(cursor, page_size = None):
    # Use the provided page_size or use a default value
    page_size = page_size or default_page_size

    # Check if there is a cursor
    if cursor:
        # Get documents with `_id` greater than the cursor
        query = {"_id": {'$gt': cursor}}
    else:
        # Get everything
        query = {}

    # Sort in ascending order by `_id`
    sort_order = 1

    # Define the aggregation pipeline
    pipeline = [
        {"$match": query},  # Filter based on the cursor
        {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
        {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
        # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
    ]

    # Execute the aggregation pipeline
    results = list(cursor_db.aggregate(pipeline))
    # logger.debug(results)

    # Validate if some data was found
    if not results:
        raise ValueError("No data found")

    # Check if there are more documents than the page size
    if len(results) > page_size:
        # Deleting extra document
        results.pop(-1)
        # Set the cursor for the next page
        next_cursor = results[-1]['_id']
        # Set the previous cursor
        if cursor:
            # In case the cursor has data
            prev_cursor = results[0]['_id']
        else:
            # In case the cursor doesn't have data (first page)
            prev_cursor = None
        # Indicate you haven't reached the end of the data
        at_end = False
    else:
        # Indicate that there are no more pages available (last page reached)
        next_cursor = None
        # Set the cursor for the previous page
        prev_cursor = results[0]['_id']
        # Indicate you have reached the end of the data
        at_end = True

    return results, next_cursor, prev_cursor, at_end


@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")


# Run only when executed as a script (the original `if __name__:` was always true)
if __name__ == "__main__":
    main()
    logger.info("--- Execution end ---")
That code returns:
2024-09-02 08:55:24.388 | INFO | __main__:main:73 - results = [{'_id': ObjectId('66bdfdcf7a0667fd1888c20c'), 'title': 'Post 1', 'content': 'Content 1', 'date': datetime.datetime(2023, 8, 1, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20d'), 'title': 'Post 2', 'content': 'Content 2', 'date': datetime.datetime(2023, 8, 2, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20e'), 'title': 'Post 3', 'content': 'Content 3', 'date': datetime.datetime(2023, 8, 3, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20f'), 'title': 'Post 4', 'content': 'Content 4', 'date': datetime.datetime(2023, 8, 4, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c210'), 'title': 'Post 5', 'content': 'Content 5', 'date': datetime.datetime(2023, 8, 5, 0, 0)}]
2024-09-02 08:55:24.388 | INFO | __main__:main:74 - next_cursor = ObjectId('66bdfdcf7a0667fd1888c210')
2024-09-02 08:55:24.388 | INFO | __main__:main:75 - prev_cursor = None
2024-09-02 08:55:24.388 | INFO | __main__:main:76 - at_end = False
2024-09-02 08:55:24.388 | INFO | __main__:<module>:79 - --- Execution end ---
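We can see that the cursor points to the next page and that the previous page is None. We can also see that this is not the end of the data. To understand how these values are obtained, we need to look closely at the function fetch_next_page. There we define page_size, the query, and sort_order, and then build the pipeline for the aggregation operation. To detect whether there is another page of information, we use the $limit operator with the value page_size + 1, so that the extra + 1 document reveals whether another page really exists. To check it, we use the expression len(results) > page_size. If the number of documents returned is greater than page_size, there is another page; otherwise, this is the last page.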
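The page_size + 1 check described above can be tried without a database. The following is a minimal sketch, assuming a plain Python list stands in for the collection; take_page is a hypothetical helper and is not part of the original script.

```python
def take_page(items, page_size):
    """Return (page, has_more) using the page_size + 1 lookahead."""
    # Fetch one extra item, like {"$limit": page_size + 1} in the pipeline
    fetched = items[: page_size + 1]
    # More than page_size items means at least one more page exists
    has_more = len(fetched) > page_size
    # Drop the lookahead item so the caller only sees page_size items
    page = fetched[:page_size]
    return page, has_more


page, has_more = take_page(list(range(1, 12)), 5)
print(page, has_more)  # [1, 2, 3, 4, 5] True
```

The extra item is never shown to the user; it only answers the question "is there anything after this page?" without a second count query.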
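If there is a next page, we have to remove the last element from the list of information we queried, since that was the + 1 in the pipeline. Then we set next_cursor using the _id of the value that is now last in the list, and set prev_cursor (the previous cursor) depending on the case. If a cursor was passed in, there is data before this page, so prev_cursor is the first _id of the data found. If not, this is the first group of data and there is no previous information, so prev_cursor is None.
Now that we know how to search the data and add some important validations, we need a way to traverse the data forward. For that, we use the input command to ask the user running the script to type the direction to move in, although for now it can only be forward (f). We can update the main function as follows: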
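To see how the two cursors are assigned, here is a small in-memory sketch of the same logic. It assumes a list of integers plays the role of the _id values and replaces the aggregation pipeline with list operations; next_page_in_memory is a hypothetical name used only for illustration.

```python
def next_page_in_memory(ids, cursor, page_size=5):
    """Mimic fetch_next_page over a list of integer ids."""
    # Equivalent of {"_id": {"$gt": cursor}}, or {} when there is no cursor.
    # Assumption: `is None` is used instead of truthiness so that id 0 works.
    matched = [i for i in sorted(ids) if cursor is None or i > cursor]
    # Equivalent of {"$limit": page_size + 1}
    results = matched[: page_size + 1]
    if not results:
        raise ValueError("No data found")

    if len(results) > page_size:
        results.pop(-1)  # remove the lookahead element
        next_cursor = results[-1]
        # A cursor was given, so there is data before this page
        prev_cursor = results[0] if cursor is not None else None
        at_end = False
    else:
        next_cursor = None  # last page reached
        prev_cursor = results[0]
        at_end = True
    return results, next_cursor, prev_cursor, at_end


ids = list(range(1, 12))
first = next_page_in_memory(ids, None)
second = next_page_in_memory(ids, first[1])
print(first)   # ([1, 2, 3, 4, 5], 5, None, False)
print(second)  # ([6, 7, 8, 9, 10], 10, 6, False)
```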
@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")

    # Checking if there is more data to show
    if next_cursor:
        # Enter a cycle to traverse the data
        while(True):
            print(125 * "*")
            # Ask for the user to move forward or cancel the execution
            inn = input("Can only move Forward (f) or Cancel (c): ")

            # Execute action according to the input
            if inn == "f":
                results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, default_page_size)
            elif inn == "c":
                logger.warning("------- Canceling execution -------")
                break
            else:
                # In case the user sends something that is not a valid option
                print("Not valid action, it can only move in the opposite direction.")
                continue

            logger.info(f"{results = }")
            logger.info(f"{next_cursor = }")
            logger.info(f"{prev_cursor = }")
            logger.info(f"{at_end = }")
    else:
        logger.warning("There is not more data to show")
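This lets us traverse the data to the end, but on reaching the end it goes back to the start and the cycle begins again, because next_cursor is None on the last page and the next call therefore queries everything. To avoid this, and to be able to move backward, we need to add some validations. For that, we create the function fetch_previous_page and add some changes to the main function.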
def fetch_previous_page(cursor, page_size = None):
    # Use the provided page_size or use a default value
    page_size = page_size or default_page_size

    # Check if there is a cursor
    if cursor:
        # Get documents with `_id` less than the cursor
        query = {'_id': {'$lt': cursor}}
    else:
        # Get everything
        query = {}

    # Sort in descending order by `_id`
    sort_order = -1

    # Define the aggregation pipeline
    pipeline = [
        {"$match": query},  # Filter based on the cursor
        {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
        {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a previous page
        # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
    ]

    # Execute the aggregation pipeline
    results = list(cursor_db.aggregate(pipeline))

    # Validate if some data was found
    if not results:
        raise ValueError("No data found")

    # Check if there are more documents than the page size
    if len(results) > page_size:
        # Deleting extra document
        results.pop(-1)
        # Reverse the results to maintain the correct order
        results.reverse()
        # Set the cursor for the previous page
        prev_cursor = results[0]['_id']
        # Set the cursor for the next page
        next_cursor = results[-1]['_id']
        # Indicate you are not at the start of the data
        at_start = False
    else:
        # Reverse the results to maintain the correct order
        results.reverse()
        # Indicate that there are no more previous pages available (initial page reached)
        prev_cursor = None
        # !!!!
        next_cursor = results[-1]['_id']
        # Indicate you have reached the start of the data
        at_start = True

    return results, next_cursor, prev_cursor, at_start
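It is very similar to fetch_next_page, but the query (when the condition is met) uses the $lt operator, and sort_order must be -1 to get the data in the required order. When checking len(results) > page_size, if the condition is true we remove the extra element, reverse the data so it is displayed in the correct order, set the previous cursor to the first element of the data, and set the next cursor to the last element. Otherwise, the data is still reversed, the previous cursor is set to None (because there is no previous data), and the next cursor is set to the last value in the list. In both cases a boolean variable called at_start is defined to identify this situation. Now we need to add the interaction with the user to go backward in the main function, so there are three situations to handle, depending on whether we are at the start, at the end, or in the middle of the data: forward only, backward only, and forward or backward: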
@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    # The first page is the start of the data (this flag was undefined before)
    at_start = True
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")

    # Checking if there is more data to show
    if not(at_start and at_end):
        # Enter a cycle to traverse the data
        while(True):
            print(125 * "*")
            # Ask for the user to move forward, backward or cancel the execution
            if at_end:
                inn = input("Can only move Backward (b) or Cancel (c): ")
                stage = 0
            elif at_start:
                inn = input("Can only move Forward (f) or Cancel (c): ")
                stage = 1
            else:
                inn = input("Can move Forward (f), Backward (b), or Cancel (c): ")
                stage = 2

            # Execute action according to the input
            if inn == "f" and stage in [1, 2]:
                results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, default_page_size)
                # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                at_start = False
            elif inn == "b" and stage in [0, 2]:
                results, next_cursor, prev_cursor, at_start = fetch_previous_page(prev_cursor, default_page_size)
                # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                at_end = False
            elif inn == "c":
                logger.warning("------- Canceling execution -------")
                break
            else:
                print("Not valid action, it can only move in the opposite direction.")
                continue

            logger.info(f"{results = }")
            logger.info(f"{next_cursor = }")
            logger.info(f"{prev_cursor = }")
            logger.info(f"{at_start = }")
            logger.info(f"{at_end = }")
    else:
        logger.warning("There is not more data to show")
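We added validations on the user input to identify the current stage while traversing the data. Also note that at_start must be reset to False after calling fetch_next_page, and at_end after calling fetch_previous_page, once a limit has been reached. Now we can reach the end of the data and go back to the start. The validation after getting the first page of data was updated to check whether the flags at_start and at_end are both True, which would indicate that there is no more data to show.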
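The three stages can also be expressed as a small pure function, which makes the input validation easy to test in isolation. This is only a sketch: allowed_moves is a hypothetical helper that returns the set of valid keys for the flags at_start and at_end, and it is not part of the script shown here.

```python
def allowed_moves(at_start, at_end):
    """Return the keys the user may type for the current position."""
    moves = {"c"}  # cancel is always available
    if not at_end:
        moves.add("f")  # there is data after the current page
    if not at_start:
        moves.add("b")  # there is data before the current page
    return moves


print(sorted(allowed_moves(at_start=True, at_end=False)))   # ['c', 'f']
print(sorted(allowed_moves(at_start=False, at_end=True)))   # ['b', 'c']
print(sorted(allowed_moves(at_start=False, at_end=False)))  # ['b', 'c', 'f']
```

With a helper like this, the loop could simply check whether the typed key is in the returned set instead of tracking a numeric stage.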
Note: I hit a bug at this point that I cannot reproduce right now. When going backward and reaching the start, the cursor pointed to the wrong place, and going forward again skipped one element. To solve it, I added a validation in fetch_previous_page based on a parameter called prev_at_start (the previous value of at_start) that assigns next_cursor the value results[0]['_id'], or results[-1]['_id'] if the previous stage was not at the beginning of the data. This will be omitted from now on, but I think it is worth mentioning.
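One way to guard against this kind of off-by-one problem is a round-trip check: walk forward to the end, walk backward to the start, and confirm that the same pages come back in reverse order. The sketch below does this in memory, assuming integer ids in a list instead of a MongoDB collection and deriving both cursors from the first and last element of the current page. The names page_after, page_before and round_trip are hypothetical.

```python
def page_after(ids, cursor, size):
    """Items greater than cursor, ascending, plus a has_more flag."""
    matched = [i for i in sorted(ids) if cursor is None or i > cursor]
    return matched[:size], len(matched) > size


def page_before(ids, cursor, size):
    """Items less than cursor, returned in ascending order, plus has_more."""
    matched = [i for i in sorted(ids, reverse=True) if i < cursor]
    page = matched[:size]
    page.reverse()  # restore ascending order for display
    return page, len(matched) > size


def round_trip(ids, size):
    """Walk forward to the end, then backward to the start."""
    page, has_more = page_after(ids, None, size)
    if not page:
        return [], []  # nothing to paginate
    forward = [page]
    while has_more:
        # Next cursor: last element of the current page
        page, has_more = page_after(ids, page[-1], size)
        forward.append(page)
    backward = [page]
    while True:
        # Previous cursor: first element of the current page
        prev, _ = page_before(ids, page[0], size)
        if not prev:
            break
        page = prev
        backward.append(page)
    return forward, backward


forward, backward = round_trip(list(range(1, 12)), 5)
print(forward)   # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11]]
print(backward)  # [[11], [6, 7, 8, 9, 10], [1, 2, 3, 4, 5]]
```

If the backward list is not the exact reverse of the forward list, a cursor is pointing at the wrong element somewhere.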
Now that we can traverse the data from beginning to end, moving forward or backward, we can create a class that holds all these functions and call it to run the example. We should also add docstrings so that everything is documented correctly. The result is this code:
"""Cursor Paging/Pagination Pattern Example."""
from bson.objectid import ObjectId
from datetime import datetime
from loguru import logger
from pymongo import MongoClient


class cursorPattern:
    """
    A class to handle cursor-based pagination for MongoDB collections.

    Attributes:
    -----------
    cursor_db : pymongo.collection.Collection
        The MongoDB collection used for pagination.
    page_size : int
        Size of the pages.
    """

    def __init__(self, page_size: int = 5) -> None:
        """Initializes the class.

        Sets up a connection to MongoDB and specifies the collection to work with.
        """
        token = "mongodb://localhost:27017"
        client = MongoClient(token)
        self.cursor_db = client.cursor_db.content
        self.page_size = page_size

    def add_data(self,) -> None:
        """Inserts sample data into the MongoDB collection for demonstration purposes.

        Note:
        -----
        It should only be used once, otherwise you will have repeated data.
        """
        sample_posts = [
            {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)},
            {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)},
            {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)},
            {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)},
            {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)},
            {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)},
            {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)},
            {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)},
            {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)},
            {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)},
            {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)},
        ]
        self.cursor_db.insert_many(sample_posts)

    def _fetch_next_page(
        self, cursor: ObjectId | None, page_size: int | None = None
    ) -> tuple[list, ObjectId | None, ObjectId | None, bool]:
        """Retrieves the next page of data based on the provided cursor.

        Parameters:
        -----------
        cursor : ObjectId | None
            The current cursor indicating the last document of the previous page.
        page_size : int | None
            The number of documents to retrieve per page (default is the class's page_size).

        Returns:
        --------
        tuple:
            - results (list): The list of documents retrieved.
            - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page.
            - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page.
            - at_end (bool): Whether this is the last page of results.
        """
        # Use the provided page_size or fallback to the class attribute
        page_size = page_size or self.page_size

        # Check if there is a cursor
        if cursor:
            # Get documents with `_id` greater than the cursor
            query = {"_id": {'$gt': cursor}}
        else:
            # Get everything
            query = {}

        # Sort in ascending order by `_id`
        sort_order = 1

        # Define the aggregation pipeline
        pipeline = [
            {"$match": query},  # Filter based on the cursor
            {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
            {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
            # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
        ]

        # Execute the aggregation pipeline
        results = list(self.cursor_db.aggregate(pipeline))
        # logger.debug(results)

        # Validate if some data was found
        if not results:
            raise ValueError("No data found")

        # Check if there are more documents than the page size
        if len(results) > page_size:
            # Deleting extra document
            results.pop(-1)
            # Set the cursor for the next page
            next_cursor = results[-1]['_id']
            # Set the previous cursor
            if cursor:
                # In case the cursor has data
                prev_cursor = results[0]['_id']
            else:
                # In case the cursor doesn't have data (first time)
                prev_cursor = None
            # Indicate you haven't reached the end of the data
            at_end = False
        else:
            # Indicate that there are no more pages available (last page reached)
            next_cursor = None
            # Set the cursor for the previous page
            prev_cursor = results[0]['_id']
            # Indicate you have reached the end of the data
            at_end = True

        return results, next_cursor, prev_cursor, at_end

    def _fetch_previous_page(
        self, cursor: ObjectId | None, page_size: int | None = None,
    ) -> tuple[list, ObjectId | None, ObjectId | None, bool]:
        """Retrieves the previous page of data based on the provided cursor.

        Parameters:
        -----------
        cursor : ObjectId | None
            The current cursor indicating the first document of the current page.
        page_size : int | None
            The number of documents to retrieve per page (default is the class's page_size).

        Returns:
        --------
        tuple:
            - results (list): The list of documents retrieved.
            - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page.
            - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page.
            - at_start (bool): Whether this is the first page of results.
        """
        # Use the provided page_size or fallback to the class attribute
        page_size = page_size or self.page_size

        # Check if there is a cursor
        if cursor:
            # Get documents with `_id` less than the cursor
            query = {'_id': {'$lt': cursor}}
        else:
            # Get everything
            query = {}

        # Sort in descending order by `_id`
        sort_order = -1

        # Define the aggregation pipeline
        pipeline = [
            {"$match": query},  # Filter based on the cursor
            {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
            {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a previous page
            # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
        ]

        # Execute the aggregation pipeline
        results = list(self.cursor_db.aggregate(pipeline))

        # Validate if some data was found
        if not results:
            raise ValueError("No data found")

        # Check if there are more documents than the page size
        if len(results) > page_size:
            # Deleting extra document
            results.pop(-1)
            # Reverse the results to maintain the correct order
            results.reverse()
            # Set the cursor for the previous page
            prev_cursor = results[0]['_id']
            # Set the cursor for the next page
            next_cursor = results[-1]['_id']
            # Indicate you are not at the start of the data
            at_start = False
        else:
            # Reverse the results to maintain the correct order
            results.reverse()
            # Indicate that there are no more previous pages available (initial page reached)
            prev_cursor = None
            # if prev_at_start:
            #     # in case before was at the starting page
            #     logger.warning("Case 1")
            #     next_cursor = results[0]['_id']
            # else:
            #     # in case before was not at the starting page
            #     logger.warning("Case 2")
            #     next_cursor = results[-1]['_id']
            next_cursor = results[-1]['_id']
            # Indicate you have reached the start of the data
            at_start = True

        return results, next_cursor, prev_cursor, at_start

    def start_pagination(self):
        """Starts the data navigation."""
        # Change page size in case you want it, only leave it here for reference
        page_size = None

        # Retrieve the first page of results
        results, next_cursor, prev_cursor, at_end = self._fetch_next_page(None, page_size)
        at_start = True
        logger.info(f"{results = }")
        logger.info(f"{next_cursor = }")
        logger.info(f"{prev_cursor = }")
        logger.info(f"{at_start = }")
        logger.info(f"{at_end = }")

        # if next_cursor:
        if not(at_start and at_end):
            while(True):
                print(125 * "*")
                if at_end:
                    inn = input("Can only move Backward (b) or Cancel (c): ")
                    stage = 0
                    # =====================================================
                    # You could reset at_end here, but in this example that
                    # will fail in case the user sends something different
                    # from Backward (b) or Cancel (c)
                    # =====================================================
                    # at_end = False
                elif at_start:
                    inn = input("Can only move Forward (f) or Cancel (c): ")
                    stage = 1
                    # =====================================================
                    # You could reset at_start here, but in this example that
                    # will fail in case the user sends something different
                    # from Forward (f) or Cancel (c)
                    # =====================================================
                    # at_start = False
                else:
                    inn = input("Can move Forward (f), Backward (b), or Cancel (c): ")
                    stage = 2

                # Execute action according to the input
                if inn == "f" and stage in [1, 2]:
                    results, next_cursor, prev_cursor, at_end = self._fetch_next_page(next_cursor, page_size)
                    # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                    at_start = False
                elif inn == "b" and stage in [0, 2]:
                    # results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, at_start, page_size)
                    results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, page_size)
                    # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                    at_end = False
                elif inn == "c":
                    logger.warning("------- Canceling execution -------")
                    break
                else:
                    print("Not valid action, it can only move in the opposite direction.")
                    continue

                logger.info(f"{results = }")
                logger.info(f"{next_cursor = }")
                logger.info(f"{prev_cursor = }")
                logger.info(f"{at_start = }")
                logger.info(f"{at_end = }")
        else:
            logger.warning("There is not more data to show")


@logger.catch
def main():
    """Main function."""
    my_cursor = cursorPattern(page_size=5)
    # my_cursor.add_data()
    my_cursor.start_pagination()


# Run only when executed as a script (the original `if __name__:` was always true)
if __name__ == "__main__":
    main()
    logger.info("--- Execution end ---")
page_size was added as an attribute of the class cursorPattern to make it easier to define the size of every page, and docstrings were added to the class and its methods.
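One detail worth knowing about the fallback page_size = page_size or self.page_size is that it treats 0 the same as None, and a negative value would pass through unchecked. The following sketch shows one possible, stricter way to resolve the value. resolve_page_size is a hypothetical helper and is not part of the class above.

```python
def resolve_page_size(page_size, default=5):
    """Return page_size if given, else default; reject non-positive values."""
    # Only None means "not provided"; 0 is treated as an invalid request
    size = default if page_size is None else page_size
    if not isinstance(size, int) or size < 1:
        raise ValueError(f"page_size must be a positive integer, got {size!r}")
    return size


print(resolve_page_size(None))  # 5
print(resolve_page_size(3))     # 3
```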
I hope this helps or guides someone who needs to implement cursor pagination.