ホームページ >バックエンド開発 >Python チュートリアル >カーソルのページネーションの例

カーソルのページネーションの例

WBOY
WBOYオリジナル
2024-09-03 11:09:50650ブラウズ

Cursor Pagination Example

こんにちは、カーソル ページネーション パターン (またはカーソル ページング パターン) の例を共有したいと思いました。検索したときに前に進む例しか見つからなかったためです。逆方向ではなく、開始時と終了時のデータの処理方法でもありません。

このリポジトリはここで見ることができますが、ここですべてを説明してみます。

私はパッケージ管理ツールとして Python Poetry を使用しているため、この例ではすでにそれを持っているものと仮定します。最初に行うことは、poetry install を使用して依存関係をインストールすることです。 pip を使用して次のようにインストールすることもできます: pip install pymongo loguru.

次に、Mongo データベースも必要です。MongoDB Community Edition をここからダウンロードでき、このガイドを使用して構成できます。

依存関係がインストールされ、データベースが完成したので、それにデータを追加できます。そのために、これを使用できます:

from pymongo import MongoClient

# Data to add
sample_posts = [
    {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)},
    {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)},
    {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)},
    {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)},
    {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)},
    {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)},
    {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)},
    {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)},
    {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)},
    {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)},
    {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)},
]
# Creating connection
token = "mongodb://localhost:27017"
client = MongoClient(token)
cursor_db = client.cursor_db.content
cursor_db.insert_many(sample_posts)

これにより、コレクション コンテンツへのローカル データベースへの接続が作成されます。次に、sample_posts の値をそれに追加します。検索するデータができたので、クエリを開始できます。検索を開始して、最後までデータを読み込んでみましょう。

# Import libraries
from bson.objectid import ObjectId
from datetime import datetime

from loguru import logger
from pymongo import MongoClient

# Use token to connect to local database
token = "mongodb://localhost:27017"
client = MongoClient(token)
# Access cursor_db collection (it will be created if it does not exist)
cursor_db = client.cursor_db.content
default_page_size = 5

def fetch_next_page(cursor, page_size = None):
    # Use the provided page_size or use a default value
    page_size = page_size or default_page_size  

    # Check if there is a cursor
    if cursor:
        # Get documents with `_id` greater than the cursor
        query = {"_id": {'$gt': cursor}}
    else:
        # Get everything
        query = {}
    # Sort in ascending order by `_id`
    sort_order = 1 

    # Define the aggregation pipeline
    pipeline = [
        {"$match": query},  # Filter based on the cursor
        {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
        {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
        # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
    ]

    # Execute the aggregation pipeline
    results = list(cursor_db.aggregate(pipeline))
    # logger.debug(results)

    # Validate if some data was found
    if not results: raise ValueError("No data found")

    # Check if there are more documents than the page size
    if len(results) > page_size:
        # Deleting extra document
        results.pop(-1)
        # Set the cursor for the next page
        next_cursor = results[-1]['_id']
        # Set the previous cursor
        if cursor:
            # in case the cursor have data
            prev_cursor = results[0]['_id']
        else:
            # In case the cursor don't have data (first page)
            prev_cursor = None
        # Indicate you haven't reached the end of the data
        at_end = False
    else:
        # Indicate that there are not more pages available (last page reached)
        next_cursor = None
        # Set the cursor for the previous page
        prev_cursor = results[0]['_id']
        # Indicate you have reached the end of the data
        at_end = True
    return results, next_cursor, prev_cursor, at_end


@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")

if __name__:
    main()
    logger.info("--- Execution end ---")

そのコードは次を返します:

2024-09-02 08:55:24.388 | INFO     | __main__:main:73 - results = [{'_id': ObjectId('66bdfdcf7a0667fd1888c20c'), 'title': 'Post 1', 'content': 'Content 1', 'date': datetime.datetime(2023, 8, 1, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20d'), 'title': 'Post 2', 'content': 'Content 2', 'date': datetime.datetime(2023, 8, 2, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20e'), 'title': 'Post 3', 'content': 'Content 3', 'date': datetime.datetime(2023, 8, 3, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20f'), 'title': 'Post 4', 'content': 'Content 4', 'date': datetime.datetime(2023, 8, 4, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c210'), 'title': 'Post 5', 'content': 'Content 5', 'date': datetime.datetime(2023, 8, 5, 0, 0)}]
2024-09-02 08:55:24.388 | INFO     | __main__:main:74 - next_cursor = ObjectId('66bdfdcf7a0667fd1888c210')
2024-09-02 08:55:24.388 | INFO     | __main__:main:75 - prev_cursor = None
2024-09-02 08:55:24.388 | INFO     | __main__:main:76 - at_end = False
2024-09-02 08:55:24.388 | INFO     | __main__:<module>:79 - --- Execution end ---

カーソルが次のページを指しており、前のページは None であることがわかります。また、データの終わりではないこともわかります。この値を取得するには、関数 fetch_next_page を詳しく調べる必要があります。ここでは、page_size、クエリ、sort_order を定義し、集計操作へのパイプラインを作成していることがわかります。別の情報ページがあるかどうかを識別するには、$limit 演算子を使用します。page_size + 1 の値を指定して、その + 1 を持つ別のページが実際に存在するかどうかを確認します。実際にそれを確認するには、式 len( を使用します。結果)> page_size。返されるデータの数が page_size より大きい場合は、別のページが存在します。逆に、これが最後のページです。

次のページがある場合、クエリした情報のリストから最後の要素を削除する必要があります。これはパイプラインの + 1 であったため、現在の最後の値の _id を使用して next_cursor を設定する必要があります。リストを参照し、場合に応じて prev_cursor (前のカーソル) を設定します。カーソルがあった場合は、このカーソルの前にデータがあることを意味し、そうでない場合は、これがデータの最初のグループであることを意味します。事前情報がないため、カーソルは見つかったデータの最初の _id または None である必要があります。

データを検索し、重要な検証を追加する方法がわかったので、データを前方にトラバースする方法を有効にする必要があります。そのために、入力コマンドを使用して、スクリプトを実行しているユーザーに移動方向の書き込みを要求します。ただし、現時点では前方 (f) のみになります。次のようにメイン関数を更新できます:

@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")
    # Checking if there is more data to show
    if next_cursor:
        # Enter a cycle to traverse the data
        while(True):
            print(125 * "*")
            # Ask for the user to move forward or cancel the execution
            inn = input("Can only move Forward (f) or Cancel (c): ")

            # Execute action acording to the input
            if inn == "f":
                results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, default_page_size)
            elif inn == "c":
                logger.warning("------- Canceling execution -------")
                break
            else:
                # In case the user sends something that is not a valid option
                print("Not valid action, it can only move in the opposite direction.")
                continue
            logger.info(f"{results = }")
            logger.info(f"{next_cursor = }")
            logger.info(f"{prev_cursor = }")
            logger.info(f"{at_end = }")
    else:
        logger.warning("There is not more data to show")

これにより、データを最後まで走査することができますが、最後に到達すると最初に戻り、サイクルが再び開始されるため、これを回避し、逆方向に進むためにいくつかの検証を追加する必要があります。そのために、関数 fetch_previous_page を作成し、main 関数にいくつかの変更を追加します。

def fetch_previous_page(cursor, page_size = None):
    # Use the provided page_size or fallback to the class attribute
    page_size = page_size or default_page_size  

    # Check if there is a cursor
    if cursor:
        # Get documents with `_id` less than the cursor
        query = {'_id': {'$lt': cursor}}
    else:
        # Get everything
        query = {}
    # Sort in descending order by `_id`
    sort_order = -1  

    # Define the aggregation pipeline
    pipeline = [
        {"$match": query},  # Filter based on the cursor
        {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
        {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
        # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
    ]

    # Execute the aggregation pipeline
    results = list(cursor_db.aggregate(pipeline))

    # Validate if some data was found
    if not results: raise ValueError("No data found")

    # Check if there are more documents than the page size
    if len(results) > page_size:
        # Deleting extra document
        results.pop(-1)
        # Reverse the results to maintain the correct order
        results.reverse()
        # Set the cursor for the previous page
        prev_cursor = results[0]['_id']
        # Set the cursor for the next page
        next_cursor = results[-1]['_id']
        # Indicate you are not at the start of the data
        at_start = False
    else:
        # Reverse the results to maintain the correct order
        results.reverse()
        # Indicate that there are not more previous pages available (initial page reached)
        prev_cursor = None
        # !!!!
        next_cursor = results[-1]['_id']
        # Indicate you have reached the start of the data
        at_start = True
    return results, next_cursor, prev_cursor, at_start

fetch_next_page と非常に似ていますが、クエリ (条件が満たされた場合) は演算子 $lt を使用し、必要な順序でデータを取得するには sort_order を -1 にする必要があります。ここで、len(results) > かどうかを検証するときに、 page_size 、条件が true の場合、余分な要素を削除し、データが正しく表示されるようにデータの順序を逆にして、前のカーソルをデータの最初の要素に設定し、次のカーソルをデータの最初の要素に設定します。最後まで。逆に、データは逆で、前のカーソルは None に設定され (前のデータがないため)、次のカーソルはリストの最後の値に設定されます。どちらの場合も、この状況を識別するために at_start と呼ばれるブール変数が定義されます。ここで、メイン関数に後戻りするためのユーザーとのインタラクションを追加する必要があります。そのため、データの先頭、末尾、または途中にいる場合に対処する必要がある 3 つの状況があります。前進のみ、後進のみです。 、そして前進または後退:

@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")
    # Checking if there is more data to show
    if not(at_start and at_end):
        # Enter a cycle to traverse the data
        while(True):
            print(125 * "*")
            # Ask for the user to move forward or cancel the execution
            if at_end:
                inn = input("Can only move Backward (b) or Cancel (c): ")
                stage = 0
            elif at_start:
                inn = input("Can only move Forward (f) or Cancel (c): ")
                stage = 1
            else:
                inn = input("Can move Forward (f), Backward (b), or Cancel (c): ")
                stage = 2

            # Execute action acording to the input
            if inn == "f" and stage in [1, 2]:
                results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, page_size)
                # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                at_start = False
            elif inn == "b" and stage in [0, 2]:
                results, next_cursor, prev_cursor, at_start = fetch_previous_page(prev_cursor, page_size)
                # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                at_end = False
            elif inn == "c":
                logger.warning("------- Canceling execution -------")
                break
            else:
                print("Not valid action, it can only move in the opposite direction.")
                continue
            logger.info(f"{results = }")
            logger.info(f"{next_cursor = }")
            logger.info(f"{prev_cursor = }")
            logger.info(f"{at_start = }")
            logger.info(f"{at_end = }")
    else:
        logger.warning("There is not more data to show")

データのトラバース中に現在のステージを識別するためのユーザー入力に検証を追加しました。また、fetch_next_page と fetch_previous_page の実​​行後の at_start と at_end にはそれぞれ到達後にリセットする必要があることにも注意してください。限界。これで、データの末尾に到達し、先頭まで遡ることができます。データの最初のページを取得した後の検証は、フラグ at_start と at_end が True かどうかをチェックするように更新されました。これは、表示するデータがこれ以上ないことを示します。

Note: I was facing a bug at this point which I cannot reproduce right now, but it was causing problems when going backward and reaching the start, the cursor was pointing to the wrong place and when you wanted to go forward it skip 1 element. To solve it I added a validation in fetch_previous_page if a parameter called prev_at_start (which is the previous value of at_start) to assing next_cursor the value results[0]['_id'] or, results[-1]['_id'] in case the previous stage was not at the beginning of the data. This will be ommited from now on, but I think is worth the mention.

Now that we can traverse the data from beginning to end and going forward or backward in it, we can create a class that have all this functions and call it to use the example. Also we must add the docstring so everything is documents correctly. The result of that are in this code:

"""Cursor Paging/Pagination Pattern Example."""
from bson.objectid import ObjectId
from datetime import datetime

from loguru import logger
from pymongo import MongoClient

class cursorPattern:
    """
    A class to handle cursor-based pagination for MongoDB collections.

    Attributes:
    -----------
    cursor_db : pymongo.collection.Collection
        The MongoDB collection used for pagination.
    page_size : int
        Size of the pages.

    """

    def __init__(self, page_size: int = 5) -> None:
        """Initializes the class.

        Sets up a connection to MongoDB and specifying 
        the collection to work with.

        """
        token = "mongodb://localhost:27017"
        client = MongoClient(token)
        self.cursor_db = client.cursor_db.content
        self.page_size = page_size

    def add_data(self,) -> None:
        """Inserts sample data into the MongoDB collection for demonstration purposes.

        Note:
        -----
        It should only use once, otherwise you will have repeated data.

        """
        sample_posts = [
            {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)},
            {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)},
            {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)},
            {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)},
            {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)},
            {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)},
            {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)},
            {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)},
            {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)},
            {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)},
            {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)},
        ]
        self.cursor_db.insert_many(sample_posts)

    def _fetch_next_page(
        self, cursor: ObjectId | None, page_size: int | None = None
    ) -> tuple[list, ObjectId | None, ObjectId | None, bool]:
        """Retrieves the next page of data based on the provided cursor.

        Parameters:
        -----------
        cursor : ObjectId | None
            The current cursor indicating the last document of the previous page.
        page_size : int | None
            The number of documents to retrieve per page (default is the class's page_size).

        Returns:
        --------
        tuple:
            - results (list): The list of documents retrieved.
            - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page.
            - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page.
            - at_end (bool): Whether this is the last page of results.
        """
        # Use the provided page_size or fallback to the class attribute
        page_size = page_size or self.page_size  

        # Check if there is a cursor
        if cursor:
            # Get documents with `_id` greater than the cursor
            query = {"_id": {'$gt': cursor}}
        else:
            # Get everything
            query = {}
        # Sort in ascending order by `_id`
        sort_order = 1 

        # Define the aggregation pipeline
        pipeline = [
            {"$match": query},  # Filter based on the cursor
            {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
            {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
            # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
        ]

        # Execute the aggregation pipeline
        results = list(self.cursor_db.aggregate(pipeline))
        # logger.debug(results)

        # Validate if some data was found
        if not results: raise ValueError("No data found")

        # Check if there are more documents than the page size
        if len(results) > page_size:
            # Deleting extra document
            results.pop(-1)
            # Set the cursor for the next page
            next_cursor = results[-1]['_id']
            # Set the previous cursor
            if cursor:
                # in case the cursor have data
                prev_cursor = results[0]['_id']
            else:
                # In case the cursor don't have data (first time)
                prev_cursor = None
            # Indicate you haven't reached the end of the data
            at_end = False
        else:
            # Indicate that there are not more pages available (last page reached)
            next_cursor = None
            # Set the cursor for the previous page
            prev_cursor = results[0]['_id']
            # Indicate you have reached the end of the data
            at_end = True
        return results, next_cursor, prev_cursor, at_end

    def _fetch_previous_page(
        self, cursor: ObjectId | None, page_size: int | None = None, 
    ) -> tuple[list, ObjectId | None, ObjectId | None, bool]:
        """Retrieves the previous page of data based on the provided cursor.

        Parameters:
        -----------
        cursor : ObjectId | None
            The current cursor indicating the first document of the current page.
        page_size : int
            The number of documents to retrieve per page.
        prev_at_start : bool
            Indicates whether the previous page was the first page.

        Returns:
        --------
        tuple:
            - results (list): The list of documents retrieved.
            - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page.
            - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page.
            - at_start (bool): Whether this is the first page of results.
        """
        # Use the provided page_size or fallback to the class attribute
        page_size = page_size or self.page_size  

        # Check if there is a cursor
        if cursor:
            # Get documents with `_id` less than the cursor
            query = {'_id': {'$lt': cursor}}
        else:
            # Get everything
            query = {}
        # Sort in descending order by `_id`
        sort_order = -1  

        # Define the aggregation pipeline
        pipeline = [
            {"$match": query},  # Filter based on the cursor
            {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
            {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
            # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
        ]

        # Execute the aggregation pipeline
        results = list(self.cursor_db.aggregate(pipeline))

        # Validate if some data was found
        if not results: raise ValueError("No data found")

        # Check if there are more documents than the page size
        if len(results) > page_size:
            # Deleting extra document
            results.pop(-1)
            # Reverse the results to maintain the correct order
            results.reverse()
            # Set the cursor for the previous page
            prev_cursor = results[0]['_id']
            # Set the cursor for the next page
            next_cursor = results[-1]['_id']
            # Indicate you are not at the start of the data
            at_start = False
        else:
            # Reverse the results to maintain the correct order
            results.reverse()
            # Indicate that there are not more previous pages available (initial page reached)
            prev_cursor = None
            # if prev_at_start:
            #     # in case before was at the starting page
            #     logger.warning("Caso 1")
            #     next_cursor = results[0]['_id']
            # else:
            #     # in case before was not at the starting page
            #     logger.warning("Caso 2")
            #     next_cursor = results[-1]['_id']
            next_cursor = results[-1]['_id']
            # Indicate you have reached the start of the data
            at_start = True
        return results, next_cursor, prev_cursor, at_start

    def start_pagination(self):
        """Inicia la navegacion de datos."""
        # Change page size in case you want it, only leave it here for reference
        page_size = None
        # Retrieve the first page of results
        results, next_cursor, prev_cursor, at_end = self._fetch_next_page(None, page_size)
        at_start = True
        logger.info(f"{results = }")
        logger.info(f"{next_cursor = }")
        logger.info(f"{prev_cursor = }")
        logger.info(f"{at_start = }")
        logger.info(f"{at_end = }")
        # if next_cursor:
        if not(at_start and at_end):
            while(True):
                print(125 * "*")
                if at_end:
                    inn = input("Can only move Backward (b) or Cancel (c): ")
                    stage = 0
                    # =====================================================
                    # You could reset at_end here, but in this example that
                    # will fail in case the user sends something different
                    # from Backward (b) or Cancel (c)
                    # =====================================================
                    # at_end = False
                elif at_start:
                    inn = input("Can only move Forward (f) or Cancel (c): ")
                    stage = 1
                    # =====================================================
                    # You could reset at_end here, but in this example that
                    # will fail in case the user sends something different
                    # from Forward (f) or Cancel (c)
                    # =====================================================
                    # at_start = False
                else:
                    inn = input("Can move Forward (f), Backward (b), or Cancel (c): ")
                    stage = 2

                # Execute action acording to the input
                if inn == "f" and stage in [1, 2]:
                    results, next_cursor, prev_cursor, at_end = self._fetch_next_page(next_cursor, page_size)
                    # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                    at_start = False
                elif inn == "b" and stage in [0, 2]:
                    # results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, at_start, page_size)
                    results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, page_size)
                    # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                    at_end = False
                elif inn == "c":
                    logger.warning("------- Canceling execution -------")
                    break
                else:
                    print("Not valid action, it can only move in the opposite direction.")
                    continue
                logger.info(f"{results = }")
                logger.info(f"{next_cursor = }")
                logger.info(f"{prev_cursor = }")
                logger.info(f"{at_start = }")
                logger.info(f"{at_end = }")
        else:
            logger.warning("There is not more data to show")

@logger.catch
def main():
    """Main function."""
    my_cursor = cursorPattern(page_size=5)
    # my_cursor.add_data()
    my_cursor.start_pagination()

if __name__:
    main()
    logger.info("--- Execution end ---")

The page_size was added as an attribute to the class cursorPattern for it to be easier to define the size of every page and added docstrings to the class and its methods.

Hope this will help/guide someone that needs to implement Cursor Pagination.

以上がカーソルのページネーションの例の詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。

声明:
この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。