首頁 >後端開發 >Python教學 >遊標分頁範例

遊標分頁範例

WBOY
WBOY原創
2024-09-03 11:09:50648瀏覽

Cursor Pagination Example

嗨,我想分享一個遊標分頁模式(或遊標分頁模式)的範例,因為當我搜尋一個時,我只能找到前進的案例範例,但是不向後,也不知道如何處理開始和結束的資料。

您可以在此處查看此內容的儲存庫,但我將嘗試在這裡解釋所有內容。

我使用 Python Poetry 作為套件管理工具,因此對於這個範例,我假設您已經擁有它。首先要做的是使用詩歌安裝來安裝依賴項。您也可以使用 pip 來安裝它們: pip install pymongo loguru。

現在我們還需要一個Mongo資料庫,你可以在這裡下載MongoDB社群版,並且可以按照本指南進行設定。

現在我們已經安裝了依賴項和資料庫,我們可以向其中新增資料。為此,我們可以使用這個:

from pymongo import MongoClient

# Data to add
sample_posts = [
    {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)},
    {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)},
    {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)},
    {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)},
    {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)},
    {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)},
    {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)},
    {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)},
    {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)},
    {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)},
    {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)},
]
# Creating connection
token = "mongodb://localhost:27017"
client = MongoClient(token)
cursor_db = client.cursor_db.content
cursor_db.insert_many(sample_posts)

這樣我們就可以建立到本地資料庫到集合內容的連線。然後我們將 sample_posts 中的值加到其中。現在我們有了要搜尋的數據,我們可以開始查詢它。讓我們開始搜尋並讀取數據,直到結束。

# Import libraries
from bson.objectid import ObjectId
from datetime import datetime

from loguru import logger
from pymongo import MongoClient

# Use token to connect to local database
token = "mongodb://localhost:27017"
client = MongoClient(token)
# Access cursor_db collection (it will be created if it does not exist)
cursor_db = client.cursor_db.content
default_page_size = 5

def fetch_next_page(cursor, page_size = None):
    # Use the provided page_size or use a default value
    page_size = page_size or default_page_size  

    # Check if there is a cursor
    if cursor:
        # Get documents with `_id` greater than the cursor
        query = {"_id": {'$gt': cursor}}
    else:
        # Get everything
        query = {}
    # Sort in ascending order by `_id`
    sort_order = 1 

    # Define the aggregation pipeline
    pipeline = [
        {"$match": query},  # Filter based on the cursor
        {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
        {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
        # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
    ]

    # Execute the aggregation pipeline
    results = list(cursor_db.aggregate(pipeline))
    # logger.debug(results)

    # Validate if some data was found
    if not results: raise ValueError("No data found")

    # Check if there are more documents than the page size
    if len(results) > page_size:
        # Deleting extra document
        results.pop(-1)
        # Set the cursor for the next page
        next_cursor = results[-1]['_id']
        # Set the previous cursor
        if cursor:
            # in case the cursor have data
            prev_cursor = results[0]['_id']
        else:
            # In case the cursor don't have data (first page)
            prev_cursor = None
        # Indicate you haven't reached the end of the data
        at_end = False
    else:
        # Indicate that there are not more pages available (last page reached)
        next_cursor = None
        # Set the cursor for the previous page
        prev_cursor = results[0]['_id']
        # Indicate you have reached the end of the data
        at_end = True
    return results, next_cursor, prev_cursor, at_end


@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")

if __name__:
    main()
    logger.info("--- Execution end ---")

該程式碼回傳:

2024-09-02 08:55:24.388 | INFO     | __main__:main:73 - results = [{'_id': ObjectId('66bdfdcf7a0667fd1888c20c'), 'title': 'Post 1', 'content': 'Content 1', 'date': datetime.datetime(2023, 8, 1, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20d'), 'title': 'Post 2', 'content': 'Content 2', 'date': datetime.datetime(2023, 8, 2, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20e'), 'title': 'Post 3', 'content': 'Content 3', 'date': datetime.datetime(2023, 8, 3, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20f'), 'title': 'Post 4', 'content': 'Content 4', 'date': datetime.datetime(2023, 8, 4, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c210'), 'title': 'Post 5', 'content': 'Content 5', 'date': datetime.datetime(2023, 8, 5, 0, 0)}]
2024-09-02 08:55:24.388 | INFO     | __main__:main:74 - next_cursor = ObjectId('66bdfdcf7a0667fd1888c210')
2024-09-02 08:55:24.388 | INFO     | __main__:main:75 - prev_cursor = None
2024-09-02 08:55:24.388 | INFO     | __main__:main:76 - at_end = False
2024-09-02 08:55:24.388 | INFO     | __main__:<module>:79 - --- Execution end ---

可以看到遊標指向下一頁,而上一頁為None,也說明還沒到資料的結尾。為了得到這個值,我們必須更了解函數 fetch_next_page。在那裡我們可以看到我們定義了 page_size、查詢、sort_order,然後我們建立了聚合操作的管道。為了確定是否存在另一頁信息,我們使用 $limit 運算符,我們給出 page_size + 1 的值來檢查實際上是否存在具有該 + 1 的另一頁。要實際檢查它,我們使用表達式 len(結果)> page_size,如果傳回的資料數量大於page_size則還有一頁;相反,這是最後一頁。

對於有下一頁的情況,我們必須從我們查詢的資訊清單中刪除最後一個元素,因為那是管道中的+1,我們需要使用目前最後一個值中的_id來設定next_cursor清單中,根據情況設定prev_cursor(前一個遊標),如果有遊標則說明在這之前有數據,否則說明這是第一組數據,所以有沒有先前的信息,因此,遊標應該是找到的數據中的第一個_id 或None。

現在我們知道如何搜尋資料並添加一些重要的驗證,我們必須啟用一種向前遍歷資料的方法,為此我們將使用輸入命令請求運行腳本的使用者寫入移動方向,不過,現在它只會向前(f)。我們可以更新我們的 main 函數來做到這一點:

@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")
    # Checking if there is more data to show
    if next_cursor:
        # Enter a cycle to traverse the data
        while(True):
            print(125 * "*")
            # Ask for the user to move forward or cancel the execution
            inn = input("Can only move Forward (f) or Cancel (c): ")

            # Execute action acording to the input
            if inn == "f":
                results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, default_page_size)
            elif inn == "c":
                logger.warning("------- Canceling execution -------")
                break
            else:
                # In case the user sends something that is not a valid option
                print("Not valid action, it can only move in the opposite direction.")
                continue
            logger.info(f"{results = }")
            logger.info(f"{next_cursor = }")
            logger.info(f"{prev_cursor = }")
            logger.info(f"{at_end = }")
    else:
        logger.warning("There is not more data to show")

這樣我們就可以遍歷資料直到結束,但是當到達結束時它會返回到開頭並再次開始循環,因此我們必須添加一些驗證來避免這種情況並向後移動。為此,我們將建立函數 fetch_previous_page 並對 main 函數添加一些變更:

def fetch_previous_page(cursor, page_size = None):
    # Use the provided page_size or fallback to the class attribute
    page_size = page_size or default_page_size  

    # Check if there is a cursor
    if cursor:
        # Get documents with `_id` less than the cursor
        query = {'_id': {'$lt': cursor}}
    else:
        # Get everything
        query = {}
    # Sort in descending order by `_id`
    sort_order = -1  

    # Define the aggregation pipeline
    pipeline = [
        {"$match": query},  # Filter based on the cursor
        {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
        {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
        # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
    ]

    # Execute the aggregation pipeline
    results = list(cursor_db.aggregate(pipeline))

    # Validate if some data was found
    if not results: raise ValueError("No data found")

    # Check if there are more documents than the page size
    if len(results) > page_size:
        # Deleting extra document
        results.pop(-1)
        # Reverse the results to maintain the correct order
        results.reverse()
        # Set the cursor for the previous page
        prev_cursor = results[0]['_id']
        # Set the cursor for the next page
        next_cursor = results[-1]['_id']
        # Indicate you are not at the start of the data
        at_start = False
    else:
        # Reverse the results to maintain the correct order
        results.reverse()
        # Indicate that there are not more previous pages available (initial page reached)
        prev_cursor = None
        # !!!!
        next_cursor = results[-1]['_id']
        # Indicate you have reached the start of the data
        at_start = True
    return results, next_cursor, prev_cursor, at_start

與 fetch_next_page 極為相似,但查詢(如果符合條件)使用運算子 $lt 且 sort_order 必須為 -1 才能以所需順序取得資料。現在,當驗證len(results) > 時page_size,如果條件為true,則刪除多餘的元素並反轉資料的順序以使其正確顯示,然後將前一個遊標設為資料的第一個元素,將下一個遊標設為到最後。反之,資料相反,前一個遊標設定為None(因為沒有先前的資料),並將下一個遊標設定為清單的最後一個值。在這兩種情況下,都會定義一個名為 at_start 的布林變數來識別這種情況。現在我們必須在主函數中添加與用戶後退的交互,因此如果我們位於資料的開頭、結尾或中間,則需要處理 3 種情況:僅前進、僅後退,以及前進或後退:

@logger.catch
def main():
    """Main function."""
    # Get the first page
    results, next_cursor, prev_cursor, at_end = fetch_next_page(None)
    logger.info(f"{results = }")
    logger.info(f"{next_cursor = }")
    logger.info(f"{prev_cursor = }")
    logger.info(f"{at_end = }")
    # Checking if there is more data to show
    if not(at_start and at_end):
        # Enter a cycle to traverse the data
        while(True):
            print(125 * "*")
            # Ask for the user to move forward or cancel the execution
            if at_end:
                inn = input("Can only move Backward (b) or Cancel (c): ")
                stage = 0
            elif at_start:
                inn = input("Can only move Forward (f) or Cancel (c): ")
                stage = 1
            else:
                inn = input("Can move Forward (f), Backward (b), or Cancel (c): ")
                stage = 2

            # Execute action acording to the input
            if inn == "f" and stage in [1, 2]:
                results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, page_size)
                # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                at_start = False
            elif inn == "b" and stage in [0, 2]:
                results, next_cursor, prev_cursor, at_start = fetch_previous_page(prev_cursor, page_size)
                # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                at_end = False
            elif inn == "c":
                logger.warning("------- Canceling execution -------")
                break
            else:
                print("Not valid action, it can only move in the opposite direction.")
                continue
            logger.info(f"{results = }")
            logger.info(f"{next_cursor = }")
            logger.info(f"{prev_cursor = }")
            logger.info(f"{at_start = }")
            logger.info(f"{at_end = }")
    else:
        logger.warning("There is not more data to show")

我們對使用者輸入增加了驗證,以識別我們在遍歷資料時所處的階段,還要注意分別執行fetch_next_page 和fetch_previous_page 後的at_start 和at_end ,在達到這些階段後需要重置限制。現在您可以到達資料的末尾並向後移動直到開始。取得第一頁資料後的驗證已更新,以檢查標誌 at_start 和 at_end 是否為 True,這將表示沒有更多資料可顯示。

Note: I was facing a bug at this point which I cannot reproduce right now, but it was causing problems when going backward and reaching the start, the cursor was pointing to the wrong place and when you wanted to go forward it skip 1 element. To solve it I added a validation in fetch_previous_page if a parameter called prev_at_start (which is the previous value of at_start) to assing next_cursor the value results[0]['_id'] or, results[-1]['_id'] in case the previous stage was not at the beginning of the data. This will be ommited from now on, but I think is worth the mention.

Now that we can traverse the data from beginning to end and going forward or backward in it, we can create a class that have all this functions and call it to use the example. Also we must add the docstring so everything is documents correctly. The result of that are in this code:

"""Cursor Paging/Pagination Pattern Example."""
from bson.objectid import ObjectId
from datetime import datetime

from loguru import logger
from pymongo import MongoClient

class cursorPattern:
    """
    A class to handle cursor-based pagination for MongoDB collections.

    Attributes:
    -----------
    cursor_db : pymongo.collection.Collection
        The MongoDB collection used for pagination.
    page_size : int
        Size of the pages.

    """

    def __init__(self, page_size: int = 5) -> None:
        """Initializes the class.

        Sets up a connection to MongoDB and specifying 
        the collection to work with.

        """
        token = "mongodb://localhost:27017"
        client = MongoClient(token)
        self.cursor_db = client.cursor_db.content
        self.page_size = page_size

    def add_data(self,) -> None:
        """Inserts sample data into the MongoDB collection for demonstration purposes.

        Note:
        -----
        It should only use once, otherwise you will have repeated data.

        """
        sample_posts = [
            {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)},
            {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)},
            {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)},
            {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)},
            {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)},
            {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)},
            {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)},
            {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)},
            {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)},
            {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)},
            {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)},
        ]
        self.cursor_db.insert_many(sample_posts)

    def _fetch_next_page(
        self, cursor: ObjectId | None, page_size: int | None = None
    ) -> tuple[list, ObjectId | None, ObjectId | None, bool]:
        """Retrieves the next page of data based on the provided cursor.

        Parameters:
        -----------
        cursor : ObjectId | None
            The current cursor indicating the last document of the previous page.
        page_size : int | None
            The number of documents to retrieve per page (default is the class's page_size).

        Returns:
        --------
        tuple:
            - results (list): The list of documents retrieved.
            - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page.
            - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page.
            - at_end (bool): Whether this is the last page of results.
        """
        # Use the provided page_size or fallback to the class attribute
        page_size = page_size or self.page_size  

        # Check if there is a cursor
        if cursor:
            # Get documents with `_id` greater than the cursor
            query = {"_id": {'$gt': cursor}}
        else:
            # Get everything
            query = {}
        # Sort in ascending order by `_id`
        sort_order = 1 

        # Define the aggregation pipeline
        pipeline = [
            {"$match": query},  # Filter based on the cursor
            {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
            {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
            # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
        ]

        # Execute the aggregation pipeline
        results = list(self.cursor_db.aggregate(pipeline))
        # logger.debug(results)

        # Validate if some data was found
        if not results: raise ValueError("No data found")

        # Check if there are more documents than the page size
        if len(results) > page_size:
            # Deleting extra document
            results.pop(-1)
            # Set the cursor for the next page
            next_cursor = results[-1]['_id']
            # Set the previous cursor
            if cursor:
                # in case the cursor have data
                prev_cursor = results[0]['_id']
            else:
                # In case the cursor don't have data (first time)
                prev_cursor = None
            # Indicate you haven't reached the end of the data
            at_end = False
        else:
            # Indicate that there are not more pages available (last page reached)
            next_cursor = None
            # Set the cursor for the previous page
            prev_cursor = results[0]['_id']
            # Indicate you have reached the end of the data
            at_end = True
        return results, next_cursor, prev_cursor, at_end

    def _fetch_previous_page(
        self, cursor: ObjectId | None, page_size: int | None = None, 
    ) -> tuple[list, ObjectId | None, ObjectId | None, bool]:
        """Retrieves the previous page of data based on the provided cursor.

        Parameters:
        -----------
        cursor : ObjectId | None
            The current cursor indicating the first document of the current page.
        page_size : int
            The number of documents to retrieve per page.
        prev_at_start : bool
            Indicates whether the previous page was the first page.

        Returns:
        --------
        tuple:
            - results (list): The list of documents retrieved.
            - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page.
            - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page.
            - at_start (bool): Whether this is the first page of results.
        """
        # Use the provided page_size or fallback to the class attribute
        page_size = page_size or self.page_size  

        # Check if there is a cursor
        if cursor:
            # Get documents with `_id` less than the cursor
            query = {'_id': {'$lt': cursor}}
        else:
            # Get everything
            query = {}
        # Sort in descending order by `_id`
        sort_order = -1  

        # Define the aggregation pipeline
        pipeline = [
            {"$match": query},  # Filter based on the cursor
            {"$sort": {"_id": sort_order}},  # Sort documents by `_id`
            {"$limit": page_size + 1},  # Limit results to page_size + 1 to check if there's a next page
            # {"$project": {"_id": 1, "title": 1, "content": 1}}  # In case you want to return only certain attributes
        ]

        # Execute the aggregation pipeline
        results = list(self.cursor_db.aggregate(pipeline))

        # Validate if some data was found
        if not results: raise ValueError("No data found")

        # Check if there are more documents than the page size
        if len(results) > page_size:
            # Deleting extra document
            results.pop(-1)
            # Reverse the results to maintain the correct order
            results.reverse()
            # Set the cursor for the previous page
            prev_cursor = results[0]['_id']
            # Set the cursor for the next page
            next_cursor = results[-1]['_id']
            # Indicate you are not at the start of the data
            at_start = False
        else:
            # Reverse the results to maintain the correct order
            results.reverse()
            # Indicate that there are not more previous pages available (initial page reached)
            prev_cursor = None
            # if prev_at_start:
            #     # in case before was at the starting page
            #     logger.warning("Caso 1")
            #     next_cursor = results[0]['_id']
            # else:
            #     # in case before was not at the starting page
            #     logger.warning("Caso 2")
            #     next_cursor = results[-1]['_id']
            next_cursor = results[-1]['_id']
            # Indicate you have reached the start of the data
            at_start = True
        return results, next_cursor, prev_cursor, at_start

    def start_pagination(self):
        """Inicia la navegacion de datos."""
        # Change page size in case you want it, only leave it here for reference
        page_size = None
        # Retrieve the first page of results
        results, next_cursor, prev_cursor, at_end = self._fetch_next_page(None, page_size)
        at_start = True
        logger.info(f"{results = }")
        logger.info(f"{next_cursor = }")
        logger.info(f"{prev_cursor = }")
        logger.info(f"{at_start = }")
        logger.info(f"{at_end = }")
        # if next_cursor:
        if not(at_start and at_end):
            while(True):
                print(125 * "*")
                if at_end:
                    inn = input("Can only move Backward (b) or Cancel (c): ")
                    stage = 0
                    # =====================================================
                    # You could reset at_end here, but in this example that
                    # will fail in case the user sends something different
                    # from Backward (b) or Cancel (c)
                    # =====================================================
                    # at_end = False
                elif at_start:
                    inn = input("Can only move Forward (f) or Cancel (c): ")
                    stage = 1
                    # =====================================================
                    # You could reset at_end here, but in this example that
                    # will fail in case the user sends something different
                    # from Forward (f) or Cancel (c)
                    # =====================================================
                    # at_start = False
                else:
                    inn = input("Can move Forward (f), Backward (b), or Cancel (c): ")
                    stage = 2

                # Execute action acording to the input
                if inn == "f" and stage in [1, 2]:
                    results, next_cursor, prev_cursor, at_end = self._fetch_next_page(next_cursor, page_size)
                    # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                    at_start = False
                elif inn == "b" and stage in [0, 2]:
                    # results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, at_start, page_size)
                    results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, page_size)
                    # For this example, you must reset here the value, otherwise you lose the reference of the cursor
                    at_end = False
                elif inn == "c":
                    logger.warning("------- Canceling execution -------")
                    break
                else:
                    print("Not valid action, it can only move in the opposite direction.")
                    continue
                logger.info(f"{results = }")
                logger.info(f"{next_cursor = }")
                logger.info(f"{prev_cursor = }")
                logger.info(f"{at_start = }")
                logger.info(f"{at_end = }")
        else:
            logger.warning("There is not more data to show")

@logger.catch
def main():
    """Main function."""
    my_cursor = cursorPattern(page_size=5)
    # my_cursor.add_data()
    my_cursor.start_pagination()

if __name__:
    main()
    logger.info("--- Execution end ---")

The page_size was added as an attribute to the class cursorPattern for it to be easier to define the size of every page and added docstrings to the class and its methods.

Hope this will help/guide someone that needs to implement Cursor Pagination.

以上是遊標分頁範例的詳細內容。更多資訊請關注PHP中文網其他相關文章!

陳述:
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn