Paginating Web Data with Twitter-Style Cursors

WBOY

Jun 01, 2016, 01:49 PM

bitsCN.com

　　This article discusses how the different ways of implementing data pagination in a Web application differ in performance.

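　　The familiar page-number navigation (shown in a figure in the original article) is implemented, taking MySQL as an example, as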

　　select * from msgs where thread_id = ? limit page * count, count
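A minimal sketch of this page/count scheme, written in Python against an in-memory SQLite table as a stand-in for MySQL (SQLite accepts the same `LIMIT offset, count` form). The table layout, the `make_db` and `fetch_page` helpers, and the zero-based page numbering are assumptions made for illustration. The offset is computed in the application because `LIMIT` takes plain numbers, and an `ORDER BY id` is added so that the pages are deterministic.

```python
import sqlite3


def make_db(rows=25):
    """Build a throwaway in-memory table standing in for the article's `msgs`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msgs (id INTEGER PRIMARY KEY, thread_id INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO msgs (id, thread_id, body) VALUES (?, ?, ?)",
        [(i, 1, f"message {i}") for i in range(1, rows + 1)],
    )
    return conn


def fetch_page(conn, thread_id, page, count):
    """Return the ids on one page; `page` is zero-based (an assumption)."""
    offset = page * count  # computed here, then passed as a plain number
    cur = conn.execute(
        "SELECT id FROM msgs WHERE thread_id = ? ORDER BY id LIMIT ?, ?",
        (thread_id, offset, count),
    )
    return [row[0] for row in cur]


conn = make_db()
first_page = fetch_page(conn, thread_id=1, page=0, count=10)
third_page = fetch_page(conn, thread_id=1, page=2, count=10)
```

With 25 rows and 10 rows per page, the third page holds only the last five ids.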

　　When reading the Twitter API, however, we find that many endpoints use a cursor instead of the more intuitive page and count parameters, for example the followers ids endpoint:

　　URL:

　　http://twitter.com/followers/ids.format

　　Returns an array of numeric IDs for every user following the specified user.

　　Parameters:

　　* cursor. Required. Breaks the results into pages. Provide a value of -1 to begin paging. Provide values as returned in the response body’s next_cursor and previous_cursor attributes to page back and forth in the list.

　　o Example: http://twitter.com/followers/ids/barackobama.xml?cursor=-1

　　o Example: http://twitter.com/followers/ids/barackobama.xml?cursor=-1300794057949944903
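From the client's side, the parameter description above amounts to a simple loop: start with a cursor of -1 and keep passing back whatever `next_cursor` the response contained. The sketch below illustrates that loop in Python without any network access. `fake_followers_ids` is a hypothetical stand-in for the endpoint, the response shape (a dict with `ids` and `next_cursor`) is simplified, and treating a `next_cursor` of 0 as "no more pages" is an assumption not stated in the text above. `previous_cursor` is omitted for brevity.

```python
def fake_followers_ids(cursor, _ids=tuple(range(100, 112)), block_size=5):
    """Hypothetical stand-in for the followers/ids endpoint (no network).

    In this toy, a cursor is just a list position; a real cursor is opaque
    and should only ever be passed back unchanged.
    """
    start = 0 if cursor == -1 else cursor
    block = list(_ids[start:start + block_size])
    end = start + len(block)
    # Assumption: 0 signals that there are no further pages.
    next_cursor = end if end < len(_ids) else 0
    return {"ids": block, "next_cursor": next_cursor}


def fetch_all_ids(fetch):
    """Begin at cursor=-1 and follow next_cursor until it becomes 0."""
    ids, cursor = [], -1
    while cursor != 0:
        page = fetch(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids


all_ids = fetch_all_ids(fake_followers_ids)
```

The client never computes a page number or offset; it only echoes the server's cursor.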

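　　As the description above shows, the call http://twitter.com/followers/ids.xml takes a cursor parameter for paging rather than the traditional url?page=n&count=n form. What is the advantage of this? Does each cursor keep a snapshot of the data set as it was at that moment? Is it meant to prevent duplicate entries in the results when the result set changes in real time?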

　　In the Google Groups discussion "Cursor Expiration", Twitter architect John Kalucki wrote:

　　A cursor is an opaque deletion-tolerant index into a Btree keyed by source userid and modification time. It brings you to a point in time in the reverse chron sorted list. So, since you can’t change the past, other than erasing it, it’s effectively stable. (Modifications bubble to the top.) But you have to deal with additions at the list head and also block shrinkage due to deletions, so your blocks begin to overlap quite a bit as the data ages. (If you cache cursors and read much later, you’ll see the first few rows of cursor[n+1]’s block as duplicates of the last rows of cursor[n]’s block. The intersection cardinality is equal to the number of deletions in cursor[n]’s block). Still, there may be value in caching these cursors and then heuristically rebalancing them when the overlap proportion crosses some threshold.
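The "point in time in the reverse chron sorted list" idea can be illustrated with a toy comparison between positional paging and value-based paging. The sketch below is not Twitter's B-tree: a plain Python list sorted newest first stands in for the index, a linear filter stands in for the index seek, and the function names are made up for the example. The block-overlap effect described in the quote comes from Twitter's own block layout and is not modeled here.

```python
def page_by_offset(items, offset, size):
    """Positional paging: whatever sits at [offset, offset + size) right now."""
    return items[offset:offset + size]


def page_by_cursor(items, cursor, size):
    """Value-based paging: up to `size` items strictly older than `cursor`.

    `items` is sorted newest first; `cursor` is the key of the last item
    already seen, or None to start from the head of the list.
    """
    older = [x for x in items if cursor is None or x < cursor]
    return older[:size]


timeline = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # newest first
page1 = page_by_cursor(timeline, None, 3)    # the first three items
cursor = page1[-1]                           # remember the last key seen

# Before the next request: two additions at the head, and item 9 is deleted.
timeline = [12, 11] + [x for x in timeline if x != 9]

offset_page2 = page_by_offset(timeline, 3, 3)       # position-based second page
cursor_page2 = page_by_cursor(timeline, cursor, 3)  # continues after the cursor
```

After the list changes, the offset-based second page starts with item 8 again, which the client already received, while the cursor-based page continues with 7, 6, 5.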

　　In another thread, "new cursor-based pagination not multithread-friendly", John added:

　　The page based approach does not scale with large sets. We can no longer support this kind of API without throwing a painful number of 503s.

　　Working with row-counts forces the data store to recount rows in an O(n^2) manner. Cursors avoid this issue by allowing practically constant time access to the next block. The cost becomes O(n/block_size) which, yes, is O(n), but a graceful one given n […] a block_size of 5000. The cursor approach provides a more complete and consistent result set.

　　Proportionally, very few users require multiple page fetches with a page size of 5,000.

　　Also, scraping the social graph repeatedly at high speed could often be considered a low-value, borderline abusive use of the social graph API.
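The difference between the two cost curves can be made concrete with a little arithmetic. The Python sketch below uses a simplified cost model that is an assumption, not something stated in the quote: an offset-based request must walk past `offset` rows before returning its block, while a cursor-based request seeks directly to its block. The function names are hypothetical.

```python
def offset_rows_touched(n, block_size):
    """Rows visited to read all n rows page by page, when every request
    has to walk past `offset` rows before returning its block."""
    total = 0
    for offset in range(0, n, block_size):
        total += offset + min(block_size, n - offset)
    return total


def cursor_rows_touched(n):
    """Rows visited when every request seeks straight to its block."""
    return n


def requests_needed(n, block_size):
    """Number of page fetches needed to cover n rows (ceiling division)."""
    return -(-n // block_size)


n, block_size = 1_000_000, 5000
offset_cost = offset_rows_touched(n, block_size)
cursor_cost = cursor_rows_touched(n)
```

Under this model, reading one million rows in blocks of 5000 takes 200 requests either way, but offset paging visits roughly a hundred times as many rows in total, and the gap grows quadratically with n.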

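　　These two passages make it clear that, for large result sets, the main purpose of the cursor approach is to greatly improve performance. Taking MySQL as an example again: when paging to the 100,000th row without a cursor, the corresponding SQL is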

　　select * from msgs limit 100000, 100

　　On a table with one million records, the first execution of this SQL takes more than 5 seconds.

　　Assuming we use the value of the table's primary key as cursor_id, the SQL for the cursor approach can be optimized to

　　select * from msgs where id > cursor_id order by id limit 100;

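　　On the same table this usually takes less than 100 ms, an improvement of several dozen times. For more on the performance differences of MySQL LIMIT, see also an (admittedly immature) article I wrote three years ago, "MySQL LIMIT 的性能问题" (The performance problem of MySQL LIMIT).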
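As a rough sketch of how an application might drive the cursor-style query, the Python example below walks a whole table block by block. It uses an in-memory SQLite table as a stand-in for MySQL; the table contents, the `fetch_after` helper, and the convention of starting from cursor 0 are assumptions for illustration. `ORDER BY id` is spelled out so that the result order does not depend on the storage engine. No timing claims are implied by this toy data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO msgs (id, body) VALUES (?, ?)",
    [(i, f"message {i}") for i in range(1, 251)],
)


def fetch_after(conn, cursor_id, limit=100):
    """Return (rows, next_cursor_id) for rows whose id is above cursor_id."""
    rows = conn.execute(
        "SELECT id, body FROM msgs WHERE id > ? ORDER BY id LIMIT ?",
        (cursor_id, limit),
    ).fetchall()
    # The id of the last row returned becomes the cursor for the next call;
    # None means there was nothing left to read.
    next_cursor_id = rows[-1][0] if rows else None
    return rows, next_cursor_id


seen, cursor_id = [], 0  # ids start at 1 here, so 0 means "from the beginning"
while True:
    rows, next_cursor_id = fetch_after(conn, cursor_id)
    if not rows:
        break
    seen.extend(row[0] for row in rows)
    cursor_id = next_cursor_id
```

Each request uses the primary key to find its starting point, so the work per request stays about the same no matter how deep into the table the client has paged.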

　　Conclusion

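　　For paging through large data sets in a Web application, this cursor approach is recommended. Its drawback is that pages must be traversed consecutively: you cannot jump directly to an arbitrary page.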
