Understanding paging queries after tens of billions of rows are split across tables

coldplay.xixi

Nov 09, 2020 pm 05:24 PM

This article from the MySQL video tutorial column introduces paging queries over tens of billions of rows.

Once a business reaches a certain scale (Taobao handles more than 50 million orders a day, Meituan more than 30 million), the database faces massive data pressure, and splitting it into multiple databases and tables (sharding) becomes necessary. After sharding, some routine queries become problematic, and the most common of these is the paging query. The field used to split the tables is generally called the sharding key; for example, an order table might use the user ID as its sharding key. So how do you page through results when the query condition does not include the user ID? More generally, how do you serve multi-dimensional queries that do not carry the sharding key?

Unique Primary Key

Database primary keys are usually auto-incremented, so primary key conflicts after splitting a table are unavoidable. The simplest fix is to use a unique business field as the primary key. For the order table, that is the order number, which must be globally unique.

There are many common ways to generate unique IDs in a distributed system; the best known are the Snowflake algorithm, Didi's Tinyid, and Meituan's Leaf. Taking Snowflake as an example, a single node can generate multiple IDs within one millisecond.

A Snowflake ID is 64 bits long and is laid out as follows. The first bit is unused and defaults to 0. The next 41 bits are a timestamp with millisecond precision, which covers about 69 years. The next 10 bits are the worker machine ID: the high 5 bits are the data center ID and the low 5 bits are the node ID. The last 12 bits are a sequence number that each node increments within a millisecond, so a node can generate up to 2^12 = 4096 IDs per millisecond.
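
The bit layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any of the libraries named earlier: the class name, the injectable `clock` parameter, and the custom epoch (2020-01-01 UTC) are all hypothetical choices made for the example.

```python
import threading
import time

# Assumed custom epoch: 2020-01-01 00:00:00 UTC, in milliseconds (hypothetical choice).
EPOCH_MS = 1577836800000


class SnowflakeGenerator:
    """Sketch of a 64-bit ID: 1 unused bit, 41-bit timestamp,
    5-bit data center ID, 5-bit node ID, 12-bit sequence."""

    def __init__(self, datacenter_id, worker_id, clock=None):
        if not 0 <= datacenter_id < 32 or not 0 <= worker_id < 32:
            raise ValueError("datacenter_id and worker_id must fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        # The clock is injectable so the generator can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << 22)
                | (self.datacenter_id << 17)
                | (self.worker_id << 12)
                | self._sequence
            )


generator = SnowflakeGenerator(datacenter_id=1, worker_id=2)
print(generator.next_id())
```

Because the timestamp occupies the high bits, IDs from one node increase over time, which is convenient for ordering orders by creation time.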

Partitioning

With unique order numbers guaranteed after the split, the next question is how to split the table. Start by choosing the number of sub-tables based on your current business volume and expected growth.

For example, suppose we currently take 100,000 orders a day and expect to reach 1 million orders a day within a year. Based on the nature of the business, we support querying orders from the last half year, and orders older than that are archived.

At 1 million orders a day for half a year, a single unsplit table would hold about 180 million rows. Even if the database could store that much, the response time (RT) would be unacceptable. As a rule of thumb, a single table holding a few million rows puts no pressure on the database, so 256 tables are enough: 180 million / 256 ≈ 700,000 rows per table. To be on the safe side you could use 512 tables. And if business volume grows another tenfold to 10 million orders a day, 1024 tables would be a more suitable choice.
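
The sizing arithmetic can be checked with a small helper. This is only a sketch: the function name and the 180-day retention window (half a year) are assumptions made for the example.

```python
def rows_per_table(orders_per_day, retention_days, table_count):
    """Average rows per table if orders are spread evenly across the tables."""
    return orders_per_day * retention_days / table_count


# The three scenarios discussed in the text, with half a year taken as 180 days.
for orders_per_day, table_count in [(1_000_000, 256), (1_000_000, 512), (10_000_000, 1024)]:
    rows = rows_per_table(orders_per_day, 180, table_count)
    print(f"{orders_per_day:>10,} orders/day over {table_count:>4} tables: about {rows:,.0f} rows per table")
```

Under these assumptions all three configurations stay in the range of a few hundred thousand to a couple of million rows per table.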

With the tables split and data older than half a year archived, about 700,000 rows per table is enough for most scenarios. Next, hash the order number and take the result modulo 256 to determine which table a row lands in.
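
A minimal routing sketch follows. The article does not say which hash function is used, so CRC32 is an assumption here, as is the `t_order_NNN` table naming. Python's built-in `hash()` is deliberately avoided because it is randomized per process for strings, and routing has to be stable across processes and restarts.

```python
import zlib

TABLE_COUNT = 256  # number of sub-tables, as in the example above


def table_for_order(order_no: str) -> str:
    """Map an order number to a physical table name (hypothetical naming scheme)."""
    # CRC32 is an assumed stand-in for "hash the order number".
    index = zlib.crc32(order_no.encode("utf-8")) % TABLE_COUNT
    return f"t_order_{index:03d}"


print(table_for_order("202011091724000001"))
```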

Because the primary key is now the order number, queries previously written against the auto-increment primary key ID no longer work, so some existing query functions have to be modified. That is not a real problem: just change them to look up by order number. The real problem is the one in the title.

C-side query

After that long preamble, we finally get to the point: how do you handle queries, and paging queries in particular, after the tables are split?

First, queries that carry the sharding key. A lookup by order number, for example, can always be routed directly to the specific table, so it clearly poses no problem.

Now queries that do not carry the sharding key. If the order number is the sharding key, as in the example above, apps and mini programs still generally query by user ID, so what do you do? Many companies shard their order tables directly by user ID, which is simple and lets those queries go straight to the right table. But then what about lookups by order number? A simple approach is to embed the user ID in the order number. As a rough example, suppose you decide you will never use up the original 41-bit timestamp and the user ID portion takes 10 bits: the order number generation rule then includes the user ID, and when routing to a specific table you hash the 10 user ID bits inside the order number and take the modulus. A query by order number and a query by user ID then land on the same table.

Of course, this is only an example. The actual order number generation rules, the number of bits, and the factors they include depend on your own business and implementation.
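
The idea of embedding user ID bits in the order number can be sketched as follows. The bit widths, the use of the low 10 bits of the user ID as its "hash", and all function names are assumptions for illustration. The sketch shows only the routing property, not a complete uniqueness scheme: the sequence allocation that keeps order numbers unique is left out.

```python
USER_BITS = 10     # bits of the order number reserved for the user ID portion (assumed)
SEQ_BITS = 12      # bits reserved for the per-millisecond sequence (assumed)
TABLE_COUNT = 256  # number of sub-tables


def user_slot(user_id: int) -> int:
    """Reduce the user ID to 10 bits; keeping the low bits is the assumed hash."""
    return user_id & ((1 << USER_BITS) - 1)


def make_order_no(timestamp_ms: int, user_id: int, sequence: int) -> int:
    """Pack timestamp, user slot, and sequence into one integer order number."""
    return (
        (timestamp_ms << (USER_BITS + SEQ_BITS))
        | (user_slot(user_id) << SEQ_BITS)
        | (sequence & ((1 << SEQ_BITS) - 1))
    )


def table_from_order_no(order_no: int) -> int:
    """Route by the user bits embedded in the order number."""
    slot = (order_no >> SEQ_BITS) & ((1 << USER_BITS) - 1)
    return slot % TABLE_COUNT


def table_from_user_id(user_id: int) -> int:
    """Route by user ID; lands on the same table as any of that user's orders."""
    return user_slot(user_id) % TABLE_COUNT


order_no = make_order_no(timestamp_ms=1604913840000, user_id=123456789, sequence=7)
print(table_from_order_no(order_no) == table_from_user_id(123456789))
```

Because both routing functions reduce to the same 10 bits, a user's order list and a single-order lookup hit the same table, which is what makes per-user paging a single-table query.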

So whichever of the order number or the user ID you use as the sharding key, the approach above solves the problem. That leaves another question: what if the query is by neither order number nor user ID? The most obvious examples are queries from the merchant side and from the back office. The merchant side queries by merchant or seller ID, and back-office query conditions can be far more complex; some back-office screens I have seen had dozens of them. How do you serve those? Let's take the B-side and back-office queries separately.

In practice, most real traffic comes from the user side (the C-side), so solving the user-side queries solves most of the problem. The rest comes from the merchant or seller side (the B-side) and from back-office operations and business support, where query traffic is not very high, so it is easier to deal with.

B-side and back-office queries

There are two ways to solve the non-shardingkey query on the B side.

The first is double writing. Double writing means the order data is stored twice, one copy for the C-side and one for the B-side. The C-side copy can use the order number or user ID as the sharding key, and the B-side copy can simply use the merchant or seller ID. Some readers will ask whether double writing hurts performance. Because a slight delay is acceptable on the B-side, the B-side copy can be written asynchronously. Think about it: if you place an order on Taobao, does it matter if the seller receives the order message a second or two late? Does it matter much if the takeaway merchant you ordered from receives the order a second or two late?
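
The asynchronous double write can be illustrated with an in-memory sketch. Everything here is a stand-in: the dictionaries play the role of sharded tables, and the in-process `queue.Queue` plays the role of a durable message queue. A real system would need a durable channel, since an in-process queue loses pending writes if the process crashes.

```python
import queue
import threading

SHARDS = 4  # kept tiny for illustration

# Stand-ins for sharded tables: one dict per shard, keyed by order number.
buyer_store = [dict() for _ in range(SHARDS)]   # C-side copy, sharded by user ID
seller_store = [dict() for _ in range(SHARDS)]  # B-side copy, sharded by seller ID
pending = queue.Queue()                         # stand-in for a message queue


def place_order(order):
    """Write the C-side copy synchronously, then hand off the B-side write."""
    buyer_store[order["user_id"] % SHARDS][order["order_no"]] = order
    pending.put(order)


def seller_writer():
    """Background consumer that writes the B-side copy a little later."""
    while True:
        order = pending.get()
        try:
            seller_store[order["seller_id"] % SHARDS][order["order_no"]] = dict(order)
        finally:
            pending.task_done()


threading.Thread(target=seller_writer, daemon=True).start()

place_order({"order_no": 1001, "user_id": 7, "seller_id": 42})
pending.join()  # wait for the asynchronous copy, only so the demo can print it
print(seller_store[42 % SHARDS])
```

The buyer's request returns as soon as the first write and the enqueue complete, so the second copy adds no database round trip to the ordering path.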

The second solution is to query an offline data warehouse or Elasticsearch (ES). After the order data is written to the database, it is synchronized to the data warehouse or ES, whether through the binlog or through MQ messages. At the data volumes these systems support, queries with such conditions are straightforward. This approach certainly introduces some delay, but a controllable delay is acceptable.

Back-office queries, where operations, business, and product staff need to look at data, naturally involve complex query conditions, and these can also be served from ES or the data warehouse. If you do not adopt this approach and insist on paging without the sharding key, you can only scan all the tables, aggregate the data, and page manually, and the results obtained this way are limited.

For example, with 256 shards you would loop over every shard, fetch 20 rows from each, then aggregate the rows and page them manually. That cannot return the complete data set.
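
To see why a fixed per-shard fetch is limited, here is a sketch of a scatter-gather page that is correct for any page number, with in-memory lists standing in for shards. The function name and the `created_at` sort field are assumptions. The key point is that each shard must return its top `page * page_size` rows, not just one page's worth, so the cost grows with page depth and with the number of shards.

```python
import heapq
from itertools import islice


def page_across_shards(shards, page, page_size):
    """Return page `page` (1-based) of all rows, newest first, across all shards."""
    need = page * page_size
    # Each shard returns its own newest `need` rows
    # (in SQL terms: ORDER BY created_at DESC LIMIT need).
    per_shard = [
        sorted(shard, key=lambda row: row["created_at"], reverse=True)[:need]
        for shard in shards
    ]
    # Merge the already-sorted shard results and keep only the requested window.
    merged = heapq.merge(*per_shard, key=lambda row: row["created_at"], reverse=True)
    return list(islice(merged, need - page_size, need))


# Four toy shards holding rows with created_at values 0..99 spread round-robin.
shards = [[{"created_at": t} for t in range(start, 100, 4)] for start in range(4)]
print([row["created_at"] for row in page_across_shards(shards, page=3, page_size=5)])
```

With 256 shards and a deep page, this pulls 256 × page × page_size rows just to return one page, which is why the article steers these queries toward ES or a data warehouse instead.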

Summary

Experienced readers already know the query problems that come after splitting databases and tables, but I suspect most readers have not worked on a business at this order of magnitude, so sharding may still be at the conceptual stage for them. When asked about it in an interview, they are at a loss because they have no experience to draw on.

Sharding starts with a judgment based on current business volume and future growth. For example, with Pinduoduo's daily volume of 50 million orders, half a year of data is on the order of ten billion rows (50 million × 180 days = 9 billion), which would call for something like 4096 tables. The practical approach is the same at any scale, but your business does not necessarily need 4096 tables. Make a reasonable choice based on your own volume.

Queries on the sharding key are easy to solve. Queries on other fields can be solved by writing a second copy of the data, or by using a data warehouse or ES. And of course, if the amount of data after splitting is small, building indexes and scanning all the tables is not a problem either.

Statement

This article is reproduced from juejin. If there is any infringement, please contact admin@php.cn to have it removed.
