
# How to achieve deep paging over hundreds of millions of rows, compatible with MySQL + ES + MongoDB?

## Interview Questions & Real Experience

Interview question: How to achieve deep paging when the amount of data is large?

You may run into this question in interviews or while preparing for them. Most answers boil down to sharding databases and tables and building indexes, which is the standard, correct answer. But reality is always harsh, so the interviewer will usually follow up: the schedule is tight and the team is short-handed, so how do we implement deep paging now?

At this point, candidates with no hands-on experience are usually stumped. So, hear me out.

## Painful Lessons

First of all, one thing must be clear: deep paging can be supported, but random deep page jumps absolutely must be banned.

First, a screenshot:

[Screenshot: a paginated list whose pager reaches page 142360]

Guess: if I click page 142360, will the service blow up?

Databases like MySQL and MongoDB can more or less cope: they are purpose-built databases, and if they handle it poorly, the worst case is slowness. ES is a different story. With ES we have to use the search_after API to fetch data in a loop, which raises memory-usage concerns; if the code is not written carefully, it can lead straight to a memory overflow.

## Why random deep page jumps cannot be allowed

Let's briefly look, from a technical point of view, at why random deep page jumps cannot be allowed, or in other words, why deep paging is not recommended.

### MySQL

The basic principle of paging:

```sql
SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;
```

LIMIT 10000, 20 means scanning 10,020 rows that match the conditions, throwing away the first 10,000 rows, and returning the last 20. With LIMIT 1000000, 100, a total of 1,000,100 rows must be scanned. In a high-concurrency application where every query scans over a million rows, it would be strange if it did not blow up.

### MongoDB

The basic principle of paging:

```javascript
db.t_data.find().limit(5).skip(5);
```

Similarly, as the page number grows, the number of documents skipped by skip() grows with it. The skip is implemented by iterating the cursor, so the CPU cost becomes very noticeable, and once the page numbers get large and the queries frequent, it will inevitably blow up.
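You can verify this cost yourself with an execution-stats explain (same collection as the example above). With a plain collection scan, totalDocsExamined ends up close to skip + limit, because every skipped document is still examined:

```javascript
// The skipped documents are iterated one by one, not skipped for free
db.t_data.find().skip(1000000).limit(10).explain("executionStats");
```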

### ElasticSearch

From a business perspective, ElasticSearch is not a typical database but a search engine: if the data we want does not show up under the filter conditions, paging deeper will not find it either. Setting that aside, if we do use ES as a database for queries, we will inevitably hit the max_result_window limit when paging. The official default caps the maximum offset at ten thousand.
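For reference, a request like the following (against a hypothetical index named test) asks for from = 10000 with size = 10; since from + size exceeds the default index.max_result_window of 10,000, ES rejects it outright:

```
GET /test/_search
{
  "from": 10000,
  "size": 10
}
```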

Query process:

  • Suppose you query page 501 with 10 items per page: the client sends a request to one node (the coordinating node)

  • That node broadcasts the request to every shard, and each shard queries its own top 5010 records

  • Each shard's results are returned to the coordinating node, which merges and re-sorts them, keeps the global top 5010, and takes the last 10 (records 5001 to 5010) for page 501

  • These are returned to the client

From this we can see why the offset must be limited. Moreover, if you use a scrolling approach such as the Search After API for a deep page jump, each scroll still moves through thousands of records; in total you might scroll through millions or tens of millions of records just to get the final 20. The efficiency is easy to imagine.
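To make that concrete, here is a minimal sketch of one step of such a scroll, assuming a hypothetical index test sorted by a numeric id field. search_after carries the sort values of the last hit from the previous batch, so reaching a deep page means repeating this request over and over:

```
GET /test/_search
{
  "size": 1000,
  "query": { "match_all": {} },
  "sort": [ { "id": "asc" } ],
  "search_after": [ 985000 ]
}
```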

## Align with the product team again

As the saying goes, problems that technology cannot solve should be solved by the business!

During my internship I fell for the product team's nonsense and implemented deep paging with random page jumps. Now it is time to set things right, and the business side must make the following changes:

  • Add default filter conditions wherever possible, such as a time range, to reduce the volume of data displayed

  • Change the page-jump display to a scrolling (infinite-scroll) display, or only allow page jumps within a small range

Scrolling display, for reference:

[Screenshot: an infinite-scroll list]

Small-range page jump, for reference:

[Screenshot: a pager limited to a small range of pages]

## General solution

A quick fix on a short timeline mainly consists of the following points:

  • Required: sorting fields and filter conditions must have indexes
  • Core: use known data from a small range of page numbers, or known data from rolling loads, to reduce the offset
  • Extra: for cases that are hard to handle, you can also over-fetch and trim the excess; the performance impact will not be significant

### MySQL

Original paging SQL:

```sql
# Page 1
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit 0, 20;
# Page N
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit (N - 1) * 20, 20;
```

Using context (values we already know), this is rewritten as:

```sql
# XXXX represents a known value, e.g. the last id seen on the previous page
SELECT * FROM `year_score` where `year` = 2017 and id > XXXX ORDER BY id limit 20;
```

As mentioned in the earlier SQL optimization and diagnostics article 《没内鬼,来点干货!SQL优化和诊断》, LIMIT stops scanning as soon as enough matching rows are found, so this scheme drastically reduces the total number of rows scanned and the efficiency gain is huge!
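A small-range page jump works on the same principle. Say the user is on page 10 and we know that page's last id (XXXX again); jumping to page 12 then needs only a tiny offset rather than a deep one. A sketch:

```sql
# Jump from page 10 (whose last id is XXXX) to page 12:
# skip page 11's 20 rows, then take 20
SELECT * FROM `year_score` where `year` = 2017 and id > XXXX ORDER BY id limit 20, 20;
```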

### ES

The approach is the same as with MySQL. Now we can use the FROM-TO API (from + size) as freely as we like, without worrying about the maximum-offset limit.
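A minimal sketch of such a query, under the same assumptions as the SQL version (hypothetical index test, numeric id field, XXXX standing for the known last id of the previous page):

```
GET /test/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "filter": [
        { "term": { "year": 2017 } },
        { "range": { "id": { "gt": XXXX } } }
      ]
    }
  },
  "sort": [ { "id": "asc" } ]
}
```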

### MongoDB

The approach is basically the same; the basic code is as follows:

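(The original post showed this code only as an image. Below is a minimal reconstruction in the same spirit: keyset pagination on _id, where lastId is assumed to be the last _id of the previous page.)

```javascript
// Keyset pagination: filter by the last seen _id instead of using skip()
var lastId = ObjectId("...");  // last _id of the previous page (placeholder)
db.t_data.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(20);
```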

Related performance test:

[Image: related performance test results]

## If you really must have deep random page jumps

So what if you lost the argument with the product manager? No worries, there is still a sliver of hope.

That same SQL optimization article also covered a trick for handling MySQL deep paging; the code is as follows:

```sql
# Bad example (takes 129.570s)
select * from task_result LIMIT 20000000, 10;
# Good example (takes 5.114s)
SELECT a.* FROM task_result a, (select id from task_result LIMIT 20000000, 10) b where a.id = b.id;
# Notes
# task_result is a production table with 34 million rows in total; id is the primary key, and the offset here reaches 20 million
```

The core logic of this scheme is based on the clustered index: first, without looking up full rows, quickly obtain the primary-key IDs of the data at the specified offset; then use the clustered index to look up the actual rows by those IDs. Since only 10 rows are fetched in that second step, it is very efficient.

So when dealing with MySQL, ES, and MongoDB, we can apply the same method:

  • Restrict the fields you fetch: using only the filter conditions, deep-page through nothing but the primary-key IDs

  • Then fetch exactly the data you need by querying those primary-key IDs directly (see the sketch after this list)
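As a sketch of what this two-step pattern looks like in MongoDB (same collection t_data as above; in ES the analogue is to page with _source disabled and then fetch the full documents by ID):

```javascript
// Step 1: deep-page over _id only, projecting away every other field
var ids = db.t_data.find({}, { _id: 1 })
    .sort({ _id: 1 })
    .skip(20000000)
    .limit(10)
    .toArray()
    .map(function (d) { return d._id; });

// Step 2: fetch the 10 full documents directly by primary key
db.t_data.find({ _id: { $in: ids } });
```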

Drawback: when the offset is very large, this still takes a while, like the 5s in the example above.


Source: https://juejin.im/post/5f0de4d06fb9a07e8a19a641
