


## Interview Questions & Real Experience
Interview question: how do you implement deep paging when the data set is large?

You may run into this question in an interview or while preparing for one. Most answers boil down to sharding databases and tables and building indexes. That is the standard, correct answer, but reality is harsh, so the interviewer will usually follow up: the schedule is tight and the team is short-staffed, so how do you implement deep paging now? Candidates without practical experience tend to freeze at this point. So, hear me out.
Painful Lessons
First of all, be clear on one thing: deep paging can be done, but random jumps to deep pages absolutely need to be banned.

Why random deep page jumps cannot be allowed

Let's briefly look, from a technical point of view, at why random deep page jumps cannot be allowed, or in other words why deep paging is not recommended.

MySQL

The basic principle of paging:

    SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;

LIMIT 10000, 20 means scanning 10,020 rows that match the conditions, discarding the first 10,000, and returning the last 20. With LIMIT 1000000, 100, a total of 1,000,100 rows need to be scanned. In a highly concurrent application where every query scans more than a million rows, it would be strange if the database did not blow up.
MongoDB
The basic principle of paging:

    db.t_data.find().limit(5).skip(5);

Similarly, as the page number grows, the number of documents skipped by skip() grows too. The operation is implemented by iterating the cursor, so the CPU cost becomes very noticeable. When deep pages are requested frequently, it will inevitably blow up.
ElasticSearch
From a business perspective, ElasticSearch is not a typical database; it is a search engine. If the desired data is not found under the current filter conditions, paging deeper will not find it either. Taking a step back, if we do use ES as a database for queries, paging will inevitably hit the max_result_window limit: by default, the offset plus the page size is capped at 10,000.

Query process:

- To query page 501 with 10 items per page, the client sends a request to some node.
- That node broadcasts the request to every shard, and each shard queries its first 5010 hits.
- The results are returned to the node, which merges and sorts them and takes the 10 hits of the requested page out of the first 5010.
- The node returns the result to the client.
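The query process above can be sketched as a small simulation. This is only an illustration of the scatter-gather cost, not ElasticSearch's actual implementation: shards are modeled as plain Python lists sorted in descending order, and the names `build_shards` and `deep_page` are hypothetical helpers invented for this sketch.

```python
import heapq
import random
from itertools import islice


def build_shards(total=100_000, shard_count=5, seed=0):
    """Spread the values 0..total-1 randomly over shards; each shard is sorted descending."""
    values = list(range(total))
    random.Random(seed).shuffle(values)
    return [sorted(values[i::shard_count], reverse=True) for i in range(shard_count)]


def deep_page(shards, page, page_size):
    """Simulate offset-style paging over shards.

    Returns (hits_for_the_page, number_of_hits_shipped_to_the_coordinating_node).
    """
    offset = (page - 1) * page_size
    window = offset + page_size
    # Every shard has to return its own top `window` hits,
    # because any of them could belong to the global top `window`.
    per_shard = [shard[:window] for shard in shards]
    shipped = sum(len(hits) for hits in per_shard)
    # The coordinating node merges the sorted lists and keeps only the requested slice.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(islice(merged, offset, window)), shipped


shards = build_shards()
hits, shipped = deep_page(shards, page=501, page_size=10)
print(len(hits), shipped)
```

With five shards, page 501 makes the coordinating node receive 5 × 5010 hits in order to return 10, which is why the cost grows with both page depth and shard count.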
Realign with the product team

As the saying goes, problems that technology cannot solve should be solved by the business! During my internship I fell for the product team's demands and implemented deep paging with page jumps as required. Now the mess must be cleaned up, and the following changes must be made on the business side:

- Add default filter conditions wherever possible, such as a time period, to reduce the amount of data displayed.
- Change how page jumps are presented: switch to a scrolling (load-more) display, or allow page jumps only within a small range.

## General solution

A quick solution that can be delivered in a short time mainly consists of the following points:
- Required: the sort fields and filter conditions must be indexed.
- Core: use known data from small-range page jumps, or from rolling loads, to reduce the offset.
- Extra: for cases that are hard to handle, you can also fetch somewhat more data than needed and slice it in the application; the performance impact will not be significant.
Original paging SQL:

    # Page 1
    SELECT * FROM `year_score` WHERE `year` = 2017 ORDER BY id LIMIT 0, 20;
    # Page N (the offset (N - 1) * 20 is computed by the application)
    SELECT * FROM `year_score` WHERE `year` = 2017 ORDER BY id LIMIT (N - 1) * 20, 20;

Using the context of the page already shown, this is rewritten as:

    # XXXX is the known data (the id of the last row already shown)
    SELECT * FROM `year_score` WHERE `year` = 2017 AND id > XXXX ORDER BY id LIMIT 20;
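As mentioned in the earlier article on SQL optimization and diagnosis (original title: 没内鬼,来点干货!SQL优化和诊断), LIMIT stops the query as soon as enough matching rows have been found, so this approach scans far fewer rows in total and is much more efficient.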
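A minimal runnable sketch of the rewrite, under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for MySQL, the `score` column and the composite index are invented for the demo, and the helpers `make_demo_db`, `offset_page`, and `page_after` are hypothetical names. It shows that filtering on the last known `id` returns the same rows as the offset query, and that a small-range jump only needs a small offset relative to the known row.

```python
import sqlite3

PAGE_SIZE = 20


def make_demo_db(rows=1000):
    """Create an in-memory year_score table; odd ids belong to 2017 (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE year_score (id INTEGER PRIMARY KEY, year INTEGER, score INTEGER)"
    )
    # Index the filter and sort columns, as the 'Required' point demands.
    conn.execute("CREATE INDEX idx_year_id ON year_score (year, id)")
    conn.executemany(
        "INSERT INTO year_score (id, year, score) VALUES (?, ?, ?)",
        [(i, 2017 if i % 2 else 2016, i % 100) for i in range(1, rows + 1)],
    )
    return conn


def offset_page(conn, year, page):
    """Original style: the offset grows with the page number."""
    offset = (page - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, year, score FROM year_score "
        "WHERE year = ? ORDER BY id LIMIT ? OFFSET ?",
        (year, PAGE_SIZE, offset),
    ).fetchall()


def page_after(conn, year, last_id, pages_ahead=1):
    """Rewritten style: start after the last id already shown.

    pages_ahead > 1 models a small-range jump; the offset stays small
    because it is relative to the known row, not to the start of the table.
    """
    offset = (pages_ahead - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, year, score FROM year_score "
        "WHERE year = ? AND id > ? ORDER BY id LIMIT ? OFFSET ?",
        (year, last_id, PAGE_SIZE, offset),
    ).fetchall()


conn = make_demo_db()
page5 = offset_page(conn, 2017, 5)
last_id = page5[-1][0]
print(page_after(conn, 2017, last_id) == offset_page(conn, 2017, 6))
```

The caller has to carry `last_id` from one request to the next, which is exactly why this approach fits scrolling or small-range jumps and not arbitrary page numbers.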
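ES

The approach is the same as for MySQL. With the offset kept small, we can use the from/size paging API freely, without having to worry about the maximum-window limit.

MongoDB

The approach is essentially the same: instead of a large skip(), filter on the last known _id and then apply limit().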
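If deep random page jumps are unavoidable

What if you could not win the argument with the product manager? Don't worry, there is still a slim chance.

The SQL optimization article also mentioned a technique for handling deep paging in MySQL. The code is as follows:

    # Bad example (took 129.570 s)
    SELECT * FROM task_result LIMIT 20000000, 10;
    # Good example (took 5.114 s)
    SELECT a.* FROM task_result a, (SELECT id FROM task_result LIMIT 20000000, 10) b WHERE a.id = b.id;
    # Notes
    # task_result is a table in a production environment with 34 million rows in total; id is the primary key; the offset reaches 20 million

The core logic of this approach relies on the clustered index: the subquery quickly obtains the primary key IDs at the given offset without going back to the table for full rows, and the outer query then uses the clustered index to look those rows up. At that point only 10 rows are fetched, so it is very efficient.

So when dealing with MySQL, ES, or MongoDB, we can use the same approach:

- Restrict the fields that are fetched: using only the filter conditions, page deep to obtain just the primary key IDs.
- Fetch the required data directly by primary key ID.

Drawback: when the offset is very large, the query still takes a long time, such as the 5 s in this article.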
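The two steps can be sketched as follows. This is an illustration under assumptions, not a benchmark: SQLite (Python's standard `sqlite3` module) stands in for MySQL and will not reproduce the timings quoted above, the `payload` column is invented, an explicit `ORDER BY id` is added so the results are deterministic (the original queries omit it), and `make_task_table`, `naive_page`, `deferred_join_page`, and `two_step_page` are hypothetical helper names.

```python
import sqlite3


def make_task_table(rows=5000):
    """Create a small in-memory task_result table (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_result (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO task_result (id, payload) VALUES (?, ?)",
        [(i, f"result-{i}") for i in range(1, rows + 1)],
    )
    return conn


def naive_page(conn, offset, limit):
    """Bad example: full rows are read for every skipped position."""
    return conn.execute(
        "SELECT * FROM task_result ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    ).fetchall()


def deferred_join_page(conn, offset, limit):
    """Good example: page over ids only, then join back for the full rows."""
    return conn.execute(
        """
        SELECT a.*
        FROM task_result AS a
        JOIN (SELECT id FROM task_result ORDER BY id LIMIT ? OFFSET ?) AS b
          ON a.id = b.id
        ORDER BY a.id
        """,
        (limit, offset),
    ).fetchall()


def two_step_page(conn, offset, limit):
    """Same idea as two separate queries, the shape that also carries over to ES or MongoDB."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM task_result ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    )]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT * FROM task_result WHERE id IN ({placeholders}) ORDER BY id", ids
    ).fetchall()


conn = make_task_table()
print(deferred_join_page(conn, 4000, 10) == naive_page(conn, 4000, 10))
```

All three functions return the same rows; the difference lies only in how much data the store has to touch while walking to the offset.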
Article source: https://juejin.im/post/5f0de4d06fb9a07e8a19a641