How to query quantile value in MySQL

May 27, 2023, 04:36 PM

Background

The concept of quantile value

In statistics and data analysis, quantiles are often used to describe how data is distributed. The most common case is the quartiles, which split a sorted data set into four equal parts: the first quartile (Q1), the second quartile (Q2, which is the median), and the third quartile (Q3). The interquartile range (IQR) is the difference Q3 - Q1. With the data sorted in ascending order, one quarter of the values lie below Q1, one quarter lie above Q3, and the middle 50% lie between Q1 and Q3. In other words, Q1 is the value at the 25% position of the sorted sequence, Q2 is the value in the middle position, and Q3 is the value at the 75% position.

In data analysis, quantile values help us understand the distribution of the data: whether it is skewed to one side and how dispersed it is. When the distribution is uneven, quantile values represent the differences in the data more accurately.
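As a small illustration of the quartile definitions, the sketch below computes Q1, Q2, Q3 and the IQR with Python's standard library. The sample values are hypothetical, and the "inclusive" interpolation method is only one of several conventions for placing the cut points, so other tools may give slightly different numbers.

```python
import statistics

# Hypothetical sample of coupon face values (not data from the article).
values = [2, 2, 4, 4, 5, 6, 8, 8, 9, 10]

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" treats the sample as the whole population (an assumption).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# The interquartile range is the spread of the middle 50% of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 4.0 5.5 8.0 4.0

# The second quartile is the median.
assert q2 == statistics.median(values)
```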

Business Background

The face values of coupons issued by merchants fall in the range [1, 20], and each coupon is labeled with its face value. To control coupon cost accurately, the business needs to track issuance in real time. Monitoring the number of coupons issued, the average face value issued, and the quantile values of the face value (that is, the average face value within the top N% of coupons) in real time gives a much clearer picture of coupon issuance.

The business has defined the following indicators and needs the data team to provide them. All indicators use one minute as the statistical granularity:

Issuance volume: the total number of coupons issued

Average coupon amount issued: total face value issued / total number of coupons issued

0.1 quantile mean of coupon amount issued: sort the coupons issued in each minute by face value in descending order, then take the average face value of the top 10% of them [for example, if the face values of the issued coupons, in order, are 10, 9, 8, 8, 6, 5, 4, 4, 2, 2, then the 0.1 quantile mean is 10]

0.2 quantile mean of coupon amount issued: sort the coupons issued in each minute by face value in descending order, then take the average face value of the top 20% of them [for example, if the face values of the issued coupons, in order, are 10, 9, 8, 8, 6, 5, 4, 4, 2, 2, then the 0.2 quantile mean is (10 + 9)/2 = 9.5]
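The two worked examples above can be checked with a few lines of Python. This is only a sketch of the definition, not the production logic: the function name top_fraction_mean is hypothetical, and keeping at least one value for very small groups is an assumption the article does not state.

```python
def top_fraction_mean(values, fraction):
    """Mean of the largest round(len(values) * fraction) values.

    Assumes values is non-empty. Python's round() uses round-half-to-even,
    which may differ from MySQL's ROUND at exact .5 boundaries.
    """
    ordered = sorted(values, reverse=True)  # larger face values first
    # Keep at least one value so a tiny group still yields a result (assumption).
    count = max(1, round(len(ordered) * fraction))
    top = ordered[:count]
    return sum(top) / len(top)


coupons = [10, 9, 8, 8, 6, 5, 4, 4, 2, 2]
print(top_fraction_mean(coupons, 0.1))  # 10.0
print(top_fraction_mean(coupons, 0.2))  # 9.5
```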

Indicators such as the issuance volume and the average coupon amount can be computed directly in MySQL with aggregate functions. So how can MySQL be used to query the quantile value?

Approach

Sorting in MySQL

row_number() over ( partition by a1.min order by metric_value desc) as orderNum

metric_value is the face value of an issued coupon. The window function above numbers the rows within each minute in descending order of face value, so each minute's issuance records are ranked by amount. Window functions such as row_number() require MySQL 8.0 or later.

Top N in MySQL
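To see what the ranking produces, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite 3.25+ supports the same ROW_NUMBER() OVER syntax); the table name coupon_log and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the columns used in the article's queries.
conn.execute("CREATE TABLE coupon_log (hour INT, min INT, metric_value INT)")
conn.executemany(
    "INSERT INTO coupon_log VALUES (?, ?, ?)",
    [(11, 0, 5), (11, 0, 9), (11, 0, 7), (11, 1, 3), (11, 1, 8)],
)

# Rank the rows inside each minute, largest face value first.
rows = conn.execute(
    """
    SELECT min, metric_value,
           ROW_NUMBER() OVER (PARTITION BY min
                              ORDER BY metric_value DESC) AS orderNum
    FROM coupon_log
    ORDER BY min, orderNum
    """
).fetchall()

for row in rows:
    print(row)
# (0, 9, 1), (0, 7, 2), (0, 5, 3), then (1, 8, 1), (1, 3, 2)
```

The numbering restarts at 1 for every minute, which is what later makes it possible to keep only the first few rows of each group.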

SELECT * FROM sales ORDER BY amount DESC LIMIT 10;

This top-N approach falls short in two ways: LIMIT applies to the whole result set rather than to each minute, and it takes a fixed number of rows rather than the top N%. To know how many rows make up N%, we first need the total, so we count the rows in each minute and then multiply that count by N%. (In the queries below, table is a placeholder for the actual table name.)

select hour,min, count(1) as cn 
from table  
where dt=20230423 and hour=11 and min>=0 and min<=30 
group by hour,min
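The per-minute count, and the number of rows that a given fraction of it corresponds to, can be tried out with the sketch below. It again uses sqlite3 as a stand-in for MySQL; the table name coupon_log, the sample rows, and the fraction 0.2 are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon_log (hour INT, min INT, metric_value INT)")
# Hypothetical data: 10 coupons in minute 0 and 5 coupons in minute 1.
sample = [(11, 0, v) for v in range(1, 11)] + [(11, 1, v) for v in range(1, 6)]
conn.executemany("INSERT INTO coupon_log VALUES (?, ?, ?)", sample)

# cn is the number of coupons per minute; top_n is how many rows 20% covers.
rows = conn.execute(
    """
    SELECT hour, min, COUNT(1) AS cn, ROUND(COUNT(1) * 0.2) AS top_n
    FROM coupon_log
    GROUP BY hour, min
    ORDER BY hour, min
    """
).fetchall()

print(rows)  # [(11, 0, 10, 2.0), (11, 1, 5, 1.0)]
```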

Then we multiply each minute's count by N% (the ? below is a placeholder for the fraction, for example 0.1):

select dt,a2.hour,a2.min as min,metric_value, round(cn*?) as cn, orderNum 
from ( 
	select dt,hour,a1.min as min, 
	metric_value, row_number() over ( partition by a1.min order by metric_value desc) as orderNum 
	from table a1 
	where dt=20230423 and hour=11 and min>=0 and min<=30 
	) as a2 
inner join ( 
	select hour,min , count(1) as cn 
	from table c 
	where dt=20230423 and hour=11 and min>=0 and min<=30  
	group by hour,min ) a3
on a2.hour=a3.hour and a2.min=a3.min

In this way, we can compare cn (the number of rows needed to compute the quantile value) with orderNum (the rank of the current coupon when sorted by face value) to pick out the top N% of the rows, and then apply avg to those rows to obtain the quantile value.

Putting these steps together gives the following SQL for the quantile value:

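select dt,hour,min, round(avg(metric_value)) as metric_value 
from ( 
	-- ? is the fraction to keep, e.g. 0.1 for the top 10%
	select dt,a2.hour,a2.min as min,metric_value, round(cn*?) as cn, orderNum 
	from ( 
		-- rank each minute's coupons by face value, largest first
		select dt,hour,a1.min as min,
		metric_value, row_number() over ( partition by a1.min order by metric_value desc) as orderNum 
		from table a1 
		where dt=20230423 and hour=11 and min>=0 and min<=30 
		) as a2 
	inner join ( 
		-- number of coupons issued in each minute
		select hour,min, count(1) as cn 
		from table a1 
		where dt=20230423 and hour=11 and min>=0 and min<=30 
		group by hour,min 
		) as a3
	on a2.hour=a3.hour and a2.min=a3.min ) as q 
where orderNum<=cn 
group by dt,hour,min 
order by dt,hour,min

A row is included in the quantile statistics when its orderNum does not exceed cn. For example, to calculate the 0.1 quantile mean, the top 10% of each minute's coupon issuance records are needed. After the rows are partitioned by minute and sorted by face value, each row carries its rank (orderNum). The number of coupons issued in that minute multiplied by 10% gives cn, the number of rows needed for that minute's 0.1 quantile mean. When orderNum <= cn, the row belongs to the top 10% and enters the average. The comparison has to be inclusive: with 10 coupons and a fraction of 0.1, cn is 1, and a strict cn > orderNum would select no rows at all. The subquery a3 also needs its group by hour,min so that cn is counted per minute. Note that round(avg(metric_value)) rounds the mean to an integer; use round(avg(metric_value), 1) to keep one decimal place, as in the 9.5 example.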
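The whole calculation can be checked end to end against the article's worked example (face values 10, 9, 8, 8, 6, 5, 4, 4, 2, 2). The sketch below is an assumption-laden stand-in rather than the production query: it runs on sqlite3 instead of MySQL, the table name coupon_log and the helper top_fraction_means are hypothetical, the dt filter is dropped, the rows are partitioned by hour and min, the comparison is the inclusive orderNum <= cn, and the mean is left unrounded so that 9.5 is visible.

```python
import sqlite3

QUERY = """
SELECT q.hour, q.min, AVG(q.metric_value) AS top_mean
FROM (
    SELECT a2.hour, a2.min, a2.metric_value, a2.orderNum,
           ROUND(a3.cn * :fraction) AS cn          -- rows to keep per minute
    FROM (
        SELECT hour, min, metric_value,
               ROW_NUMBER() OVER (PARTITION BY hour, min
                                  ORDER BY metric_value DESC) AS orderNum
        FROM coupon_log
    ) AS a2
    INNER JOIN (
        SELECT hour, min, COUNT(1) AS cn           -- coupons per minute
        FROM coupon_log
        GROUP BY hour, min
    ) AS a3 ON a2.hour = a3.hour AND a2.min = a3.min
) AS q
WHERE q.orderNum <= q.cn                           -- inclusive: keep ranks 1..cn
GROUP BY q.hour, q.min
ORDER BY q.hour, q.min
"""


def top_fraction_means(conn, fraction):
    """Return (hour, min, mean of the top `fraction` of face values) rows."""
    return conn.execute(QUERY, {"fraction": fraction}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon_log (hour INT, min INT, metric_value INT)")
conn.executemany(
    "INSERT INTO coupon_log VALUES (11, 0, ?)",
    [(v,) for v in [10, 9, 8, 8, 6, 5, 4, 4, 2, 2]],
)

print(top_fraction_means(conn, 0.1))  # [(11, 0, 10.0)]
print(top_fraction_means(conn, 0.2))  # [(11, 0, 9.5)]
```

Both results match the definitions of the 0.1 and 0.2 quantile means given in the business background.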

  • Explanation: before MySQL was used to calculate the quantile value, a Java program queried each minute's coupon issuance records, sorted them, and computed the mean in memory. The biggest problem with doing this in the program is that when the issuance volume is large and the quantile indicators are needed for a period of time, the program comes under heavy load. We ran into exactly this problem in our business: every query for 2 hours of quantile data loaded over a million rows into the Java program, which is a serious risk for a data query service. To solve this, the quantile query had to be implemented in MySQL.

Effect

Before: the program queried the detail rows and calculated the quantile value itself. After: MySQL returns the quantile value directly.

Query time dropped from more than 1 minute to within 15 seconds, a substantial performance improvement.

Statement
This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it removed.