Background
The concept of quantile value
In statistics and data analysis, quantiles are used to describe how a data set is distributed. The most common case is quartiles, which split the sorted data into four equal parts using three cut points: the first quartile (Q1), the second quartile (Q2, which is the median), and the third quartile (Q3). With the data sorted in ascending order, Q1 is the value at the 25% position, Q2 is the value in the middle, and Q3 is the value at the 75% position. One quarter of the data is smaller than Q1, one quarter is larger than Q3, and the middle 50% lies between Q1 and Q3. The interquartile range (IQR) is not a quantile itself but the difference Q3 - Q1, which measures how spread out that middle 50% is. Quantiles help show whether the data is skewed to one side and how dispersed it is. When the distribution is uneven, they describe the data more faithfully than a single average.
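As a small illustration of quartiles and the interquartile range, the sketch below uses Python's standard statistics module on the ten face values from the coupon example used later in this article. The choice of method="inclusive" is an assumption; other interpolation rules give slightly different cut points.

```python
import statistics

# Face values of the coupons issued in one minute (the article's example data).
values = [10, 9, 8, 8, 6, 5, 4, 4, 2, 2]

# n=4 asks for the three cut points that split the data into four parts.
# method="inclusive" treats the list as the whole population (an assumption).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# The interquartile range is the width of the middle 50% of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 4.0 5.5 8.0 4.0
```

Q2 equals the median of the list, and the IQR of 4.0 says that the middle half of the face values spans four units.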
Business Background
Merchants issue coupons with denominations in the range [1, 20], and each coupon is marked with its denomination. To control coupon cost accurately, issuance has to be tracked in real time. Monitoring the number of coupons issued, the average coupon amount, and the quantile values of the issued amount (the average amount within different slices of the distribution) gives a much clearer picture of how coupons are being issued.
The business has defined the following indicators and needs the data team to provide them. All indicators use one minute as the statistical granularity:
Issuance volume: total number of coupons issued
Average coupon amount: total face value issued / total number of coupons issued
0.1 quantile mean of coupon amount: sort the coupons issued in each minute by denomination in descending order, then take the average of the top 10% [for example, if the denominations issued are 10, 9, 8, 8, 6, 5, 4, 4, 2, 2, the 0.1 quantile mean is 10]
0.2 quantile mean of coupon amount: sort the coupons issued in each minute by denomination in descending order, then take the average of the top 20% [for example, if the denominations issued are 10, 9, 8, 8, 6, 5, 4, 4, 2, 2, the 0.2 quantile mean is (10 + 9) / 2 = 9.5]
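The two worked examples above can be checked with a short Python sketch. The helper name top_fraction_mean and the round-half-up rule for the row count are assumptions made for illustration, not part of the business definition.

```python
import math

def top_fraction_mean(values, fraction):
    """Mean of the largest `fraction` of `values` (hypothetical helper)."""
    ordered = sorted(values, reverse=True)  # larger denominations first
    # Number of coupons in the top slice, rounded half up (an assumption).
    k = math.floor(len(ordered) * fraction + 0.5)
    if k == 0:
        return None  # too few coupons for this fraction
    return sum(ordered[:k]) / k

coupons = [10, 9, 8, 8, 6, 5, 4, 4, 2, 2]
print(top_fraction_mean(coupons, 0.1))  # 10.0
print(top_fraction_mean(coupons, 0.2))  # 9.5
```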
Issuance volume and average coupon amount can be computed directly with MySQL aggregate functions. The question is how to query the quantile mean with MySQL.
Approach
Sorting in MySQL
row_number() over (partition by a1.hour, a1.min order by metric_value desc) as orderNum
metric_value is the face value of an issued coupon. The window function above (available in MySQL 8.0 and later) numbers the coupons issued in each minute by face value, largest first, and restarts at 1 for every minute. Partitioning by hour as well as min keeps the same minute of different hours apart when a query spans more than one hour.
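To make the per-minute numbering concrete, the following Python sketch reproduces it on a few hypothetical rows. It is only an illustration of what the window function returns; the sample data is invented.

```python
from itertools import groupby

# Hypothetical (minute, face value) rows within a single hour.
rows = [(0, 5), (0, 10), (0, 8), (1, 2), (1, 9)]

# Sort by minute, then by face value descending, as ORDER BY ... DESC would.
rows.sort(key=lambda r: (r[0], -r[1]))

ranked = []
for minute, group in groupby(rows, key=lambda r: r[0]):
    # Numbering restarts at 1 for every minute, like PARTITION BY.
    for order_num, (_, value) in enumerate(group, start=1):
        ranked.append((minute, value, order_num))

print(ranked)
# [(0, 10, 1), (0, 8, 2), (0, 5, 3), (1, 9, 1), (1, 2, 2)]
```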
Top N in MySQL
SELECT * FROM sales ORDER BY amount DESC LIMIT 10;
This top-N form cannot take the top N% within each minute: LIMIT applies to the whole result set and needs an absolute row count. To know how many rows make up N%, we first need the number of coupons issued in each minute, and then multiply that count by N%.
-- "table" stands in for the real table name
select hour, min, count(1) as cn
from table
where dt = 20230423 and hour = 11 and min >= 0 and min <= 30
group by hour, min
Then we multiply the per-minute count by N% and put it next to each row's rank:
select a2.dt, a2.hour, a2.min as min, metric_value,
       round(cn * 0.1) as cn,  -- 0.1 stands for N%, here the top 10%
       orderNum
from (
    select dt, hour, a1.min as min, metric_value,
           row_number() over (partition by a1.hour, a1.min order by metric_value desc) as orderNum
    from table a1
    where dt = 20230423 and hour = 11 and min >= 0 and min <= 30
) as a2
inner join (
    select hour, min, count(1) as cn
    from table c
    where dt = 20230423 and hour = 11 and min >= 0 and min <= 30
    group by hour, min
) a3 on a2.hour = a3.hour and a2.min = a3.min
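The cutoff column cn in the joined result is simply the per-minute count multiplied by N% and rounded. A minimal Python sketch with hypothetical counts shows how the cutoff varies from minute to minute; the round-half-up rule is an assumption.

```python
import math
from collections import Counter

# Hypothetical minute of issue for each coupon in the query window.
minutes = [0] * 10 + [1] * 20 + [2] * 4
fraction = 0.1  # N%, here the top 10%

# Coupons per minute, like COUNT(1) ... GROUP BY min.
per_minute = Counter(minutes)

# Rows needed per minute for the top N%, rounded half up (an assumption).
cutoff = {m: math.floor(n * fraction + 0.5) for m, n in per_minute.items()}

print(cutoff)  # {0: 1, 1: 2, 2: 0}
```

Minute 2 has only 4 coupons, so its cutoff rounds to 0 and no row of that minute would qualify for the top 10%.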
With cn (the number of rows needed for the quantile mean) and orderNum (the rank of the current coupon by face value) side by side on each row, we keep the rows whose orderNum does not exceed cn, which are the top N% of that minute, and apply avg to them to obtain the quantile mean.
Folding this filter and the average into one statement gives the SQL for the quantile mean:
select dt, hour, min,
       round(avg(metric_value), 2) as metric_value  -- two decimals, so a mean such as 9.5 is kept
from (
    select a2.dt, a2.hour, a2.min as min, metric_value,
           round(cn * ?) as cn,  -- ? is the quantile fraction, for example 0.1
           orderNum
    from (
        select dt, hour, a1.min as min, metric_value,
               row_number() over (partition by a1.hour, a1.min order by metric_value desc) as orderNum
        from table a1
        where dt = 20230423 and hour = 11 and min >= 0 and min <= 30
    ) as a2
    inner join (
        select hour, min, count(1) as cn
        from table a1
        where dt = 20230423 and hour = 11 and min >= 0 and min <= 30
        group by hour, min  -- one count per minute
    ) as a3 on a2.hour = a3.hour and a2.min = a3.min
) as q
where orderNum <= cn  -- top N% of each minute, including the row ranked exactly cn
group by dt, hour, min
order by dt, hour, min
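The query can be tried end to end on the article's ten-coupon example. The sketch below is a stand-in, not the production setup: it runs against an in-memory SQLite database (3.25 or later, which supports window functions) instead of MySQL, uses a hypothetical table name coupon_issue, groups the count subquery by hour and min, and keeps rows with orderNum <= cn so that the results match the 10 and 9.5 given in the indicator definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# coupon_issue is a hypothetical table; its columns follow the article's queries.
conn.execute(
    "CREATE TABLE coupon_issue (dt INTEGER, hour INTEGER, min INTEGER, metric_value INTEGER)"
)
face_values = [10, 9, 8, 8, 6, 5, 4, 4, 2, 2]  # one minute of issued coupons
conn.executemany(
    "INSERT INTO coupon_issue VALUES (20230423, 11, 0, ?)",
    [(v,) for v in face_values],
)

QUERY = """
SELECT q.dt, q.hour, q.min, AVG(q.metric_value) AS metric_value
FROM (
    SELECT a2.dt AS dt, a2.hour AS hour, a2.min AS min,
           a2.metric_value AS metric_value, a2.orderNum AS orderNum,
           ROUND(a3.cn * ?) AS cn
    FROM (
        SELECT dt, hour, min, metric_value,
               ROW_NUMBER() OVER (PARTITION BY hour, min
                                  ORDER BY metric_value DESC) AS orderNum
        FROM coupon_issue
        WHERE dt = 20230423 AND hour = 11 AND min BETWEEN 0 AND 30
    ) AS a2
    INNER JOIN (
        SELECT hour, min, COUNT(1) AS cn
        FROM coupon_issue
        WHERE dt = 20230423 AND hour = 11 AND min BETWEEN 0 AND 30
        GROUP BY hour, min
    ) AS a3 ON a2.hour = a3.hour AND a2.min = a3.min
) AS q
WHERE q.orderNum <= q.cn
GROUP BY q.dt, q.hour, q.min
ORDER BY q.dt, q.hour, q.min
"""

def top_mean(fraction):
    """Run the query for one quantile fraction, e.g. 0.1 for the top 10%."""
    return conn.execute(QUERY, (fraction,)).fetchall()

print(top_mean(0.1))  # [(20230423, 11, 0, 10.0)]
print(top_mean(0.2))  # [(20230423, 11, 0, 9.5)]
```

With a strict comparison (cn > orderNum) the 0.1 case would return no rows for this data, because cn is 1 and the only qualifying row has orderNum 1.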
A row counts toward the quantile mean when its orderNum does not exceed cn. To calculate the 0.1 quantile mean, the top 10% of each minute's coupon issuance records are needed. After the records are sorted by denomination within each minute, every record carries its rank, orderNum. The number of coupons issued in that minute multiplied by 10% gives cn, the number of records needed for that minute's 0.1 quantile mean. The comparison has to include equality: with 10 coupons in a minute, cn is 1 and the only record that qualifies has orderNum 1, so a strict cn > orderNum would leave nothing to average.
Explanation
Before MySQL was used for this calculation, a Java program queried each minute's coupon issuance records, sorted them, and computed the mean itself. The biggest problem with that approach is that when issuance volume is high, querying the quantile indicators for a period of time puts heavy load on the program. We ran into exactly this in our business: every query for 2 hours of quantile data loaded more than a million rows into the Java program, which is very dangerous for a data query service. To solve this, the quantile query had to be implemented in MySQL.
Effect
The implementation changed from the program querying the detail rows and computing the quantile mean to MySQL returning the quantile mean directly. Query time dropped from more than 1 minute to within 15 seconds, a large improvement.