I'm trying to write a MySQL query to count rows that are significantly different from the most common row for each product and report the percentage difference, where below 100% the price is lower than the average price for that product, and above 100% The price is higher. Prices less than 1 standard deviation from the mean are ignored.
Sample data:
_rowid | _Timestamp | Code | fk_product_id | fk_po_id | cost |
---|---|---|---|---|---|
5952 | 2021-01-10 10:19:01 | 00805 | 1367 | 543 | 0.850 |
9403 | 2022-05-23 14:54:34 | 00805 | 1367 | 2942 | 0.850 |
41595 | 2022-11-23 11:20:26 | 00805 | 1367 | 3391 | 1.350 |
39635 | 2022-01-18 12:49:32 | Water1 | 344 | 3153 | 0.140 |
40134 | 2022-04-06 22:39:34 | Water1 | 344 | 2747 | 0.190 |
41676 | 2022-12-09 16:28:28 | Water1 | 344 | 3398 | 0.140 |
39634 | 2022-01-18 12:49:31 | gr309203 344400 | 1024 | 3154 | 0.770 |
35634 | 2021-03-03 15:23:23 | gr309203 344400 | 1024 | 3203 | 0.790 |
41264 | 2022-11-16 11:41:44 | gr309203 344400 | 1024 | 3357 | 0.970 |
SELECT code, fk_product_id, cost, cost/ (SELECT avg(cost) FROM po_line aa WHERE aa.code = code) AS percent FROM po_line WHERE (SELECT STDDEV(cost) FROM po_line ss WHERE ss.code = code)>1;
This will not return any rows, but three rows (one for each product) should appear in the report.
The expected result should be:
Code | fk_product_id | cost | percentage |
---|---|---|---|
00805 | 1367 | 1.350 | 133 |
Water1 | 344 | 0.190 | 121 |
gr309203 344400 | 1024 | 0.970 | 115 |
P粉1419112442023-09-14 20:44:14
This query shows how to use window functions to calculate the number of standard deviations and the percentage of cost relative to the average cost for each given code.
SELECT code, cost, avg, std, ABS(cost-avg)/std AS num_std, cost*100/avg AS pct FROM ( SELECT code, cost, AVG(cost) OVER (PARTITION BY code) AS avg, STDDEV(cost) OVER (PARTITION BY code) AS std FROM po_line ) AS p;
result:
+-----------------+-------+-----------+---------------------+--------------------+-------------+ | code | cost | avg | std | num_std | pct | +-----------------+-------+-----------+---------------------+--------------------+-------------+ | 00805 | 0.850 | 1.0166667 | 0.23570226039551592 | 0.7071069226079035 | 83.6065546 | | 00805 | 0.850 | 1.0166667 | 0.23570226039551592 | 0.7071069226079035 | 83.6065546 | | 00805 | 1.350 | 1.0166667 | 0.23570226039551592 | 1.4142134209517383 | 132.7868809 | | gr309203 344400 | 0.770 | 0.8433333 | 0.0899382504215469 | 0.8153738777025533 | 91.3043514 | | gr309203 344400 | 0.790 | 0.8433333 | 0.0899382504215469 | 0.5929990827042229 | 93.6758930 | | gr309203 344400 | 0.970 | 0.8433333 | 0.0899382504215469 | 1.4083740722807512 | 115.0197674 | | Water1 | 0.140 | 0.1566667 | 0.02357022603955158 | 0.7071081954001099 | 89.3616831 | | Water1 | 0.190 | 0.1566667 | 0.02357022603955158 | 1.4142121481595331 | 121.2765699 | | Water1 | 0.140 | 0.1566667 | 0.02357022603955158 | 0.7071081954001099 | 89.3616831 | +-----------------+-------+-----------+---------------------+--------------------+-------------+
(Please note that window functions require MySQL 8.0).
This query only shows you how the calculation is done. To get the results you want:
SELECT code, fk_product_id, cost, ROUND(cost*100/avg) AS pct FROM ( SELECT code, cost, fk_product_id, AVG(cost) OVER (PARTITION BY code) AS avg, STDDEV(cost) OVER (PARTITION BY code) AS std FROM po_line ) AS p WHERE ABS(cost-avg)/std > 1; +-----------------+---------------+-------+------+ | code | fk_product_id | cost | pct | +-----------------+---------------+-------+------+ | 00805 | 1367 | 1.350 | 133 | | gr309203 344400 | 1024 | 0.970 | 115 | | Water1 | 344 | 0.190 | 121 | +-----------------+---------------+-------+------+