I have some scraped product data in the database and I want to use it on my website. I want to write a query that returns all items with the title "%EXAMPLE%" but only unique products.
The problem is that I have multiple rows for 1 item, and I only want to return 1 row per product (I'm scraping every day, so I get an extra row per item every day). The only difference between the rows is that they have another date and price, since that's the price history I'm looking for.
Example: We have 3 items: pink chocolate, pink apples and pink pears. There are 3 rows for each item because I scratched 3 times. For example (for the purpose of this example, I didn't add all the other columns):
Product ID | title | price | Available |
---|---|---|---|
ABC123DEF | PINK APPLE | 0.47 | 1 |
ABC123DEF | PINK APPLE | 0.42 | 1 |
ABC123DEF | PINK APPLE | 0.41 | 1 |
ABC333FHG | PINK PEAR | 0.41 | 1 |
ABC333FHG | PINK PEAR | 0.41 | 1 |
ABC333FHG | PINK PEAR | 0.41 | 1 |
FH5845FJG | pink chocolate | 0.41 | 1 |
FH5845FJG | pink chocolate | 0.41 | 1 |
FH5845FJG | pink chocolate | 0.41 | 1 |
The result I want to get is:
Product ID | title | price | Available |
---|---|---|---|
ABC123DEF | PINK APPLE | 0.47 | 1 |
ABC333FHG | PINK PEAR | 0.41 | 1 |
FH5845FJG | pink chocolate | 0.41 | 1 |
It seems I have to search for the title and then filter out the duplicate productIds in order to get the correct results. But I don't know how to do it.
Any ideas?
P粉2983052662024-04-03 00:53:42
one example:
WITH cte AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY productId ORDER BY price DESC) rn FROM test ) DELETE test FROM test NATURAL JOIN cte WHERE cte.rn > 1;
The query saves the highest price row for each product ID and deletes the other rows for that product.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=40df8e8e4b3eb206e0f73b7ce3a70a a5一个>
NOTE - Every complete row that must be stored is unique (rows that must be deleted may have complete duplicates).
WITH cte AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY productId ORDER BY price DESC) rn FROM test ) SELECT * FROM cte WHERE rn = 1;
This query does not require the rows to be unique, in which case only a single copy of the row is returned. If you need all copies, use RANK() or DENSE_RANK() instead of ROW_NUMBER().
Solution for MySQL 5.x version.
SELECT * FROM test WHERE NOT EXISTS ( SELECT NULL FROM test t WHERE test.productId = t.productId AND test.price < t.price );
This query will return all replicas if they exist. If you only need one copy, add DISTINCT.