search

Home  >  Q&A  >  body text

Remove duplicate rows when doing self join in SQL

I am trying to remove duplicate values ​​from rows based on a join on the same table, but cannot remove a value in a column in table1 that is mapped to the same column in table2.

surface: Note: You are not limited to only one date in the table, there can be multiple dates with multiple sid

date    | sid | comp | disc
-----------------------
23 june | 1  | az  | 20
23 june | 1  | ph  | 22
23 june | 1  | mg  | 10
23 june | 2  | mg  | 8
23 june | 3  | ph  | 15
23 june | 3  | az  | 11
------------------------

About self-joining

select t1.*, t2.comp as comp1, t2.disc as disc1
from table as t1
left join table as t2 on t1.date = t2.date and t1.sid = t2.sid and t1.comp <> t2.comp
Output from above query:

date    | sid | comp | disc | comp1 | disc1
-------------------------------------------
23 june | 1  | az  | 20     | ph    | 22
23 june | 1  | az  | 20     | mg    | 10
23 june | 1  | ph  | 22     | az    | 20
23 june | 1  | ph  | 20     | mg    | 10
23 june | 1  | mg  | 10     | mg    | 10
23 june | 2  | mg  | 10     | null  | null
23 june | 3  | ph  | 10     | az    | 11
23 june | 3  | az  | 11     | ph    | 10

Expected results: (Here I try to get a combination of comp and comp1 where for each sid and date there are only different comp values ​​mapped with mg and null in comp1 if the sid has only mg or non-mg values) NOTE: There will be no duplicate rows when comp is mapped to comp1

date    | sid | comp | disc | comp1 | disc1
-------------------------------------------
23 june | 1  | az  | 20     | mg    | 10
23 june | 1  | ph  | 20     | mg    | 10
23 june | 2  | mg  | 10     | null  | null
23 june | 3  | ph  | 10     | null  | null
23 june | 3  | az  | 11     | null  | null

P粉203792468P粉203792468346 days ago515

reply all(1)I'll reply

  • P粉384244473

    P粉3842444732024-02-04 20:32:15

    View db<>fiddle

    WITH cte AS (
    SELECT t1.*, t2.comp as comp1, t2.disc as disc1, SUM(1) OVER(PARTITION BY date,sid,comp) AS cnt
    FROM `table` t1
    LEFT JOIN `table` t2 ON t1.date = t2.date AND t1.sid = t2.sid AND t1.comp <> t2.comp 
    )
    
    SELECT 
      date, sid, comp, disc, 
      CASE WHEN comp1 <> 'mg' THEN NULL ELSE comp1 END AS comp1, 
      CASE WHEN comp1 <> 'mg' THEN NULL ELSE disc1 END AS disc1
    FROM cte
    WHERE
     (CASE WHEN comp <> 'mg' OR comp1 IS NULL THEN cnt END) = 1
     OR 
     (CASE WHEN comp <> 'mg' AND comp1 = 'mg' THEN cnt END) >= 2

    reply
    0
  • Cancelreply