Home  >  Q&A  >  body text

SQL query to select only rows with maximum column value

<p>I have a document table (here is a simplified version): </p> <table class="s-table"> <thead> <tr> <th>id</th> <th>rev</th> <th>content</th> </tr> </thead> <tbody> <tr> <td>1</td> <td>1</td> <td>...</td> </tr> <tr> <td>2</td> <td>1</td> <td>...</td> </tr> <tr> <td>1</td> <td>2</td> <td>...</td> </tr> <tr> <td>1</td> <td>3</td> <td>...</td> </tr> </tbody> </table> <p>How to select one row per id and only select the largest rev? </p><p> Based on the above data, the result should contain two rows: <code>[1, 3, ...]</code> and <code>[2, ​​1, ..]</code>. I'm using <strong><em>MySQL</em></strong>. </p> <p>Currently, I'm using a check in a <code>while</code> loop to detect and overwrite old revs in the result set. But is this the only way to achieve results? Isn't there a solution for <strong>SQL</strong>? </p>
P粉662614213P粉662614213425 days ago367

reply all(2)I'll reply

  • P粉909476457

    P粉9094764572023-08-22 12:49:52

    I prefer to use as little code as possible...

    You can use IN to achieve Try this:

    SELECT * 
    FROM t1 WHERE (id,rev) IN 
    ( SELECT id, MAX(rev)
      FROM t1
      GROUP BY id
    )

    In my opinion, this is simpler... easier to read and maintain.

    reply
    0
  • P粉287345251

    P粉2873452512023-08-22 09:59:35

    At first glance...

    You just need to use the GROUP BY clause with the MAX aggregate function:

    SELECT id, MAX(rev)
    FROM YourTable
    GROUP BY id

    Things are never that simple, right?

    I just noticed that you also need the content column.

    This is a very common problem in SQL: find the entire row of data with the maximum value in a column based on a certain grouping identifier. I've heard this question a lot in my career. In fact, this was one of the questions I answered during a technical interview at my current job.

    This question is actually so common that the Stack Overflow community created a tag specifically to deal with this type of question: .

    Basically, you have two ways to solve this problem:

    Use simple group-identifier, max-value-in-groupSubquery to connect

    In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a subquery. You then join your table with the subquery, using group-identifier and max-value-in-group for an equijoin:

    SELECT a.id, a.rev, a.contents
    FROM YourTable a
    INNER JOIN (
        SELECT id, MAX(rev) rev
        FROM YourTable
        GROUP BY id
    ) b ON a.id = b.id AND a.rev = b.rev

    Use self-connection and adjust connection conditions and filtering conditions

    In this approach, you left join the table to itself. Equijoining is performed on group-identifier. Then, there are two clever steps:

    1. The second connection condition is that the value on the left is less than the value on the right
    2. When you do step 1, the row that actually has the largest value will have NULL on the right (remember, this is a LEFT JOIN). We then filter the join results to only show rows with NULL on the right.

    So you end up with:

    SELECT a.*
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL;

    in conclusion

    The results obtained by these two methods are exactly the same.

    If there are two rows with max-value-in-group for group-identifier, then both methods will include both rows in the result.

    Both methods are SQL ANSI compatible, so no matter what "flavor" of RDBMS you prefer, you can use it.

    Both methods are also performance-friendly, but your actual situation may be different (RDBMS, database structure, index, etc.). So when you choose one of these methods, benchmark. And make sure to choose the method that makes the most sense for you.

    reply
    0
  • Cancelreply