
Thoughts triggered by a distinct question

高洛峰
2016-12-13 09:45:31

I was asked the following question today and have written it up for everyone's reference.

Suppose there is a table like the following:

[Image: the Sample table, with DepartmentId and Name columns]

The data has the following characteristic: one DepartmentId can have many Names, and one Name can appear under many DepartmentIds. In other words, Name and DepartmentId have a many-to-many relationship.

Now I want to implement this query: first sort by DepartmentId (step one), then take the unique values of the Name column (step two) while keeping the relative order produced by step one. In this example, the three values that should be returned are, in order: A, C, B.
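To make the two steps concrete, here is a minimal sketch outside the database. The rows are hypothetical stand-ins for the Sample table (the original screenshot is not available), chosen so that the expected answer is A, C, B; the function name is also made up for illustration.

```python
# Hypothetical (DepartmentId, Name) rows standing in for the Sample table.
rows = [
    (1, "A"),
    (2, "C"),
    (2, "A"),
    (3, "B"),
    (3, "C"),
]


def ordered_unique_names(rows):
    """Sort by DepartmentId, then keep the first occurrence of each Name."""
    # Step one: sort by DepartmentId. sorted() is stable, so rows with the
    # same DepartmentId keep their input order.
    by_department = sorted(rows, key=lambda row: row[0])
    # Step two: dict.fromkeys preserves insertion order, so each Name stays
    # at the position where it first appeared after sorting.
    return list(dict.fromkeys(name for _, name in by_department))


print(ordered_unique_names(rows))  # ['A', 'C', 'B']
```

This is only a statement of the requirement in code; the rest of the article is about getting the same result from a SQL query.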

The first thing that comes to mind is the following:

select distinct name from Sample order by DepartmentId

Semantically this reads very naturally. Unfortunately, the statement cannot be executed at all. The error message is:

[Image: error message stating that ORDER BY items must appear in the select list if SELECT DISTINCT is specified]

This error means that when DISTINCT (duplicate removal) is used, every column in the ORDER BY clause must also appear in the SELECT list. But if DepartmentId is added to the SELECT list, DISTINCT applies to the (Name, DepartmentId) pair instead of to Name alone, so the same Name is returned once for each department it belongs to, and the result is wrong:

select distinct name,DepartmentId from Sample order by DepartmentId

[Image: query result, in which the same Name appears more than once]
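The effect of putting DepartmentId in the SELECT list can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module rather than SQL Server, and the rows are hypothetical; the point is only that DISTINCT works on whole rows, which behaves the same way in both engines.

```python
import sqlite3

# Hypothetical (DepartmentId, Name) rows; the article's real data is not shown.
SAMPLE_ROWS = [(1, "A"), (2, "C"), (2, "A"), (3, "B"), (3, "C")]


def distinct_pairs():
    """Run the two-column DISTINCT query and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sample (DepartmentId INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO Sample VALUES (?, ?)", SAMPLE_ROWS)
    result = conn.execute(
        "SELECT DISTINCT Name, DepartmentId FROM Sample ORDER BY DepartmentId"
    ).fetchall()
    conn.close()
    return result


pairs = distinct_pairs()
# Every (Name, DepartmentId) pair is already unique, so nothing is removed:
# five rows come back and the Names A and C each appear twice.
print(len(pairs), sorted(name for name, _ in pairs))
```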

So, since combining DISTINCT with ORDER BY causes this problem, can we work around it, for example as follows?

select distinct a.Name from (select top 100 percent name from Sample order by DepartmentId) a

Compared with the previous attempt, this one uses a subquery. Semantically it is still intuitive and clear: sort by DepartmentId first, then remove the duplicate values. But the result returned is as follows:

[Image: query result, with the Names deduplicated but not in the desired order]

Although the duplicate values are indeed removed, the returned order is wrong. We wanted to sort by DepartmentId first, then remove duplicates while keeping the relative order after sorting.

Why does this happen? Because here DISTINCT is carried out with a sort of its own, and this behavior cannot be changed from the query (it can be seen in the execution plan below). More generally, an ORDER BY inside a derived table only determines which rows TOP selects; it does not guarantee the order of the outer query's result. So the ORDER BY we wrote in the subquery loses its meaning here. [In fact, if you look at a similar query generated by an ORM tool such as ADO.NET Entity Framework, the ORDER BY setting is discarded automatically.]

[Image: execution plan of the DISTINCT query]

So is it impossible to meet this requirement? Admittedly, the requirement is rare: most of the time DISTINCT is the last operation, and it is reasonable for it to sort.

I thought about it: since the behavior of DISTINCT is built in, can we bypass DISTINCT altogether? The solution I ended up with is to number the rows of each Name. For example, if there are two A's, the first A is numbered 1, the second 2, and so on. Then, when querying, I sort first and keep only the rows numbered 1, which effectively removes the duplicates.

SQL Server 2005 introduced the ROW_NUMBER function. Using it, I wrote the following query:

select a.Name from (select top 100 percent Name,DepartmentId,ROW_NUMBER() over(partition by name order by departmentid) row from Sample order by DepartmentId) a where a.row=1 order by a.DepartmentId


Then I got the following result. After careful consideration, it meets the requirement described earlier.

[Image: query result]
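The numbering idea can be tried without a SQL Server instance. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), hypothetical rows, and the alias rn instead of row. SQLite has no TOP clause, so the inner TOP 100 PERCENT ... ORDER BY is left out; only the outer ORDER BY determines the final order.

```python
import sqlite3

# Hypothetical (DepartmentId, Name) rows; the article's real data is not shown.
SAMPLE_ROWS = [(1, "A"), (2, "C"), (2, "A"), (3, "B"), (3, "C")]

# Number the rows of each Name by DepartmentId, keep row number 1,
# then order the survivors by the DepartmentId of that first row.
QUERY = """
SELECT a.Name
FROM (
    SELECT Name,
           DepartmentId,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY DepartmentId) AS rn
    FROM Sample
) AS a
WHERE a.rn = 1
ORDER BY a.DepartmentId
"""


def first_row_per_name():
    """Return each Name once, ordered by its smallest DepartmentId."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sample (DepartmentId INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO Sample VALUES (?, ?)", SAMPLE_ROWS)
    names = [name for (name,) in conn.execute(QUERY)]
    conn.close()
    return names


print(first_row_per_name())  # ['A', 'C', 'B']
```

If two Names first appear in the same DepartmentId, their relative order is not determined by this query; a tie-breaking column would have to be added to the outer ORDER BY.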

By comparison, this query is less efficient, which is to be expected (the picture below gives some clues). But if the requirement is firm, sacrificing some performance is acceptable. Of course, we can investigate further to see whether there is a better way to write it. In any case, queries that rely on standard built-in operations are usually relatively fast.

[Image: execution plan of the ROW_NUMBER query]
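One candidate for a "better way to write it" is to group by Name and order by each group's smallest DepartmentId. This is not the author's method, just a possible alternative using standard GROUP BY and MIN; no claim is made here about how its performance compares on SQL Server. The sketch again assumes Python's built-in sqlite3 module and hypothetical rows.

```python
import sqlite3

# Hypothetical (DepartmentId, Name) rows; the article's real data is not shown.
SAMPLE_ROWS = [(1, "A"), (2, "C"), (2, "A"), (3, "B"), (3, "C")]

# GROUP BY collapses duplicates of Name; MIN(DepartmentId) is the position
# at which each Name first appears once the rows are sorted by DepartmentId.
QUERY = """
SELECT Name
FROM Sample
GROUP BY Name
ORDER BY MIN(DepartmentId)
"""


def names_by_first_department():
    """Return each Name once, ordered by its smallest DepartmentId."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sample (DepartmentId INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO Sample VALUES (?, ?)", SAMPLE_ROWS)
    names = [name for (name,) in conn.execute(QUERY)]
    conn.close()
    return names


print(names_by_first_department())  # ['A', 'C', 'B']
```

As with the ROW_NUMBER version, Names that share the same smallest DepartmentId come back in an unspecified relative order.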
