search
HomeDatabaseMysql Tutorial浅谈SQL Server查询优化器中的JOIN算法

浅谈SQL Server查询优化器中的JOIN算法

Jun 07, 2016 pm 03:04 PM
joinserversqloptimizationInquirealgorithm

查询优化器 都是支持 JOIN 操作的,而 SQL Server 中主要有以下三类JOIN算法:Nested Loop、Sort-Merge以及Hash Join。尽管每种算法都并不是很复杂,但考虑到性能优化,在产品级的优化器实现时往往使用的是改进过的变种算法。譬如 SQL Server 支持block nest

  查询优化器都是支持JOIN操作的,而SQL Server 中主要有以下三类JOIN算法:Nested Loop、Sort-Merge以及Hash Join。尽管每种算法都并不是很复杂,但考虑到性能优化,在产品级的优化器实现时往往使用的是改进过的变种算法。譬如SQL Server 支持block nested loops、index nexted loops、sort-merge、hash join以及hash team。我们在这里只对上述三种基本算法的原型做一个简单的介绍。

  【假设】有两张表R和S,R共占有M页,S共占有N页。r 和 s 分别代表元组,而 i 和 j 分别代表第i或者第 j 个字段,也就是后文提到的JOIN字段。

  1. Nested Loop Join(嵌套循环联结)

  算法:

  其思路相当的简单和直接:对于关系R的每个元组 r 将其与关系S的每个元组 s 在JOIN条件的字段上直接比较并筛选出符合条件的元组。写成伪代码就是:

  foreach tuple r Î R do
  foreach tuple s Î S do
  if ri == sj then add to result

  代价:

  被联结的表所处内层或外层的顺序对磁盘I/O开销有着非常重要的影响。而CPU开销相对来说影响较小,主要是元组读入内存以后(in-memory)的开销,是 O (n * m)

  对于I/O开销,根据 page-at-a-time 的前提条件,I/O cost = M + M * N,翻译一下就是 I/O的开销 = 读取M页的I/O开销 + M次读取N页的I/O开销。

  使用小结:

  • 适用于一个集合大而另一个集合小的情况(将小集合做为外循环),I/O性能不错。

  • 当外循环输入相当小而内循环非常大且有索引建立在JOIN字段上时,I/O性能相当不错。

  • 当两个集合中只有一个在JOIN字段上建立索引时,一定要将该集合作为内循环。

  • 对于一对一的匹配关系(两个具有唯一约束字段的联结),可以在找到匹配元组后跳过该次内循环的剩余部分(类似于编程语言循环语句中的continue)。

  2. Sort-Merge Join (排序合并联结)

  Nested Loop一般在两个集合都很大的情况下效率就相当差了,而Sort-Merge在这种情况下就比它要高效不少,尤其是当两个集合的JOIN字段上都有聚集索引(clustered index)存在时,Sort-Merge性能将达到最好。

  算法:

  基本思路也很简单(复习一下数据结构中的合并排序吧),主要有两个步骤:

  (1) 按JOIN字段进行排序

  (2) 对两组已排序集合进行合并排序,从来源端各自取得数据列后加以比较(需要根据是否在JOIN字段有重复值做特殊的“分区”处理)

  代价:(主要是I/O开销)

  有两个因素左右Sort-Merge的开销:JOIN字段是否已排序 以及 JOIN字段上的重复值有多少。

  • 最好情况下(两列都已排序且至少有一列没有重复值):O (n + m) 只需要对两个集合各扫描一遍

  • 最差情况下(两列都未排序且两列上的所有值都相同):O (n * log n + m * log m + n * m) 两次排序以及一次全部元组间的笛卡尔乘积

使用小结:

  如前所述,可以考虑在两个结果集都很大情况下使用,最好能有聚集索引保证已经排序完毕。而在实际应用中,我们经常会与遇到的主键-外键关系就是Sort-Merge的一个很好的应用。这种情况下,一般两列都会有聚集索引(已排序)而且一对多的关系保证了至少有一列没有重复值,这种情况下,Sort-Merge的性能是三种里面最好的。

  另外,如果要求查询的SQL语法本身就要求GROUP BY、ORDER BY、CUBE等运行,则查询语法整体本来就要做排序,因此可以重用排序结果,此时Merge也是不错的选择。

  3. Hash Join (哈希联结)

  Hash Join在本质上类似于两列都有重复值时的Sort-Merge的处理思想――分区(patitioning)。但它们也有区别:Hash Join通过哈希来分区(每一个桶就是一个分区)而Sort-Merge通过排序来分区(每一个重复值就是一个分区)。

  值得注意的是,Hash Join与上述两种算法之间的较大区别同时也是一个较大限制是它只能应用于等值联结(equality join),这主要是由于哈希函数及其桶的确定性及无序性所导致的。

  算法:

  基本的Hash Join算法由以下两步组成:

  (1) Build Input Phase: 基于JOIN字段,使用哈希函数h2为较小的S集合构建内存中(in-memory)的哈希表,相同键值的以linked list组成一个桶(bucket)

  (2) Probe Input Phase: 在较大的R集合上对哈希表进行核对以完成联结。其中核对操作包括:

  foreach tuple r Î R do
  hash on the joining attribute using the hash function of step 1 to find a bucket in the hash table
  if the bucket is nonempty
  foreach tuple s in the found bucket
  if ri == sj then add to result

  代价:

  值得注意的是对于大集合R的每个元组 r ,hash bucket中对应 r 的那个bucket中的每个元组都需要与 r 进行比较,这也是算法最耗时的地方所在。

  CPU开销是O (m + n * b) b是每个bucket的平均元组数量。

  使用小结:

  一般来说,查询优化器会首先考虑Nested Loop和Sort-Merge,但如果两个集合量都不小且没有合适的索引时,才会考虑使用Hash Join。

  Hash Join也用于许多集合比较操作,inner join、left/right/full outer join、intersect、difference等等,当然了,需要保证都是等值联结。

  另外,Hash Join的变种能够移除重复和进行分组,它只使用一个输入,兼做Build和Probe的角色。

   其实产品级的优化器一般都改进了这些基本算法,而改进过的版本的确有较大的性能提升。在这里只是给需要判断执行计划优劣或者研究查询优化器实现的兄弟提供原理方面的介绍,在实际应用中我们还得结合丰富的statistics作出准确的判断。


Tech?Ed 2007 微软技术大会

点击查看 Tech?Ed 2007 微软技术大会 专题

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
MySQL's Role: Databases in Web ApplicationsMySQL's Role: Databases in Web ApplicationsApr 17, 2025 am 12:23 AM

The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.

MySQL: Building Your First DatabaseMySQL: Building Your First DatabaseApr 17, 2025 am 12:22 AM

The steps to build a MySQL database include: 1. Create a database and table, 2. Insert data, and 3. Conduct queries. First, use the CREATEDATABASE and CREATETABLE statements to create the database and table, then use the INSERTINTO statement to insert the data, and finally use the SELECT statement to query the data.

MySQL: A Beginner-Friendly Approach to Data StorageMySQL: A Beginner-Friendly Approach to Data StorageApr 17, 2025 am 12:21 AM

MySQL is suitable for beginners because it is easy to use and powerful. 1.MySQL is a relational database, and uses SQL for CRUD operations. 2. It is simple to install and requires the root user password to be configured. 3. Use INSERT, UPDATE, DELETE, and SELECT to perform data operations. 4. ORDERBY, WHERE and JOIN can be used for complex queries. 5. Debugging requires checking the syntax and use EXPLAIN to analyze the query. 6. Optimization suggestions include using indexes, choosing the right data type and good programming habits.

Is MySQL Beginner-Friendly? Assessing the Learning CurveIs MySQL Beginner-Friendly? Assessing the Learning CurveApr 17, 2025 am 12:19 AM

MySQL is suitable for beginners because: 1) easy to install and configure, 2) rich learning resources, 3) intuitive SQL syntax, 4) powerful tool support. Nevertheless, beginners need to overcome challenges such as database design, query optimization, security management, and data backup.

Is SQL a Programming Language? Clarifying the TerminologyIs SQL a Programming Language? Clarifying the TerminologyApr 17, 2025 am 12:17 AM

Yes,SQLisaprogramminglanguagespecializedfordatamanagement.1)It'sdeclarative,focusingonwhattoachieveratherthanhow.2)SQLisessentialforquerying,inserting,updating,anddeletingdatainrelationaldatabases.3)Whileuser-friendly,itrequiresoptimizationtoavoidper

Explain the ACID properties (Atomicity, Consistency, Isolation, Durability).Explain the ACID properties (Atomicity, Consistency, Isolation, Durability).Apr 16, 2025 am 12:20 AM

ACID attributes include atomicity, consistency, isolation and durability, and are the cornerstone of database design. 1. Atomicity ensures that the transaction is either completely successful or completely failed. 2. Consistency ensures that the database remains consistent before and after a transaction. 3. Isolation ensures that transactions do not interfere with each other. 4. Persistence ensures that data is permanently saved after transaction submission.

MySQL: Database Management System vs. Programming LanguageMySQL: Database Management System vs. Programming LanguageApr 16, 2025 am 12:19 AM

MySQL is not only a database management system (DBMS) but also closely related to programming languages. 1) As a DBMS, MySQL is used to store, organize and retrieve data, and optimizing indexes can improve query performance. 2) Combining SQL with programming languages, embedded in Python, using ORM tools such as SQLAlchemy can simplify operations. 3) Performance optimization includes indexing, querying, caching, library and table division and transaction management.

MySQL: Managing Data with SQL CommandsMySQL: Managing Data with SQL CommandsApr 16, 2025 am 12:19 AM

MySQL uses SQL commands to manage data. 1. Basic commands include SELECT, INSERT, UPDATE and DELETE. 2. Advanced usage involves JOIN, subquery and aggregate functions. 3. Common errors include syntax, logic and performance issues. 4. Optimization tips include using indexes, avoiding SELECT* and using LIMIT.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
1 months agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
1 months agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Chat Commands and How to Use Them
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version