Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries which use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartitioning) on 5.6, but it is limited to RANGE/LIST partitioning methods on earlier versions.

The Shard-Query output shown here is from the command-line client, but you can use MySQL Proxy to communicate with Shard-Query too.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and explain how Shard-Query executes them in parallel.

Here is the DDL for the lineorder table, which I will use for the demo queries:

CREATE TABLE IF NOT EXISTS lineorder
(
  LO_OrderKey bigint not null,
  LO_LineNumber tinyint not null,
  LO_CustKey int not null,
  LO_PartKey int not null,
  LO_SuppKey int not null,
  LO_OrderDateKey int not null,
  LO_OrderPriority varchar(15),
  LO_ShipPriority char(1),
  LO_Quantity tinyint,
  LO_ExtendedPrice decimal,
  LO_OrdTotalPrice decimal,
  LO_Discount decimal,
  LO_Revenue decimal,
  LO_SupplyCost decimal,
  LO_Tax tinyint,
  LO_CommitDateKey int not null,
  LO_ShipMode varchar(10),
  primary key(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)
) PARTITION BY HASH(LO_OrderDateKey) PARTITIONS 8;

Notice that the lineorder table is partitioned by HASH(LO_OrderDateKey) into 8 partitions.  I used 8 partitions and my test box has 4 cores. It does not hurt to have more partitions than cores. A number of partitions that is two or three times the number of cores generally works best because it keeps each partition small, and smaller partitions are faster to scan. If you have a very large table, a larger number of partitions may be acceptable. Shard-Query will submit a query to Gearman for each partition, and the number of Gearman workers controls the parallelism.
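To make the partition layout concrete, here is a minimal Python sketch of how MySQL's HASH partitioning places rows: the partition number is MOD(expr, number of partitions), and the default partition names are p0 through p7. The sample date keys are hypothetical and assume the yyyymmdd integer form; hash_partition is an illustrative helper, not part of Shard-Query.

```python
NUM_PARTITIONS = 8  # matches PARTITIONS 8 in the DDL above


def hash_partition(order_date_key: int, num_partitions: int = NUM_PARTITIONS) -> str:
    """Return the partition name chosen by the MOD(expr, N) rule of HASH partitioning."""
    return f"p{order_date_key % num_partitions}"


# Hypothetical LO_OrderDateKey values (assumed to be yyyymmdd integers)
sample_keys = [19920101, 19920102, 19920103, 19980802]
placement = {key: hash_partition(key) for key in sample_keys}

for key, partition in placement.items():
    print(key, "->", partition)
```

Because consecutive dates land in different partitions, every partition holds a slice of every year, which is why a query over the whole table touches all eight partitions.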

The SQL for the first demo is:

SELECT COUNT(DISTINCT LO_OrderDateKey) FROM lineorder;

Here is the explain from regular MySQL:

mysql> explain select count(distinct LO_OrderDateKey) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)

So it is basically a full table scan. It takes a long time:

mysql> select count(distinct LO_OrderDateKey) from lineorder;
+---------------------------------+
| count(distinct LO_OrderDateKey) |
+---------------------------------+
|                            2406 |
+---------------------------------+
1 row in set (4 min 48.63 sec)

Shard-Query executes this query differently from MySQL. It sends one query to each partition, in parallel, like the following:

Array
(
    [0] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [1] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [2] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [3] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [4] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p4) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [5] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p5) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [6] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p6) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [7] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
)

You will notice that there is one query for each partition. Those queries are sent to Gearman and executed in parallel by as many Gearman workers as are available (in this case, 4). The output of the queries goes into a coordinator table, and then another query does a final aggregation. That query looks like this:

SELECT COUNT(distinct expr_2839651562) AS `count`
FROM `aggregation_tmp_73522490`
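The scatter-and-gather flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Shard-Query's actual code: a thread pool with four workers stands in for the four Gearman workers, in-memory lists stand in for the eight partitions, and build_partition_query, run_partition_task, and the sample keys are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the 8 hash partitions of lineorder: each list
# holds the LO_OrderDateKey values that would be stored in that partition.
partitions = {f"p{i}": [] for i in range(8)}
for key in [19920101, 19920101, 19920102, 19920109, 19920110, 19920110]:
    partitions[f"p{key % 8}"].append(key)


def build_partition_query(partition: str) -> str:
    """Build per-partition SQL in the shape shown above (alias shortened)."""
    return (
        f"SELECT LO_OrderDateKey AS expr FROM lineorder PARTITION({partition}) "
        "AS `lineorder` GROUP BY LO_OrderDateKey"
    )


def run_partition_task(partition: str) -> set:
    """Stand-in for a worker: GROUP BY yields one row per distinct key."""
    return set(partitions[partition])


# Four workers drain the eight tasks, as four Gearman workers would.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(run_partition_task, partitions))

# Coordinator step: COUNT(DISTINCT expr) over all rows returned by the workers.
coordinator_rows = [key for rows in partial_results for key in rows]
distinct_count = len(set(coordinator_rows))
print(distinct_count)
```

The point of the design is that each worker returns only distinct keys, so the coordinator table stays tiny compared with the base table.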

The Shard-Query time:

select count(distinct LO_OrderDateKey) from lineorder;
Array
(
    [count] => 2406
)
1 rows returned
Exec time: 0.10923719406128

That isn’t a typo, it really is sub-second compared to minutes in regular MySQL.

This is because Shard-Query uses GROUP BY to answer this query and a loose index scan of the PRIMARY KEY is possible:

mysql> explain partitions SELECT LO_OrderDateKey AS expr_2839651562
    -> FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
   partitions: p7
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 80108
        Extra: Using index for group-by
1 row in set (0.00 sec)
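Why a loose index scan is so much cheaper can be illustrated with a small Python sketch. It is a conceptual model only, not how InnoDB implements it: a sorted list stands in for the leading column of the PRIMARY KEY, and a binary search jumps past each run of duplicates instead of reading every entry. The function name and the sample data are hypothetical.

```python
from bisect import bisect_right


def distinct_keys_loose_scan(sorted_keys):
    """Collect distinct values by jumping past each run of duplicates."""
    distinct, position, probes = [], 0, 0
    while position < len(sorted_keys):
        current = sorted_keys[position]
        distinct.append(current)
        # Jump straight to the first entry greater than `current`
        position = bisect_right(sorted_keys, current, lo=position)
        probes += 1
    return distinct, probes


# Hypothetical index contents: 3 distinct date keys, 1000 entries each
index_entries = [19920101] * 1000 + [19920102] * 1000 + [19920103] * 1000
keys, probes = distinct_keys_loose_scan(index_entries)
print(keys, probes)
```

The number of jumps depends on the number of distinct keys rather than on the number of rows, which is consistent with the much smaller rows estimate in the EXPLAIN output above.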

Next another simple query will be tested, first on regular MySQL:

mysql> select count(*) from lineorder;
+----------+
| count(*) |
+----------+
| 59986052 |
+----------+
1 row in set (4 min 8.70 sec)

Again, the EXPLAIN shows a full table scan:

mysql> explain select count(*) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)

Now, Shard-Query can’t do anything special to speed up this query, except to execute it in parallel, similar to the first query:

[0] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1
[1] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1
[2] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1
[3] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1
...

The aggregation SQL is similar, but this time the aggregate function is changed to SUM to combine the COUNT from each partition:

SELECT SUM(expr_3190753946) AS `count`
FROM `aggregation_tmp_51969525`
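A short Python sketch shows why the final step has to be SUM rather than COUNT. The per-partition counts are made-up numbers, and the list of dicts is just a stand-in for the coordinator table.

```python
# Hypothetical COUNT(*) results returned by the eight per-partition queries
partition_counts = [7, 5, 9, 6, 8, 4, 10, 11]

# Stand-in for the coordinator table: one row per partition
coordinator_rows = [{"expr": count} for count in partition_counts]

# SUM(expr) combines the partial counts into the table-wide count
total_rows = sum(row["expr"] for row in coordinator_rows)

# COUNT(*) over the coordinator table would only count the partial rows
naive_count = len(coordinator_rows)

print(total_rows, naive_count)
```

Counting the coordinator rows would only report how many partitions answered, so the outer aggregate must add up the partial counts instead.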

And the query is quite a bit faster, at 140.24 seconds compared with 248.70 seconds on regular MySQL:

Array
(
    [count] => 59986052
)
1 rows returned
Exec time: 140.24419403076

Finally, I want to look at a more complex query that uses joins and aggregation.

mysql> explain select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit
from lineorder
join dim_date on lo_orderdatekey = d_datekey
join customer on lo_custkey = c_customerkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'
and s_region = 'AMERICA'
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                      | rows | Extra                           |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
|  1 | SIMPLE      | dim_date  | ALL    | PRIMARY       | NULL    | NULL    | NULL                     |    5 | Using temporary; Using filesort |
|  1 | SIMPLE      | lineorder | ref    | PRIMARY       | PRIMARY | 4       | ssb.dim_date.D_DateKey   |   89 | NULL                            |
|  1 | SIMPLE      | supplier  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_SuppKey |    1 | Using where                     |
|  1 | SIMPLE      | customer  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_CustKey |    1 | Using where                     |
|  1 | SIMPLE      | part      | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_PartKey |    1 | Using where                     |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
5 rows in set (0.01 sec)

Here is the query on regular MySQL:

mysql> select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit
from lineorder
join dim_date on lo_orderdatekey = d_datekey
join customer on lo_custkey = c_customerkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'
and s_region = 'AMERICA'
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;
+--------+---------------+--------------+
| d_year | c_nation      | profit       |
+--------+---------------+--------------+
|   1992 | ARGENTINA     | 102741829748 |
...
|   1998 | UNITED STATES |  61345891337 |
+--------+---------------+--------------+
35 rows in set (11 min 56.79 sec)

Again, Shard-Query splits up the query to run over each partition (I won’t bore you with the details) and it executes the query faster than MySQL, in 343.3 seconds compared to about 717 seconds (11 min 56.79 sec):

Array
(
    [d_year] => 1998
    [c_nation] => UNITED STATES
    [profit] => 61345891337
)
35 rows returned
Exec time: 343.29854893684
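The post skips the details of how this query is split, so the following Python sketch is only a plausible shape of the final merge, by analogy with the COUNT to SUM rewrite shown earlier: each partition returns a partial SUM per (d_year, c_nation) group, and the coordinator adds the partial sums for matching groups and sorts the result. The data and the merge_partial_sums helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical partial results: rows of (d_year, c_nation, partial_profit),
# one list per partition. The same group can appear in several partitions
# because each partition holds only a slice of every year.
partial_results = [
    [(1992, "ARGENTINA", 100), (1992, "BRAZIL", 40)],
    [(1992, "ARGENTINA", 25), (1993, "CANADA", 70)],
]


def merge_partial_sums(partials):
    """Add up partial SUMs per group, then order by d_year, c_nation."""
    totals = defaultdict(int)
    for rows in partials:
        for d_year, c_nation, profit in rows:
            totals[(d_year, c_nation)] += profit
    return [(year, nation, totals[(year, nation)]) for year, nation in sorted(totals)]


merged = merge_partial_sums(partial_results)
for row in merged:
    print(row)
```

SUM is safe to combine this way because addition is associative; an aggregate such as AVG would need both a partial sum and a partial count from each partition.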

I hope you see how using Shard-Query can speed up queries without using sharding, on just a single server. All you really need to do is add partitioning.
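For reference, the speedups implied by the timings quoted in this post can be worked out with a few lines of Python. The numbers are taken from the outputs above; only the dictionary layout and labels are mine.

```python
# (regular MySQL seconds, Shard-Query seconds), copied from the outputs above
timings = {
    "count(distinct LO_OrderDateKey)": (4 * 60 + 48.63, 0.10923719406128),
    "count(*)": (4 * 60 + 8.70, 140.24419403076),
    "join + aggregation": (11 * 60 + 56.79, 343.29854893684),
}

speedups = {
    name: mysql_seconds / shard_seconds
    for name, (mysql_seconds, shard_seconds) in timings.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

The two scan-bound queries come out at roughly 1.8x and 2.1x on the 4-core test box, while the COUNT(DISTINCT) query gains more than three orders of magnitude because the rewritten per-partition queries can use a loose index scan.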

You can get Shard-Query from GitHub at http://github.com/greenlion/swanhart-tools

Please note: Configure and install Shard-Query as normal, but simply use one node and set the column option (the shard column) to “nocolumn” or false, because you are not required to use a shard column if you are not sharding.
