While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries that use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartitioning) on 5.6, but it is limited to RANGE/LIST partitioning methods on earlier versions.
The Shard-Query output shown here is from the command-line client, but you can also use MySQL Proxy to communicate with Shard-Query.
In the examples I am going to use the schema from the Star Schema Benchmark. I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and explain how Shard-Query executes them in parallel.
Here is the DDL for the lineorder table, which I will use for the demo queries:
CREATE TABLE IF NOT EXISTS lineorder
(
  LO_OrderKey bigint not null,
  LO_LineNumber tinyint not null,
  LO_CustKey int not null,
  LO_PartKey int not null,
  LO_SuppKey int not null,
  LO_OrderDateKey int not null,
  LO_OrderPriority varchar(15),
  LO_ShipPriority char(1),
  LO_Quantity tinyint,
  LO_ExtendedPrice decimal,
  LO_OrdTotalPrice decimal,
  LO_Discount decimal,
  LO_Revenue decimal,
  LO_SupplyCost decimal,
  LO_Tax tinyint,
  LO_CommitDateKey int not null,
  LO_ShipMode varchar(10),
  primary key(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)
) PARTITION BY HASH(LO_OrderDateKey) PARTITIONS 8;
Notice that the lineorder table is partitioned by HASH(LO_OrderDateKey) into 8 partitions. I used 8 partitions and my test box has 4 cores. It does not hurt to have more partitions than cores. A number of partitions that is two or three times the number of cores generally works best because it keeps each partition small, and smaller partitions are faster to scan. If you have a very large table, a larger number of partitions may be acceptable. Shard-Query will submit a query to Gearman for each partition, and the number of Gearman workers controls the parallelism.
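The fan-out described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Shard-Query's code (Shard-Query is written in PHP and uses Gearman): `partition_queries`, `run_all`, and the stand-in `execute` callable are hypothetical names, and the partition names p0 through p7 assume MySQL's default naming for HASH partitions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_queries(select_list, table, partitions, tail=""):
    """Build one SQL string per partition using the MySQL 5.6 PARTITION hint."""
    queries = []
    for p in range(partitions):
        # Assumes the default partition names p0, p1, ... used by HASH partitioning.
        sql = f"SELECT {select_list} FROM {table} PARTITION(p{p}) AS `{table}`"
        if tail:
            sql += " " + tail
        queries.append(sql)
    return queries


def run_all(queries, execute, workers):
    """Run every query through `execute`, with at most `workers` running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input queries.
        return list(pool.map(execute, queries))


def fake_execute(sql):
    """Stand-in for a database call: just report which partition was targeted."""
    return sql.split("PARTITION(")[1].split(")")[0]


queries = partition_queries("COUNT(*) AS cnt", "lineorder", partitions=8)
results = run_all(queries, execute=fake_execute, workers=4)
print(results)
```

Eight queries are queued, but only four run at any moment, which mirrors 8 partitions served by 4 workers on a 4-core box.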
The SQL for the first demo is:
SELECT COUNT(DISTINCT LO_OrderDateKey) FROM lineorder;
Here is the explain from regular MySQL:
mysql> explain select count(distinct LO_OrderDateKey) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)
So it is basically a full table scan. It takes a long time:
mysql> select count(distinct LO_OrderDateKey) from lineorder;
+---------------------------------+
| count(distinct LO_OrderDateKey) |
+---------------------------------+
|                            2406 |
+---------------------------------+
1 row in set (4 min 48.63 sec)
Shard-Query executes this query differently from MySQL. It sends one query per partition, in parallel, like the following:
Array
(
    [0] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [1] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [2] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [3] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [4] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p4) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [5] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p5) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [6] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p6) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [7] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
)
You will notice that there is one query for each partition. Those queries are sent to Gearman and executed in parallel by as many Gearman workers as are available (in this case, 4). The output of the queries goes into a coordinator table, and then another query does the final aggregation. That query looks like this:
SELECT COUNT(distinct expr_2839651562) AS `count`
FROM `aggregation_tmp_73522490`
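To make the two-stage plan concrete, here is a small Python simulation of it. It is a sketch under stated assumptions, not Shard-Query itself: the data is made up, the lists stand in for partitions and for the coordinator table, and `group_by_key` is a hypothetical helper that plays the role of the per-partition GROUP BY query.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 8

# Made-up data: 1000 rows whose date keys cycle through 50 distinct values.
rows = [19920101 + (i % 50) for i in range(1000)]

# PARTITION BY HASH(key) PARTITIONS 8 places an integer key by key modulo 8.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for key in rows:
    partitions[key % NUM_PARTITIONS].append(key)


def group_by_key(partition):
    """Per-partition step: SELECT key FROM ... GROUP BY key."""
    return sorted(set(partition))


# Four workers process the eight partitions in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(group_by_key, partitions))

# Coordinator step: every partial row lands in one small table, and
# COUNT(DISTINCT key) then runs over that table instead of the base table.
coordinator_table = [key for part in partial_results for key in part]
distinct_count = len(set(coordinator_table))
print(distinct_count)
```

The final aggregation only has to look at the handful of rows each partition returned, not at the original rows.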
The Shard-Query time:
select count(distinct LO_OrderDateKey) from lineorder;
Array
(
    [count] => 2406
)
1 rows returned
Exec time: 0.10923719406128
That isn’t a typo, it really is sub-second compared to minutes in regular MySQL.
This is because Shard-Query uses GROUP BY to answer this query and a loose index scan of the PRIMARY KEY is possible:
mysql> explain partitions SELECT LO_OrderDateKey AS expr_2839651562
    -> FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
   partitions: p7
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 80108
        Extra: Using index for group-by
1 row in set (0.00 sec)
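Why a loose index scan is so much cheaper can be shown with a toy model. The sketch below treats an index as a sorted Python list and compares reading every entry against seeking directly to the next distinct key. It is a conceptual illustration only, not how MySQL implements the optimization, and both function names are hypothetical.

```python
from bisect import bisect_right


def full_scan_distinct(index):
    """Read every index entry, keeping a key whenever it changes."""
    distinct, entries_read = [], 0
    for key in index:
        entries_read += 1
        if not distinct or distinct[-1] != key:
            distinct.append(key)
    return distinct, entries_read


def loose_scan_distinct(index):
    """Read one entry per distinct key, then seek past all of its copies."""
    distinct, seeks, pos = [], 0, 0
    while pos < len(index):
        key = index[pos]
        distinct.append(key)
        # A binary search stands in for a B-tree seek to the next key value.
        pos = bisect_right(index, key, pos)
        seeks += 1
    return distinct, seeks


# Toy index: 300 distinct keys, each repeated 250 times, in sorted order.
index = sorted(key for key in range(300) for _ in range(250))

full_keys, entries_read = full_scan_distinct(index)
loose_keys, seeks = loose_scan_distinct(index)
print(entries_read, seeks)
```

Both functions return the same keys, but the loose scan does work proportional to the number of distinct keys rather than the number of rows, which is why the rows estimate in the EXPLAIN above is so much smaller than the table.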
Next, I test another simple query, first on regular MySQL:
mysql> select count(*) from lineorder;
+----------+
| count(*) |
+----------+
| 59986052 |
+----------+
1 row in set (4 min 8.70 sec)
Again, the EXPLAIN shows a full table scan:
mysql> explain select count(*) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)
Now, Shard-Query can’t do anything special to speed up this query, except to execute it in parallel, similar to the first query:
[0] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1
[1] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1
[2] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1
[3] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1
...
The aggregation SQL is similar, but this time the aggregate function is changed to SUM to combine the COUNT from each partition:
SELECT SUM(expr_3190753946) AS `count`
FROM `aggregation_tmp_51969525`
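The COUNT-to-SUM rewrite is one case of a general pattern: some aggregates can be recombined from per-partition partial results. The Python sketch below illustrates that general property with made-up numbers. Only the COUNT-to-SUM mapping comes from the example above; the other mappings are standard arithmetic facts, not a description of Shard-Query's internals, and the names `PARTIAL`, `COMBINE`, and `parallel_aggregate` are hypothetical.

```python
# What each partition computes, and how the coordinator recombines it.
PARTIAL = {"COUNT": len, "SUM": sum, "MIN": min, "MAX": max}
COMBINE = {"COUNT": sum, "SUM": sum, "MIN": min, "MAX": max}


def parallel_aggregate(func, partitions):
    """Aggregate each non-empty partition, then combine the partial results."""
    partials = [PARTIAL[func](p) for p in partitions if p]
    return COMBINE[func](partials)


# Made-up values spread over four partitions.
partitions = [[3, 1], [4, 1, 5], [9], [2, 6]]
all_rows = [value for p in partitions for value in p]

total_count = parallel_aggregate("COUNT", partitions)  # SUM of partial COUNTs
total_sum = parallel_aggregate("SUM", partitions)

# AVG cannot be combined directly: it needs SUM and COUNT kept separately.
correct_avg = total_sum / total_count
avg_of_avgs = sum(sum(p) / len(p) for p in partitions) / len(partitions)
print(total_count, total_sum, correct_avg, avg_of_avgs)
```

Averaging the per-partition averages gives a different answer from the true average whenever partitions differ in size, which is why an average has to be carried as a sum and a count until the final step.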
And the query is quite a bit faster, at 140.24 seconds compared with MySQL’s 248.7 seconds:
Array
(
    [count] => 59986052
)
1 rows returned
Exec time: 140.24419403076
Finally, I want to look at a more complex query that uses joins and aggregation.
mysql> explain select d_year, c_nation,
    -> sum(lo_revenue - lo_supplycost) as profit
    -> from lineorder
    -> join dim_date
    -> on lo_orderdatekey = d_datekey
    -> join customer
    -> on lo_custkey = c_customerkey
    -> join supplier
    -> on lo_suppkey = s_suppkey
    -> join part
    -> on lo_partkey = p_partkey
    -> where
    -> c_region = 'AMERICA'
    -> and s_region = 'AMERICA'
    -> and (p_mfgr = 'MFGR#1'
    -> or p_mfgr = 'MFGR#2')
    -> group by d_year, c_nation
    -> order by d_year, c_nation;
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                      | rows | Extra                           |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
|  1 | SIMPLE      | dim_date  | ALL    | PRIMARY       | NULL    | NULL    | NULL                     |    5 | Using temporary; Using filesort |
|  1 | SIMPLE      | lineorder | ref    | PRIMARY       | PRIMARY | 4       | ssb.dim_date.D_DateKey   |   89 | NULL                            |
|  1 | SIMPLE      | supplier  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_SuppKey |    1 | Using where                     |
|  1 | SIMPLE      | customer  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_CustKey |    1 | Using where                     |
|  1 | SIMPLE      | part      | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_PartKey |    1 | Using where                     |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
5 rows in set (0.01 sec)
Here is the query on regular MySQL:
mysql> select d_year, c_nation,
    -> sum(lo_revenue - lo_supplycost) as profit
    -> from lineorder
    -> join dim_date
    -> on lo_orderdatekey = d_datekey
    -> join customer
    -> on lo_custkey = c_customerkey
    -> join supplier
    -> on lo_suppkey = s_suppkey
    -> join part
    -> on lo_partkey = p_partkey
    -> where
    -> c_region = 'AMERICA'
    -> and s_region = 'AMERICA'
    -> and (p_mfgr = 'MFGR#1'
    -> or p_mfgr = 'MFGR#2')
    -> group by d_year, c_nation
    -> order by d_year, c_nation;
+--------+---------------+--------------+
| d_year | c_nation      | profit       |
+--------+---------------+--------------+
|   1992 | ARGENTINA     | 102741829748 |
...
|   1998 | UNITED STATES |  61345891337 |
+--------+---------------+--------------+
35 rows in set (11 min 56.79 sec)
Again, Shard-Query splits up the query to run over each partition (I won’t bore you with the details) and executes it faster than MySQL, in 343.3 seconds compared with about 717 seconds:
Array
(
    [d_year] => 1998
    [c_nation] => UNITED STATES
    [profit] => 61345891337
)
35 rows returned
Exec time: 343.29854893684
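For a side-by-side view, the speedups can be worked out from the timings reported in this post. The short Python snippet below only does that arithmetic; `to_seconds` is a hypothetical helper and the labels are shorthand for the three demo queries.

```python
def to_seconds(minutes, seconds):
    """Convert the 'N min S sec' format printed by the mysql client to seconds."""
    return minutes * 60 + seconds


# (regular MySQL seconds, Shard-Query seconds), copied from the output above.
timings = {
    "COUNT(DISTINCT LO_OrderDateKey)": (to_seconds(4, 48.63), 0.10923719406128),
    "COUNT(*)": (to_seconds(4, 8.70), 140.24419403076),
    "join with GROUP BY": (to_seconds(11, 56.79), 343.29854893684),
}

speedups = {name: mysql / shard for name, (mysql, shard) in timings.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

The two scan-bound queries come out at roughly 1.8x and 2.1x on the 4-core test box, while the COUNT(DISTINCT) query gains far more because the rewrite to GROUP BY allows a loose index scan, not because of parallelism alone.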
I hope you see how using Shard-Query can speed up queries without using sharding, on just a single server. All you really need to do is add partitioning.
You can get Shard-Query from GitHub at http://github.com/greenlion/swanhart-tools
Please note: Configure and install Shard-Query as normal, but simply use one node and set the column option (the shard column) to “nocolumn” or false, because you are not required to use a shard column if you are not sharding.
