Real-time Profiling a MongoDB Cluster


by A. Jesse Jiryu Davis, Python Evangelist at 10gen

In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations among its servers, you can predict how your application will scale as you add shards and add members to shards.

Operations are routed according to the type of operation, your shard key, and your read preference. Let’s set up a cluster and use the system profiler to see where each operation is run. This is an interactive, experimental way to learn how your cluster really behaves and how your architecture will scale.


Setup

You’ll need a recent install of MongoDB (I’m using 2.4.4), Python, a recent version of PyMongo (at least 2.4—I’m using 2.5.2) and the code in my cluster-profile repository on GitHub. If you install the Colorama Python package you’ll get cute colored output. These scripts were tested on my Mac.

Sharded cluster of replica sets

Run the cluster_setup.py script in my repository. It sets up a standard sharded cluster running on your local machine: one mongos, three config servers, and two shards, each of which is a three-member replica set. The first shard’s replica set runs on ports 4000 through 4002, the second shard on ports 5000 through 5002, and the three config servers on ports 6000 through 6002:

[Figure: the setup]
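
To check from Python that mongos sees both shards, you can run the listShards admin command (a real command; the host strings in the comments are illustrative):

from pymongo import MongoClient

client = MongoClient()  # Connect to mongos on the default port.
for shard in client.admin.command('listShards')['shards']:
    print('%(_id)s %(host)s' % shard)
# replset_0 replset_0/localhost:4000,localhost:4001,localhost:4002
# replset_1 replset_1/localhost:5000,localhost:5001,localhost:5002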

For the finale, cluster_setup.py makes a collection named sharded_collection, sharded on a key named shard_key.
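
Under the hood, that takes two admin commands. Here’s a minimal sketch of what cluster_setup.py likely runs (the command names and signatures are standard; the script’s exact code may differ):

from pymongo import MongoClient

admin = MongoClient().admin  # Connect to mongos.
admin.command('enableSharding', 'test')
admin.command('shardCollection', 'test.sharded_collection',
              key={'shard_key': 1})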

In a normal deployment, we’d let MongoDB’s balancer automatically distribute chunks of data among our two shards. But for this demo we want documents to be on predictable shards, so my script disables the balancer. It makes a chunk for all documents with shard_key less than 500 and another chunk for documents with shard_key greater than or equal to 500. It moves the high chunk to replset_1:

from pymongo import MongoClient

client = MongoClient()  # Connect to mongos.
admin = client.admin  # The admin database.

# Pre-split.
admin.command(
    'split', 'test.sharded_collection',
    middle={'shard_key': 500})
admin.command(
    'moveChunk', 'test.sharded_collection',
    find={'shard_key': 500},
    to='replset_1')

If you connect to mongos with the MongoDB shell, sh.status() shows there’s one chunk on each of the two shards:

{ "shard_key" : { "$minKey" : 1 } } -->> { "shard_key" : 500 } on : replset_0 { "t" : 2, "i" : 1 }
{ "shard_key" : 500 } -->> { "shard_key" : { "$maxKey" : 1 } } on : replset_1 { "t" : 2, "i" : 0 }

The setup script also inserts a document with a shard_key of 0 and another with a shard_key of 500.
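
Those two writes go through mongos like any other. A minimal sketch of the equivalent, using PyMongo 2.x’s insert() (the script’s exact code may differ):

from pymongo import MongoClient

collection = MongoClient().test.sharded_collection
collection.insert({'shard_key': 0})    # Lands on replset_0.
collection.insert({'shard_key': 500})  # Lands on replset_1.

Now we’re ready for some profiling.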

Profiling

Run the tail_profile.py script from my repository. It connects to all the replica set members. On each, it sets the profiling level to 2 (“log everything”) on the test database, and creates a tailable cursor on the system.profile collection. The script filters out some noise in the profile collection—for example, the activities of the tailable cursor show up in the system.profile collection that it’s tailing. Any legitimate entries in the profile are spat out to the console in pretty colors.
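
A minimal sketch of the idea, for one replica set member, using PyMongo 2.x’s tailable-cursor keywords (tail_profile.py itself adds the filtering and the colors):

from pymongo import MongoClient

member = MongoClient('localhost', 4000)  # One replica set member.
db = member.test
db.set_profiling_level(2)  # 2 = profile all operations.

# system.profile is capped, so it supports tailable cursors.
# Exclude the entries our own tailing generates.
cursor = db.system.profile.find(
    {'ns': {'$ne': 'test.system.profile'}},
    tailable=True, await_data=True)

while cursor.alive:
    for entry in cursor:
        print('%(op)s %(ns)s' % entry)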

Experiments

Targeted queries versus scatter-gather

Let’s run a query from Python in a separate terminal:

>>> from pymongo import MongoClient
>>> # Connect to mongos.
>>> collection = MongoClient().test.sharded_collection
>>> collection.find_one({'shard_key': 0})
{'_id': ObjectId('51bb6f1cca1ce958c89b348a'), 'shard_key': 0}

tail_profile.py prints:

replset_0 primary on 4000: query test.sharded_collection {"shard_key": 0}

The query includes the shard key, so mongos reads from the shard that can satisfy it. Adding shards can scale out your throughput on a query like this. What about a query that doesn’t contain the shard key?

>>> collection.find_one({})

mongos sends the query to both shards:

replset_0 primary on 4000: query test.sharded_collection {"shard_key": 0}
replset_1 primary on 5000: query test.sharded_collection {"shard_key": 500}

For fan-out queries like this, adding more shards won’t scale out your query throughput as well as it would for targeted queries, because every shard has to process every query. But we can scale throughput on queries like these by reading from secondaries.
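
Besides the profiler, Cursor.explain() gives a quick check of routing from the driver. On a 2.4-era mongos the plan includes a shards field mapping each participating shard to its plan (the exact layout varies by server version, so treat this as a sketch):

>>> targeted = collection.find({'shard_key': 0}).explain()
>>> scattered = collection.find({}).explain()
>>> len(targeted['shards']), len(scattered['shards'])
(1, 2)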

Queries with read preferences

We can use read preferences to read from secondaries:

>>> from pymongo.read_preferences import ReadPreference
>>> collection.find_one({}, read_preference=ReadPreference.SECONDARY)

tail_profile.py shows us that mongos chose a random secondary from each shard:

replset_0 secondary on 4001: query test.sharded_collection {"$readPreference": {"mode": "secondary"}, "$query": {}}
replset_1 secondary on 5001: query test.sharded_collection {"$readPreference": {"mode": "secondary"}, "$query": {}}

Note how PyMongo passes the read preference to mongos in the query, as the $readPreference field. mongos targets one secondary in each of the two replica sets.

Updates

With a sharded collection, updates must either include the shard key or be “multi-updates”. An update with the shard key goes to the proper shard, of course:

>>> collection.update({'shard_key': -100}, {'$set': {'field': 'value'}})

replset_0 primary on 4000: update test.sharded_collection {"shard_key": -100}

mongos only sends the update to replset_0, because we put the chunk of documents with shard_key less than 500 there.

A multi-update hits all shards:

>>> collection.update({}, {'$set': {'field': 'value'}}, multi=True)

replset_0 primary on 4000: update test.sharded_collection {}
replset_1 primary on 5000: update test.sharded_collection {}

A multi-update on a range of the shard key need only involve the proper shard:

>>> collection.update({'shard_key': {'$gt': 1000}}, {'$set': {'field': 'value'}}, multi=True)

replset_1 primary on 5000: update test.sharded_collection {"shard_key": {"$gt": 1000}}

So targeted updates that include the shard key can be scaled out by adding shards. Even multi-updates can be scaled out if they include a range of the shard key, but multi-updates without the shard key won’t benefit from extra shards.

Commands

In version 2.4, mongos can use secondaries not only for queries, but also for some commands. You can run count on secondaries if you pass the right read preference:

>>> cursor = collection.find(read_preference=ReadPreference.SECONDARY)
>>> cursor.count()

replset_0 secondary on 4001: command count: sharded_collection
replset_1 secondary on 5001: command count: sharded_collection

Whereas findAndModify, since it modifies data, runs on the primary no matter your read preference:

>>> db = MongoClient().test
>>> db.command(
...     'findAndModify',
...     'sharded_collection',
...     query={'shard_key': -1},
...     remove=True,
...     read_preference=ReadPreference.SECONDARY)

replset_0 primary on 4000: command findAndModify: sharded_collection

Go Forth And Scale

To scale a sharded cluster, you should understand how operations are distributed: are they scatter-gather, or targeted to one shard? Do they run on primaries or secondaries? If you set up a cluster and test your queries interactively like we did here, you can see how your cluster behaves in practice, and design your application for future growth.

Read Jesse’s blog, Emptysquare, and follow him on GitHub.
