by A. Jesse Jiryu Davis, Python Evangelist at 10gen In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations
by A. Jesse Jiryu Davis, Python Evangelist at 10gen
In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations among its servers, you can predict how your application will scale as you add shards and add members to shards.
Operations are routed according to the type of operation, your shard key, and your read preference. Let’s set up a cluster and use the system profiler to see where each operation is run. This is an interactive, experimental way to learn how your cluster really behaves and how your architecture will scale.
Setup
You’ll need a recent install of MongoDB (I’m using 2.4.4), Python, a recent version of PyMongo (at least 2.4—I’m using 2.5.2) and the code in my cluster-profile repository on GitHub. If you install the Colorama Python package you’ll get cute colored output. These scripts were tested on my Mac.
Sharded cluster of replica sets
Run the cluster_setup.py
script in my repository. It sets up a standard sharded cluster for you running on your local machine. There’s a mongos
, three config servers, and two shards, each of which is a three-member replica set. The first shard’s replica set is running on ports 4000 through 4002, the second shard is on ports 5000 through 5002, and the three config servers are on ports 6000 through 6002:
For the finale, cluster_setup.py
makes a collection named sharded_collection
, sharded on a key named shard_key
.
In a normal deployment, we’d let MongoDB’s balancer automatically distribute chunks of data among our two shards. But for this demo we want documents to be on predictable shards, so my script disables the balancer. It makes a chunk for all documents with shard_key
less than 500 and another chunk for documents with shard_key
greater than or equal to 500. It moves the high chunk to replset_1
:
client = MongoClient() # Connect to mongos. admin = client.admin # admin database.
Pre-split.
admin.command( 'split', 'test.sharded_collection', middle={'shard_key': 500}) admin.command( 'moveChunk', 'test.sharded_collection', find={'shard_key': 500}, to='replset_1')
If you connect to mongos
with the MongoDB shell, sh.status()
shows there’s one chunk on each of the two shards:
{ "shard_key" : { "$minKey" : 1 } } -->> { "shard_key" : 500 } on : replset_0 { "t" : 2, "i" : 1 } { "shard_key" : 500 } -->> { "shard_key" : { "$maxKey" : 1 } } on : replset_1 { "t" : 2, "i" : 0 }
The setup script also inserts a document with a shard_key
of 0 and another with a shard_key
of 500. Now we’re ready for some profiling.
Profiling
Run the tail_profile.py
script from my repository. It connects to all the replica set members. On each, it sets the profiling level to 2 (“log everything”) on the test
database, and creates a tailable cursor on the system.profile
collection. The script filters out some noise in the profile collection—for example, the activities of the tailable cursor show up in the system.profile
collection that it’s tailing. Any legitimate entries in the profile are spat out to the console in pretty colors.
Experiments
Targeted queries versus scatter-gather
Let’s run a query from Python in a separate terminal:
>>> from pymongo import MongoClient >>> # Connect to mongos. >>> collection = MongoClient().test.sharded_collection >>> collection.find_one({'shard_key': 0}) {'_id': ObjectId('51bb6f1cca1ce958c89b348a'), 'shard_key': 0}
tail_profile.py
prints:
replset_0 primary on 4000: query test.sharded_collection {“shard_key”: 0}
The query includes the shard key, so mongos
reads from the shard that can satisfy it. Adding shards can scale out your throughput on a query like this. What about a query that doesn’t contain the shard key?:
>>> collection.find_one({})
mongos
sends the query to both shards:
replset_0 primary on 4000: query test.sharded_collection {“shard_key”: 0}
replset_1 primary on 5000: query test.sharded_collection {“shard_key”: 500}
For fan-out queries like this, adding more shards won’t scale out your query throughput as well as it would for targeted queries, because every shard has to process every query. But we can scale throughput on queries like these by reading from secondaries.
Queries with read preferences
We can use read preferences to read from secondaries:
>>> from pymongo.read_preferences import ReadPreference >>> collection.find_one({}, read_preference=ReadPreference.SECONDARY)
tail_profile.py
shows us that mongos
chose a random secondary from each shard:
replset_0 secondary on 4001: query test.sharded_collection {“$readPreference”: {“mode”: “secondary”}, “$query”: {}}
replset_1 secondary on 5001: query test.sharded_collection {“$readPreference”: {“mode”: “secondary”}, “$query”: {}}
Note how PyMongo passes the read preference to mongos
in the query, as the $readPreference
field. mongos
targets one secondary in each of the two replica sets.
Updates
With a sharded collection, updates must either include the shard key or be “multi-updates”. An update with the shard key goes to the proper shard, of course:
>>> collection.update({'shard_key': -100}, {'$set': {'field': 'value'}})
replset_0 primary on 4000: update test.sharded_collection {“shard_key”: -100}
mongos
only sends the update to replset_0
, because we put the chunk of documents with shard_key
less than 500 there.
A multi-update hits all shards:
>>> collection.update({}, {'$set': {'field': 'value'}}, multi=True)
replset_0 primary on 4000: update test.sharded_collection {}
replset_1 primary on 5000: update test.sharded_collection {}
A multi-update on a range of the shard key need only involve the proper shard:
>>> collection.update({'shard_key': {'$gt': 1000}}, {'$set': {'field': 'value'}}, multi=True)
replset_1 primary on 5000: update test.sharded_collection {“shard_key”: {“$gt”: 1000}}
So targeted updates that include the shard key can be scaled out by adding shards. Even multi-updates can be scaled out if they include a range of the shard key, but multi-updates without the shard key won’t benefit from extra shards.
Commands
In version 2.4, mongos
can use secondaries not only for queries, but also for some commands. You can run count
on secondaries if you pass the right read preference:
>>> cursor = collection.find(read_preference=ReadPreference.SECONDARY) >>> cursor.count()
replset_0 secondary on 4001: command count: sharded_collection
replset_1 secondary on 5001: command count: sharded_collection
Whereas findAndModify
, since it modifies data, is run on the primaries no matter your read preference:
>>> db = MongoClient().test >>> test.command( ... 'findAndModify', ... 'sharded_collection', ... query={'shard_key': -1}, ... remove=True, ... read_preference=ReadPreference.SECONDARY)
replset_0 primary on 4000: command findAndModify: sharded_collection
Go Forth And Scale
To scale a sharded cluster, you should understand how operations are distributed: are they scatter-gather, or targeted to one shard? Do they run on primaries or secondaries? If you set up a cluster and test your queries interactively like we did here, you can see how your cluster behaves in practice, and design your application for future growth.
Read Jesse’s blog, Emptysquare and follow him on Github
原文地址:Real-time Profiling a MongoDB Cluster, 感谢原作者分享。

The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.

The steps to build a MySQL database include: 1. Create a database and table, 2. Insert data, and 3. Conduct queries. First, use the CREATEDATABASE and CREATETABLE statements to create the database and table, then use the INSERTINTO statement to insert the data, and finally use the SELECT statement to query the data.

MySQL is suitable for beginners because it is easy to use and powerful. 1.MySQL is a relational database, and uses SQL for CRUD operations. 2. It is simple to install and requires the root user password to be configured. 3. Use INSERT, UPDATE, DELETE, and SELECT to perform data operations. 4. ORDERBY, WHERE and JOIN can be used for complex queries. 5. Debugging requires checking the syntax and use EXPLAIN to analyze the query. 6. Optimization suggestions include using indexes, choosing the right data type and good programming habits.

MySQL is suitable for beginners because: 1) easy to install and configure, 2) rich learning resources, 3) intuitive SQL syntax, 4) powerful tool support. Nevertheless, beginners need to overcome challenges such as database design, query optimization, security management, and data backup.

Yes,SQLisaprogramminglanguagespecializedfordatamanagement.1)It'sdeclarative,focusingonwhattoachieveratherthanhow.2)SQLisessentialforquerying,inserting,updating,anddeletingdatainrelationaldatabases.3)Whileuser-friendly,itrequiresoptimizationtoavoidper

ACID attributes include atomicity, consistency, isolation and durability, and are the cornerstone of database design. 1. Atomicity ensures that the transaction is either completely successful or completely failed. 2. Consistency ensures that the database remains consistent before and after a transaction. 3. Isolation ensures that transactions do not interfere with each other. 4. Persistence ensures that data is permanently saved after transaction submission.

MySQL is not only a database management system (DBMS) but also closely related to programming languages. 1) As a DBMS, MySQL is used to store, organize and retrieve data, and optimizing indexes can improve query performance. 2) Combining SQL with programming languages, embedded in Python, using ORM tools such as SQLAlchemy can simplify operations. 3) Performance optimization includes indexing, querying, caching, library and table division and transaction management.

MySQL uses SQL commands to manage data. 1. Basic commands include SELECT, INSERT, UPDATE and DELETE. 2. Advanced usage involves JOIN, subquery and aggregate functions. 3. Common errors include syntax, logic and performance issues. 4. Optimization tips include using indexes, avoiding SELECT* and using LIMIT.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Atom editor mac version download
The most popular open source editor

Dreamweaver CS6
Visual web development tools

Dreamweaver Mac version
Visual web development tools