Efficient Indexing in MongoDB 2.6-mysql教程-PHP中文网

首页

数据库

mysql教程

Efficient Indexing in MongoDB 2.6

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:35 PM

mongodb

By Osmar Olivo, Product Manager at MongoDB One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Havi

By Osmar Olivo, Product Manager at MongoDB

One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Having extensive indexing functionality makes it easier for developers to build apps that provide rich functionality and low latency.

MongoDB 2.6 introduces a new query planner, including the ability to perform index intersection. Prior to 2.6 the query planner could only make use of a single index for most queries. That meant that if you wanted to query on multiple fields together, you needed to create a compound index. It also meant that if there were several different combinations of fields you wanted to query on, you might need several different compound indexes.

Each index adds overhead to your deployment - indexes consume space, on disk and in RAM, and indexes are maintained during updates, which adds disk IO. In other words, indexes improve the efficiency of many operations, but they also come at a cost. For many applications, index intersection will allow users to reduce the number of indexes they need while still providing rich features and low latency.

In the following sections we will take a deep dive into index intersection and how it can be applied to applications.

An Example - The Phone Book

Let’s take the example of a phone book with the following schema.

{
    FirstName
    LastName
    Phone_Number
    Address
}

If I were to search for “Smith, John” how would I index the following query to be as efficient as possible?

db.phonebook.find({ FirstName : “John”, LastName : “Smith” })

I could use an individual index on FirstName and search for all of the “Johns”.

This would look something like ensureIndex( { FirstName : 1 } )

We run this query and we get back 200,000 John Smiths. Looking at the explain() output below however, we see that we scanned 1,000,000 “Johns” in the process of finding 200,000 “John Smiths”.

> db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 1000000,
    "nscanned" : 1000000,
    "nscannedObjectsAllPlans" : 1000101,
    "nscannedAllPlans" : 1000101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 2,
    "nChunkSkips" : 0,
    "millis" : 2043,
    "indexBounds" : {
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

How about creating an individual index on LastName?

This would look something like ensureIndex( { LastName : 1 } )

Running this query we get back 200,000 “John Smiths” but our explain output says that we now scanned 400,000 “Smiths”. How can we make this better?

db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 400000,
    "nscanned" : 400000,
    "nscannedObjectsAllPlans" : 400101,
    "nscannedAllPlans" : 400101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 852,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

So we know that there are 1,000,000 “John” entries, 400,000 “Smith” entries, and 200,000 “John Smith” entries in our phonebook. Is there a way that we can scan just the 200,000 we need?

In the case of a phone book this is somewhat simple; since we know that we want it to be sorted by Lastname, Firstname we can create a compound index on them, like the below.

ensureIndex( {  LastName : true, FirstName : 1  } ) 
db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1_FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 200000,
    "nscanned" : 200000,
    "nscannedObjectsAllPlans" : 200000,
    "nscannedAllPlans" : 200000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 370,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ],
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

Looking at the explain on this, we see that the index only scanned the 200,000 documents that matched, so we got a perfect hit.

Beyond Compound Indexes

The compound index is a great solution in the case of a phonebook in which we always know how we are going to be querying our data. Now what if we have an application in which users can arbitrarily query for different fields together? We can’t possibly create a compound index for every possible combination because of the overhead imposed by indexes, as we discussed above, and because MongoDB limits you to 64 indexes per collection. Index intersection can really help.

Imagine the case of a medical application which doctors use to filter through patients. At a high level, omitting several details, a basic schema may look something like the below.

{
      Fname
      LName
      SSN
      Age
      Blood_Type
      Conditions : [] 
      Medications : [ ]
      ...
      ...
}

Some sample searches that a doctor/nurse may run on this system would look something like the below.

Find me a Patient with Blood_Type = O under the age of 50

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )

Find me all patients over the age of 60 on Medication X

db.patients.find( { Medications : “X” , Age : { $gt : 60} })

Find me all Diabetic patients on medication Y

db.patients.find( { Conditions : “Diabetes”, Medications : “Y” } )

With all of the unstructured data in modern applications, along with the desire to be able to search for things as needed in an ad-hoc way, it can become very difficult to predict usage patterns. Since we can’t possibly create compound indexes for every combination of fields, because we don’t necessarily know what those will be ahead of time, we can try indexing individual fields to try to salvage some performance. But as shown above in our phone book application, this can lead to performance issues in which we pull documents into memory that are not matches.

To avoid the paging of unnecessary data, the new index intersection feature in 2.6 increases the overall efficiency of these types of ad-hoc queries by processing the indexes involved individually and then intersecting the result set to find the matching documents. This means you only pull the final matching documents into memory and everything else is processed using the indexes. This processing will utilize more CPU, but should greatly reduce the amount of IO done for queries where all of the data is not in memory as well as allow you to utilize your memory more efficiently.

For example, looking at the earlier example:

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )

It is inefficient to find all patients with BloodType: O (which could be millions) and then pull into memory each document to find the ones with age

Instead, the query planner finds all patients with bloodType: O using the index on BloodType, and all patients with age

Index intersection allows for much more efficient use of existing RAM so less total memory will usually be required to fit the working set then previously. Also, if you had several compound indices that were made up of different combinations of fields, then you can reduce the total number of indexes on the system. This means storing less indices in memory as well as achieving better insert/update performance since fewer indices must be updated.

As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.

Optimizing Multi-key Indexes It is also possible to intersect an index with itself in the case of multi-key indexes. Consider the below query:

Find me all patients with Diabetes & High Blood Pressure

db.patients.find( {  Conditions : { $all : [ “Diabetes”, “High Blood Pressure” ]  }    }  )

In this case we will find the result set of all Patients with Diabetes, and the result set of all patients with High blood pressure, and intersect the two to get all patients with both. Again, this requires less memory and disk speed for better overall performance. As of the 2.6.0 release, an index can intersect with itself up to 10 times.

Do We Still Need Compound Indexes?

To be clear, compound indexing will ALWAYS be more performant IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.

Download MongoDB 2.6 Today
Learn about all of the key new features in MongoDB 2.6 by downloading the whitepaper

原文地址：Efficient Indexing in MongoDB 2.6, 感谢原作者分享。

声明

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系admin@php.cn

您可以使用哪些工具来监视MySQL性能？Apr 23, 2025 am 12:21 AM

如何有效监控MySQL性能？使用mysqladmin、SHOWGLOBALSTATUS、PerconaMonitoringandManagement(PMM)和MySQLEnterpriseMonitor等工具。1.使用mysqladmin查看连接数。2.用SHOWGLOBALSTATUS查看查询数。3.PMM提供详细性能数据和图形化界面。4.MySQLEnterpriseMonitor提供丰富的监控功能和报警机制。

MySQL与SQL Server有何不同？Apr 23, 2025 am 12:20 AM

MySQL和SQLServer的区别在于：1)MySQL是开源的，适用于Web和嵌入式系统，2)SQLServer是微软的商业产品，适用于企业级应用。两者在存储引擎、性能优化和应用场景上有显着差异，选择时需考虑项目规模和未来扩展性。

在哪些情况下，您可以选择SQL Server而不是MySQL？Apr 23, 2025 am 12:20 AM

在需要高可用性、高级安全性和良好集成性的企业级应用场景下，应选择SQLServer而不是MySQL。1)SQLServer提供企业级功能，如高可用性和高级安全性。2)它与微软生态系统如VisualStudio和PowerBI紧密集成。3)SQLServer在性能优化方面表现出色，支持内存优化表和列存储索引。

MySQL如何处理角色集和碰撞？Apr 23, 2025 am 12:19 AM

mySqlManagesCharacterSetsetSandCollationsyutusututf-8asthEdeFault，允许ConfigurationAtdataBase，table和columnlevels，AndrequiringCarefullageLignmentToavoidMismatches.1）setDefeaultCharactersetTercharactersetEtCollacterSeteTandColletationForAdataBase.2）conformentcollecharactersettersetertersetcollatertersetcollationcollation

MySQL中有什么触发器？Apr 23, 2025 am 12:11 AM

MySQL触发器是与表相关联的自动执行的存储过程，用于在特定数据操作时执行一系列操作。1）触发器定义与作用：用于数据校验、日志记录等。2）工作原理：分为BEFORE和AFTER，支持行级触发。3）使用示例：可用于记录薪资变更或更新库存。4）调试技巧：使用SHOWTRIGGERS和SHOWCREATETRIGGER命令。5）性能优化：避免复杂操作，使用索引，管理事务。

您如何在MySQL中创建和管理用户帐户？Apr 22, 2025 pm 06:05 PM

在MySQL中创建和管理用户账户的步骤如下：1.创建用户：使用CREATEUSER'newuser'@'localhost'IDENTIFIEDBY'password';2.分配权限：使用GRANTSELECT,INSERT,UPDATEONmydatabase.TO'newuser'@'localhost';3.修正权限错误：使用REVOKEALLPRIVILEGESONmydatabase.FROM'newuser'@'localhost';然后重新分配权限；4.优化权限：使用SHOWGRA

MySQL与Oracle有何不同？Apr 22, 2025 pm 05:57 PM

MySQL适合快速开发和中小型应用，Oracle适合大型企业和高可用性需求。1）MySQL开源、易用，适用于Web应用和中小型企业。2）Oracle功能强大，适合大型企业和政府机构。3）MySQL支持多种存储引擎，Oracle提供丰富的企业级功能。

与其他关系数据库相比，使用MySQL的缺点是什么？Apr 22, 2025 pm 05:49 PM

MySQL相比其他关系型数据库的劣势包括：1.性能问题：在处理大规模数据时可能遇到瓶颈，PostgreSQL在复杂查询和大数据处理上表现更优。2.扩展性：水平扩展能力不如GoogleSpanner和AmazonAurora。3.功能限制：在高级功能上不如PostgreSQL和Oracle，某些功能需要更多自定义代码和维护。

See all articles