By Osmar Olivo, Product Manager at MongoDB One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Havi
By Osmar Olivo, Product Manager at MongoDB
One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Having extensive indexing functionality makes it easier for developers to build apps that provide rich functionality and low latency.
MongoDB 2.6 introduces a new query planner, including the ability to perform index intersection. Prior to 2.6 the query planner could only make use of a single index for most queries. That meant that if you wanted to query on multiple fields together, you needed to create a compound index. It also meant that if there were several different combinations of fields you wanted to query on, you might need several different compound indexes.
Each index adds overhead to your deployment - indexes consume space, on disk and in RAM, and indexes are maintained during updates, which adds disk IO. In other words, indexes improve the efficiency of many operations, but they also come at a cost. For many applications, index intersection will allow users to reduce the number of indexes they need while still providing rich features and low latency.
In the following sections we will take a deep dive into index intersection and how it can be applied to applications.
An Example - The Phone Book
Let’s take the example of a phone book with the following schema.
{ FirstName LastName Phone_Number Address }
If I were to search for “Smith, John” how would I index the following query to be as efficient as possible?
db.phonebook.find({ FirstName : “John”, LastName : “Smith” })
I could use an individual index on FirstName and search for all of the “Johns”.
This would look something like ensureIndex( { FirstName : 1 } )
We run this query and we get back 200,000 John Smiths. Looking at the explain() output below however, we see that we scanned 1,000,000 “Johns” in the process of finding 200,000 “John Smiths”.
> db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain() { "cursor" : "BtreeCursor FirstName_1", "isMultiKey" : false, "n" : 200000, "nscannedObjects" : 1000000, "nscanned" : 1000000, "nscannedObjectsAllPlans" : 1000101, "nscannedAllPlans" : 1000101, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 2, "nChunkSkips" : 0, "millis" : 2043, "indexBounds" : { "FirstName" : [ [ "John", "John" ] ] }, "server" : "Oz-Olivo-MacBook-Pro.local:27017" }
How about creating an individual index on LastName?
This would look something like ensureIndex( { LastName : 1 } )
Running this query we get back 200,000 “John Smiths” but our explain output says that we now scanned 400,000 “Smiths”. How can we make this better?
db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain() { "cursor" : "BtreeCursor LastName_1", "isMultiKey" : false, "n" : 200000, "nscannedObjects" : 400000, "nscanned" : 400000, "nscannedObjectsAllPlans" : 400101, "nscannedAllPlans" : 400101, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 1, "nChunkSkips" : 0, "millis" : 852, "indexBounds" : { "LastName" : [ [ "Smith", "Smith" ] ] }, "server" : "Oz-Olivo-MacBook-Pro.local:27017" }
So we know that there are 1,000,000 “John” entries, 400,000 “Smith” entries, and 200,000 “John Smith” entries in our phonebook. Is there a way that we can scan just the 200,000 we need?
In the case of a phone book this is somewhat simple; since we know that we want it to be sorted by Lastname, Firstname we can create a compound index on them, like the below.
ensureIndex( { LastName : true, FirstName : 1 } ) db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain() { "cursor" : "BtreeCursor LastName_1_FirstName_1", "isMultiKey" : false, "n" : 200000, "nscannedObjects" : 200000, "nscanned" : 200000, "nscannedObjectsAllPlans" : 200000, "nscannedAllPlans" : 200000, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 370, "indexBounds" : { "LastName" : [ [ "Smith", "Smith" ] ], "FirstName" : [ [ "John", "John" ] ] }, "server" : "Oz-Olivo-MacBook-Pro.local:27017" }
Looking at the explain on this, we see that the index only scanned the 200,000 documents that matched, so we got a perfect hit.
Beyond Compound Indexes
The compound index is a great solution in the case of a phonebook in which we always know how we are going to be querying our data. Now what if we have an application in which users can arbitrarily query for different fields together? We can’t possibly create a compound index for every possible combination because of the overhead imposed by indexes, as we discussed above, and because MongoDB limits you to 64 indexes per collection. Index intersection can really help.
Imagine the case of a medical application which doctors use to filter through patients. At a high level, omitting several details, a basic schema may look something like the below.
{ Fname LName SSN Age Blood_Type Conditions : [] Medications : [ ] ... ... }
Some sample searches that a doctor/nurse may run on this system would look something like the below.
Find me a Patient with Blood_Type = O under the age of 50
db.patients.find( { Blood_Type : “O”, Age : { $lt : 50 } } )
Find me all patients over the age of 60 on Medication X
db.patients.find( { Medications : “X” , Age : { $gt : 60} })
Find me all Diabetic patients on medication Y
db.patients.find( { Conditions : “Diabetes”, Medications : “Y” } )
With all of the unstructured data in modern applications, along with the desire to be able to search for things as needed in an ad-hoc way, it can become very difficult to predict usage patterns. Since we can’t possibly create compound indexes for every combination of fields, because we don’t necessarily know what those will be ahead of time, we can try indexing individual fields to try to salvage some performance. But as shown above in our phone book application, this can lead to performance issues in which we pull documents into memory that are not matches.
To avoid the paging of unnecessary data, the new index intersection feature in 2.6 increases the overall efficiency of these types of ad-hoc queries by processing the indexes involved individually and then intersecting the result set to find the matching documents. This means you only pull the final matching documents into memory and everything else is processed using the indexes. This processing will utilize more CPU, but should greatly reduce the amount of IO done for queries where all of the data is not in memory as well as allow you to utilize your memory more efficiently.
For example, looking at the earlier example:
db.patients.find( { Blood_Type : “O”, Age : { $lt : 50 } } )
It is inefficient to find all patients with BloodType: O (which could be millions) and then pull into memory each document to find the ones with age
Instead, the query planner finds all patients with bloodType: O using the index on BloodType, and all patients with age
Index intersection allows for much more efficient use of existing RAM so less total memory will usually be required to fit the working set then previously. Also, if you had several compound indices that were made up of different combinations of fields, then you can reduce the total number of indexes on the system. This means storing less indices in memory as well as achieving better insert/update performance since fewer indices must be updated.
As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.
Optimizing Multi-key Indexes It is also possible to intersect an index with itself in the case of multi-key indexes. Consider the below query:
Find me all patients with Diabetes & High Blood Pressure
db.patients.find( { Conditions : { $all : [ “Diabetes”, “High Blood Pressure” ] } } )
In this case we will find the result set of all Patients with Diabetes, and the result set of all patients with High blood pressure, and intersect the two to get all patients with both. Again, this requires less memory and disk speed for better overall performance. As of the 2.6.0 release, an index can intersect with itself up to 10 times.
Do We Still Need Compound Indexes?
To be clear, compound indexing will ALWAYS be more performant IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.
- Download MongoDB 2.6 Today
- Learn about all of the key new features in MongoDB 2.6 by downloading the whitepaper
原文地址:Efficient Indexing in MongoDB 2.6, 感谢原作者分享。

mySqlStringTypesimpactStorageAndPerformanCeaseAsfollows:1)長度,始終使用theSamestoragespace,whatcanbefasterbutlessspace-felfficity.2)varCharisvariable varcharisvariable length,morespace-morespace-morespace-effficitybuteftife buteftife butfority butfority textifforlyslower.3)

mysqlStringTypesIncludeVarChar,文本,char,Enum和set.1)varCharisVersAtileForvariable-lengthStringStringSuptoPuptOuptoPepePecifiedLimit.2)textisidealforlargetStortStorStoverStoverStorageWithoutAutAdefinedLength.3)charlisfixed-lenftenge,for forConsistentDatalikeCodes.4)

MySQLoffersvariousstringdatatypes:1)CHARforfixed-lengthstrings,2)VARCHARforvariable-lengthtext,3)BINARYandVARBINARYforbinarydata,4)BLOBandTEXTforlargedata,and5)ENUMandSETforcontrolledinput.Eachtypehasspecificusesandperformancecharacteristics,sochoose

TograntpermissionstonewMySQLusers,followthesesteps:1)AccessMySQLasauserwithsufficientprivileges,2)CreateanewuserwiththeCREATEUSERcommand,3)UsetheGRANTcommandtospecifypermissionslikeSELECT,INSERT,UPDATE,orALLPRIVILEGESonspecificdatabasesortables,and4)

toadduserInmysqleffect和securly,跟隨台詞:1)USEtheCreateUserStattoDaneWuser,指定thehostandastrongpassword.2)GrantNecterAryAryaryPrivilegesSustherthing privilegesgeStatement,usifementStatement,adheringtotheprinciplelastprefilegege.3)

toaddanewuserwithcomplexpermissionsinmysql,loldtheSesteps:1)創建eTheEserWithCreateuser'newuser'newuser'@''localhost'Indedify'pa ssword';。 2)GrantreadAccesstoalltablesin'mydatabase'withGrantSelectOnMyDatabase.to'newuser'@'localhost';。 3)GrantWriteAccessto'

MySQL中的字符串數據類型包括CHAR、VARCHAR、BINARY、VARBINARY、BLOB、TEXT,排序規則(Collations)決定了字符串的比較和排序方式。 1.CHAR適合固定長度字符串,VARCHAR適合可變長度字符串。 2.BINARY和VARBINARY用於二進制數據,BLOB和TEXT用於大對像數據。 3.排序規則如utf8mb4_unicode_ci忽略大小寫,適合用戶名;utf8mb4_bin區分大小寫,適合需要精確比較的字段。

最佳的MySQLVARCHAR列長度選擇應基於數據分析、考慮未來增長、評估性能影響及字符集需求。 1)分析數據以確定典型長度;2)預留未來擴展空間;3)注意大長度對性能的影響;4)考慮字符集對存儲的影響。通過這些步驟,可以優化數據庫的效率和擴展性。


熱AI工具

Undresser.AI Undress
人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover
用於從照片中去除衣服的線上人工智慧工具。

Undress AI Tool
免費脫衣圖片

Clothoff.io
AI脫衣器

Video Face Swap
使用我們完全免費的人工智慧換臉工具,輕鬆在任何影片中換臉!

熱門文章

熱工具

SublimeText3漢化版
中文版,非常好用

Safe Exam Browser
Safe Exam Browser是一個安全的瀏覽器環境,安全地進行線上考試。該軟體將任何電腦變成一個安全的工作站。它控制對任何實用工具的訪問,並防止學生使用未經授權的資源。

DVWA
Damn Vulnerable Web App (DVWA) 是一個PHP/MySQL的Web應用程序,非常容易受到攻擊。它的主要目標是成為安全專業人員在合法環境中測試自己的技能和工具的輔助工具,幫助Web開發人員更好地理解保護網路應用程式的過程,並幫助教師/學生在課堂環境中教授/學習Web應用程式安全性。 DVWA的目標是透過簡單直接的介面練習一些最常見的Web漏洞,難度各不相同。請注意,該軟體中

mPDF
mPDF是一個PHP庫,可以從UTF-8編碼的HTML產生PDF檔案。原作者Ian Back編寫mPDF以從他的網站上「即時」輸出PDF文件,並處理不同的語言。與原始腳本如HTML2FPDF相比,它的速度較慢,並且在使用Unicode字體時產生的檔案較大,但支援CSS樣式等,並進行了大量增強。支援幾乎所有語言,包括RTL(阿拉伯語和希伯來語)和CJK(中日韓)。支援嵌套的區塊級元素(如P、DIV),

MantisBT
Mantis是一個易於部署的基於Web的缺陷追蹤工具,用於幫助產品缺陷追蹤。它需要PHP、MySQL和一個Web伺服器。請查看我們的演示和託管服務。