Most developers know that indexing is faster. But in the actual process, we often encounter some questions & difficulties:
This article attempts to explain the basic knowledge of indexing & answer the above questions.
Most developers who come into contact with indexes probably know that indexes are similar to the catalog of books. You need to find the content you want, find the qualified keywords through the catalog, then find the pageno of the corresponding chapter, and then find the specific content. .
In the data structure, the simplest index implementation is similar to a hashmap, which maps to a specific location through the keyword key to find the specific content. But in addition to hashing, there are many ways to implement indexing.
hash / b-tree / b -tree redis HSET / MongoDB&PostgreSQL / MySQL
hashmap
##See b-tree & b-tree difference in the picture: There are many articles on the Internet that explain this, but it is not the focus of this article.
Index should be stored in memory as much as possible, followed by data. Be careful to keep only necessary indexes, and use memory as wisely as possible.
If the index memory is close to filling up the memory, it will be easy to read the disk and the speed will slow down.
Take the simplest hashmap as an example, why is the complexity not O(1), but so-called close to O(1). Because there are key conflicts/duplications, when the DB is looking for it, if there is a lot of data with key conflicts, it still has to take turns to continue looking. The same goes for b-tree looking at key selection.
So a mistake that most developers often make is to index keys that have no distinction. For example: many collections have only centralized categories of type/status documents with a count of hundreds of thousands or more. Usually this kind of index is not helpful.
A loans collection is created here. Simplified to only have 100 pieces of data. This is a loan table with _id, userId, status (loan status), amount (amount).
db.loans.count()100
db.loans.find({ "userId" : "59e022d33f239800129c61c7", "status" : "repayed", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "15D5A9A1", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }Note The COLLSCAN above scans the entire table because there is no index. Next we create several indexes respectively.
step 1 First create {userId:1, status:1}
db.loans.createIndex({userId:1, status:1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
db.loans.find({ "userId" : "59e022d33f239800129c61c7", "status" : "repayed", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "BB87F2BA", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "["repayed", "repayed"]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }Result: {userId:1, status:1} is hit as the winning plan.
step2: Create a typical index userId
db.loans.createIndex({userId:1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 }
db.loans.find({ "userId" : "59e022d33f239800129c61c7", "status" : "repayed", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "1B1A4861", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "[\"59e022d33f239800129c61c7\", \"59e022d33f239800129c61c7\"]" ], "status" : [ "[\"repayed\", \"repayed\"]" ] } } }, "rejectedPlans" : [ { "stage" : "FETCH", "filter" : { "status" : { "$eq" : "repayed" } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }Note that DB detects {userId:1, status:1} as a better execution plan.
db.loans.find({ "userId" : "59e022d33f239800129c61c7" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "B1777DBA", "planCacheKey" : "1F09D68E", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } }, "rejectedPlans" : [ { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "[MinKey, MaxKey]" ] } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }Notice that DB detects {userId:1} as a better execution plan, um~, as we expected.
db.loans.find({ "status" : "repayed" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "E6304EB6", "planCacheKey" : "7A94191B", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "status" : { "$eq" : "repayed" } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
Interesting part: status does not hit the index, Full table scanNext step, add a sort:
db.loans.find({ "userId" : "59e022d33f239800129c61c7" }).sort({status:1}).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "F5ABB1AA", "planCacheKey" : "764CBAA8", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ { "stage" : "SORT", "sortPattern" : { "status" : 1 }, "inputStage" : { "stage" : "SORT_KEY_GENERATOR", "inputStage" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }(2) Other attempts
Interesting part: status does not hit the index
db.loans.find({ "status" : "repayed","userId" : "59e022d33f239800129c61c7", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "1B1A4861", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "[\"59e022d33f239800129c61c7\", \"59e022d33f239800129c61c7\"]" ], "status" : [ "[\"repayed\", \"repayed\"]" ] } } }, "rejectedPlans" : [ { "stage" : "FETCH", "filter" : { "status" : { "$eq" : "repayed" } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }The hit index is not related to the order of each field of the query, as we guessed.
Come back to the interesting part, we delete the index {userId:1}
db.loans.dropIndex({"userId":1}) { "nIndexesWas" : 3, "ok" : 1 } db.loans.find({"userId" : "59e022d33f239800129c61c7", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "B1777DBA", "planCacheKey" : "5776AB9C", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }DB execution analyzer thinks that the index {userId:1, status:1} can be better, but there is no hit Composite index, this is because status is not the leading field.
db.loans.find({ "status" : "repayed" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "E6304EB6", "planCacheKey" : "7A94191B", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "status" : { "$eq" : "repayed" } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }Change the sort angle again and interchange it with the previous query & sort. The previous one was:
db.loans.find({userId:1}).sort({ "status" : "repayed" })See what’s the difference?
db.loans.find({ "status" : "repayed" }).sort({userId:1}).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "56EA6313", "planCacheKey" : "2CFCDA7F", "winningPlan" : { "stage" : "FETCH", "filter" : { "status" : { "$eq" : "repayed" } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "[MinKey, MaxKey]" ], "status" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }As guessed, hit the index.
Let’s play again and confirm the leading filed test:
db.loans.dropIndex("userId_1_status_1") { "nIndexesWas" : 2, "ok" : 1 }
db.loans.getIndexes() [ { "v" : 2, "key" : { "id" : 1 }, "name" : "id_", "ns" : "cashLoan.loans" } ]
db.loans.createIndex({status:1, userId:1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
db.loans.getIndexes() [ { "v" : 2, "key" : { "id" : 1 }, "name" : "id_", "ns" : "cashLoan.loans" }, { "v" : 2, "key" : { "status" : 1, "userId" : 1 }, "name" : "status_1_userId_1", "ns" : "cashLoan.loans" } ]
db.loans.find({ "status" : "repayed" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "E6304EB6", "planCacheKey" : "7A94191B", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "status" : 1, "userId" : 1 }, "indexName" : "status_1_userId_1", "isMultiKey" : false, "multiKeyPaths" : { "status" : [ ], "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "status" : [ "["repayed", "repayed"]" ], "userId" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
db.loans.getIndexes() [ { "v" : 2, "key" : { "id" : 1 }, "name" : "id_", "ns" : "cashLoan.loans" }, { "v" : 2, "key" : { "status" : 1, "userId" : 1 }, "name" : "status_1_userId_1", "ns" : "cashLoan.loans" } ]
db.loans.find({"userId" : "59e022d33f239800129c61c7", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "B1777DBA", "planCacheKey" : "5776AB9C", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
看完这个试验,明白了 {userId:1, status:1} vs {status:1,userId:1} 的差别了吗?
PS:这个case 里面其实status 区分度不高,这里只是作为实例展示。
DB 一般都有执行器优化的分析,MySQL & MongoDB 都是 用explain 来做分析。
语法上MySQL :
explain your_sql
MongoDB:
yoursql.explain()
总结典型:理想的查询是结合explain 的指标,他们通常是多个的混合:
文末,还有最开头1个问题没回答:如果我的索引改加的都加了,还不够快,怎么办?
留个悬念,之后再写一篇。
更多PHP相关技术文章,请访问PHP教程栏目进行学习!
The above is the detailed content of Best Practices for MongoDB Indexes. For more information, please follow other related articles on the PHP Chinese website!