A book usually opens with a table of contents and closes with a keyword index.
In a database, the system tables (sysobjects and the like) play the role of the table of contents, and an index on a column is like the keyword index at the back of the book.
The difference between the directory (data dictionary) and an index in a database: the directory is vertical and the index is horizontal.
1. Factors affecting index function
Discrimination (retrieval ratio)
The optimizer generates an execution plan based on statistics. If the database has not collected statistics for an index, the optimizer has nothing to go on and can only fall back to a full table scan. Statistics therefore need to be run after an index is created; otherwise the index may go unused.
For example, suppose table TABLE1 has a column COL1 with three values: "1", "2", and "3". Running statistics tells the database what proportion of the rows in TABLE1 holds each value of COL1, for instance:
"1" - 12%;
"2" - 66%;
"3" - 22%.
Assume there is another column, COL2, whose values are distributed as follows:
"A" - 50%;
"B" - 50%.
Then for query statement 1:
select * from TABLE1 where COL1 = '1' and COL2 = 'A'
the database optimizer will prefer the index on COL1 to locate the data, because that index quickly narrows the result set to 12% of the table. In contrast, for query statement 2:
select * from TABLE1 where COL1 = '2' and COL2 = 'A'
the database will prefer the index on COL2, because for statement 2 the condition on COL2 (50% of rows) discriminates better than the condition on COL1 (66% of rows).
As these examples show, the optimizer usually prefers the index with the higher discrimination (selectivity) for the given query conditions, so the index chosen can differ from one condition value to another.
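The choice between the two indexes can be sketched in a few lines of Python. This is only a toy illustration under the percentages given above: the `STATS` dictionary and the `pick_index` function are hypothetical names, and a real optimizer uses a much richer cost model.

```python
# Fraction of TABLE1 rows holding each value, as reported by statistics
# (the figures are the ones used in the example above).
STATS = {
    "COL1": {"1": 0.12, "2": 0.66, "3": 0.22},
    "COL2": {"A": 0.50, "B": 0.50},
}


def pick_index(conditions):
    """Return the column whose equality condition matches the fewest rows.

    `conditions` maps column name -> value being compared.
    This is a toy stand-in for an optimizer, not a real cost model.
    """
    return min(conditions, key=lambda col: STATS[col][conditions[col]])


query1 = {"COL1": "1", "COL2": "A"}
query2 = {"COL1": "2", "COL2": "A"}

print(pick_index(query1))  # COL1 (12% beats 50%)
print(pick_index(query2))  # COL2 (50% beats 66%)
```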
Data in the database changes, so statistics collected at one point in time may become outdated and can even mislead the optimizer, which also degrades performance. Statistics therefore need to be run not only when an index is first created but also when the data in the table changes. Rule of thumb: rerun statistics when about 10% of the data in the table has changed.
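The 10% rule of thumb could be checked with a helper along these lines. The function name and its parameters are assumptions made for illustration; how the changed-row count is obtained depends on the database, and in MySQL the statistics themselves would typically be refreshed with `ANALYZE TABLE`.

```python
def stats_are_stale(rows_at_last_stats, rows_changed, threshold=0.10):
    """Return True when the changed fraction reaches the threshold.

    rows_changed counts inserts, updates and deletes since statistics
    were last collected; the 10% default is the rule of thumb above.
    """
    if rows_at_last_stats == 0:
        # An empty table at collection time: any change makes stats stale.
        return rows_changed > 0
    return rows_changed / rows_at_last_stats >= threshold


print(stats_are_stale(1_000_000, 50_000))   # False (5% changed)
print(stats_are_stale(1_000_000, 120_000))  # True (12% changed)
```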
2. Aggregation degree
Range scan
Table size:
Small table
Medium and large table
Very large table
Business type
OLTP and OLAP
Function and index
Functions, LIKE statements...
Substring(col_name, 1, 3) vs. Substring(col_name, 3, 3)
like 'QQQ%' vs. like '%QQQ'
Index overhead
Performance weapon
Double-edged sword
The impact of indexes on insert operations (Oracle)
The impact of indexes on insert operations (MySQL)
Compare the impact of indexes and triggers on performance
Index summary
Use indexes to give critical data an efficient access path, but keep in mind that every index adds overhead to database updates. An inefficient index can therefore be a disaster for the database.
For a database, we must focus on how key data is read and give it the most efficient access path. The basic strategy for this is to build indexes. While an index provides efficient access, it also brings additional system overhead, which divides into disk space overhead and processor overhead. Here we discuss processor overhead. Whenever a record is inserted into or deleted from a table, all indexes on the table must be adjusted accordingly. The same adjustment occurs whenever an indexed column is updated. For example, if inserting data into an unindexed table takes 100 units of time, each additional index adds 100 to 250 units of time. Interestingly, the overhead of maintaining an index is roughly equivalent to the overhead of a simple trigger.
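To make the arithmetic concrete, the following sketch applies the figures quoted above (100 units for an insert into an unindexed table, plus 100 to 250 units per index). The constant and function names are hypothetical, and the numbers are the article's rough estimates rather than measurements.

```python
BASE_INSERT_COST = 100       # units of time with no index (figure from the text)
PER_INDEX_COST = (100, 250)  # extra units added by each index (figure from the text)


def insert_cost_range(index_count):
    """Return (low, high) estimated cost of one insert with index_count indexes."""
    low = BASE_INSERT_COST + index_count * PER_INDEX_COST[0]
    high = BASE_INSERT_COST + index_count * PER_INDEX_COST[1]
    return low, high


for n in range(4):
    print(n, insert_cost_range(n))
# 0 (100, 100)
# 1 (200, 350)
# 2 (300, 600)
# 3 (400, 850)
```

Under these assumptions, a table with three indexes makes each insert roughly four to eight times as expensive as the same insert into an unindexed table.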
Below are some widely cited practical tips on indexing. They come from developerWorks and are listed here because they are usually worth consulting:
1. When queries already finish in a reasonable amount of time, avoid adding indexes, because indexes slow down update operations and consume additional space. Sometimes one larger index can cover several queries.
2. Columns with high cardinality are good candidates for indexing.
3. Given the maintenance overhead, avoid using more than 5 columns in an index.
4. For multi-column indexes, put the columns that queries reference most often first in the definition.
5. Avoid adding indexes that are similar to existing indexes. They create more work for the optimizer and slow down update operations. Instead, modify the existing index to include the additional columns. For example, suppose a table has an index i1 on (c1, c2). You notice that queries use "where c2 = ?", so you create an index i2 on (c2). But this similar index adds nothing; it is just redundant with i1 and is now additional overhead.
6. If the table is read-only and contains many rows, you can try defining an index that uses the INCLUDE clause of CREATE INDEX so that the index contains all columns referenced in the query (columns added by the INCLUDE clause are not part of the index key; they are only stored in the index pages to avoid additional data fetches).
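The effect described in tip 6 can be observed with a small experiment. The sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in: SQLite has no INCLUDE clause, so a composite index plays the role of the covering index, and the `orders` table and its columns are hypothetical. Other databases report this differently in their plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, status, note) VALUES (?, ?, ?)",
    [(f"c{i % 100}", "open" if i % 2 else "closed", "x" * 20) for i in range(1000)],
)
# The index stores customer and status, but not note.
conn.execute("CREATE INDEX idx_orders_customer_status ON orders (customer, status)")


def plan_for(sql):
    """Return the textual query plan SQLite reports for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("c7",)).fetchall()
    return " ".join(row[-1] for row in rows)  # the detail text is the last column


# Every referenced column is in the index, so the table rows are not fetched.
covered_plan = plan_for("SELECT status FROM orders WHERE customer = ?")
# note is not in the index, so each match needs an extra fetch from the table.
uncovered_plan = plan_for("SELECT note FROM orders WHERE customer = ?")

print(covered_plan)
print(uncovered_plan)
```

The first plan should mention a covering index, while the second should use the same index without the covering note.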
For a data warehouse (a query-oriented database), more indexes can be created; the ratio of index size to data size can reach 1:1.
When deciding whether to use an index, focus on the retrieval ratio: the percentage of the table's data retrieved when the key value is the only condition. The lower the percentage, the more effective the index. This conclusion rests on some assumptions, such as the relative performance of disk access.
Whether the records for an index key value are physically adjacent also matters, because data is read and written in blocks. If the records an index key points to are scattered throughout the table, the index performs much worse even when those records are a small proportion of the table, because fetching them touches blocks all over the disk.
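A rough way to see why physical adjacency matters is to count how many blocks must be read for the same number of matching rows. The sketch below assumes a hypothetical block capacity of 10 rows and identifies each row by its position in the table; real storage engines are more complicated.

```python
ROWS_PER_BLOCK = 10  # assumed block capacity, for illustration only


def blocks_touched(row_positions, rows_per_block=ROWS_PER_BLOCK):
    """Count the distinct blocks that must be read to fetch the given rows."""
    return len({pos // rows_per_block for pos in row_positions})


# Ten matching rows out of a 1000-row table (1% of the table) in both cases.
clustered = range(0, 10)         # rows stored next to each other
scattered = range(0, 1000, 100)  # rows spread evenly across the table

print(blocks_touched(clustered))  # 1
print(blocks_touched(scattered))  # 10
```

The retrieval ratio is identical in both cases, yet the scattered layout reads ten times as many blocks.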
It is also worth noting that applying functions or type conversions to an indexed column may prevent the index from being used.
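The contrast between like 'QQQ%' and like '%QQQ' can be illustrated with a sorted list standing in for an index. This is a simplified model with made-up key values and hypothetical function names, not how any particular database implements its indexes: a condition on the leading characters can jump to a contiguous range of keys, while a condition on trailing characters gets no help from the sort order.

```python
import bisect

# A sorted list stands in for an index on col_name (hypothetical data).
index_keys = sorted(["ABC1", "QQQ1", "QQQ2", "QQR9", "XQQQ", "ZZQQQ"])


def prefix_search(keys, prefix):
    """LIKE 'prefix%': binary-search to the first match, then read a contiguous range."""
    start = bisect.bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]


def suffix_search(keys, suffix):
    """LIKE '%suffix': sort order does not help, so every key is checked."""
    return [k for k in keys if k.endswith(suffix)]


print(prefix_search(index_keys, "QQQ"))  # ['QQQ1', 'QQQ2']
print(suffix_search(index_keys, "QQQ"))  # ['XQQQ', 'ZZQQQ']
```

The same reasoning suggests why Substring(col_name, 1, 3) tests something an index ordered by col_name can locate, while Substring(col_name, 3, 3) does not.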