Home >Backend Development >PHP Tutorial >Advantages and Disadvantages of Indexing Page 1/2_PHP Tutorial
Advantages and Disadvantages of Index
Why create an index? This is because creating indexes can greatly improve system performance. First, by creating a unique index, the uniqueness of each row of data in the database table can be guaranteed. Second, it can greatly speed up data retrieval, which is also the main reason for creating an index. Third, it can speed up the connection between tables, which is particularly meaningful in achieving referential integrity of data. Fourth, when using grouping and sorting clauses for data retrieval, the time for grouping and sorting in queries can also be significantly reduced. Fifth, by using indexes, optimization hiders can be used during the query process to improve system performance.
Some people may ask: There are so many advantages to adding an index, why not create an index for every column in the table? Although this idea is reasonable, it is also one-sided. Although indexes have many advantages, it is very unwise to add an index for every column in the table. This is because there are many downsides to adding an index. First, creating and maintaining indexes takes time, and this time increases as the amount of data increases. Second, indexes need to occupy physical space. In addition to the data space occupied by the data table, each index also occupies a certain amount of physical space. If a clustered index is to be established, the space required will be larger. Third, when adding, deleting, and modifying data in the table, the index must be dynamically maintained, which reduces the data maintenance speed.
Indices are built on certain columns in the database table. Therefore, when creating an index, you should carefully consider which columns can be indexed and which columns cannot be indexed. Generally speaking, indexes should be created on these columns. For example: on columns that are frequently searched, it can speed up searches; on columns that serve as primary keys, it enforces the uniqueness of the column and organizes the arrangement structure of the data in the table; Create indexes on columns that are often used in joins. These columns are mainly foreign keys, which can speed up joins; create indexes on columns that often need to be searched based on ranges, because the indexes have been sorted and their specified ranges are continuous; in Create indexes on columns that often need to be sorted, because the indexes are already sorted, so queries can use the sorting of the indexes to speed up sorting query times; create indexes on columns that are often used in WHERE clauses to speed up the judgment of conditions.
Also, there are some columns that should not be indexed. Generally speaking, these columns that should not be indexed have the following characteristics: First, indexes should not be created for columns that are rarely used or referenced in queries. This is because, since these columns are rarely used, indexing or not indexing does not improve query speed. On the contrary, due to the addition of indexes, the maintenance speed of the system is reduced and the space requirements are increased. Second, indexes should not be added to columns that have few data values. This is because, since these columns have very few values, such as the gender column of the personnel table, in the query results, the data rows in the result set account for a large proportion of the data rows in the table, that is, the data that needs to be searched in the table The proportion of rows is huge. Increasing the index does not significantly speed up retrieval. Third, indexes should not be added to columns defined as text, image, and bit data types. This is because the data volume of these columns is either quite large or has very few values. Fourth, when modification performance is much greater than retrieval performance, indexes should not be created. This is because modification performance and retrieval performance are contradictory to each other. When adding indexes, retrieval performance will be improved, but modification performance will be reduced. When reducing indexes, modification performance will increase and retrieval performance will decrease. Therefore, when modification performance is much greater than retrieval performance, indexes should not be created.
Methods to create indexes and characteristics of indexes
Methods to create indexes
There are many ways to create indexes, including direct index creation and indirect index creation. method. Create an index directly, such as using the CREATE INDEX statement or use the Create Index Wizard, or create an index indirectly, such as defining a primary key constraint or a unique key constraint in the table, and the index is also created at the same time. Although both methods can create indexes, the specific contents of creating indexes are different.
Use the CREATE INDEX statement or use the Create Index Wizard to create an index. This is the most basic way to create an index, and this method is the most flexible. You can customize the index to meet your needs. When you create an index in this way, you have many options, such as specifying how full the data pages are, sorting, sorting statistics, etc., to optimize the index. Using this method, you can specify the type, uniqueness, and compositeness of the index, that is, you can create either a clustered index or a non-clustered index, and you can create an index on one column or two. Or create an index on more than two columns.
Indexes can also be created indirectly by defining primary key constraints or unique key constraints. A primary key constraint is a logic that maintains data integrity by restricting records in a table to have the same primary key record.When creating a primary key constraint, the system automatically creates a unique clustered index. Although, logically, the primary key constraint is an important structure, but in terms of physical structure, the structure corresponding to the primary key constraint is a unique clustered index. In other words, in physical implementation, there is no primary key constraint, but only a unique clustered index. Similarly, when creating a unique key constraint, an index is also created, which is a unique non-clustered index. Therefore, when using constraints to create an index, the type and characteristics of the index have basically been determined, and there is less room for user customization.
When defining a primary key or unique key constraint on a table, if the table already has a standard index created using the CREATE INDEX statement, the index created by the primary key constraint or unique key constraint will overwrite the previously created index. Standard index. In other words, indexes created by primary key constraints or unique key constraints have a higher priority than indexes created using the CREATE INDEX statement.
Characteristics of index
Index has two characteristics, namely unique index and composite index.
Unique index ensures that all data in the index column is unique and does not contain redundant data. If the table already has a primary key constraint or a unique key constraint, SQL Server automatically creates a unique index when the table is created or modified. However, if uniqueness must be guaranteed, you should create a primary key constraint or a unique key constraint instead of creating a unique index. When creating a unique index, you should carefully consider these rules: When you create a primary key constraint or a unique key constraint on a table, SQL Server automatically creates a unique index; if the table already contains data, then when you create the index, SQL Server checks the redundancy of data already in the table; whenever an insert statement is used to insert data or a modify statement is used to modify data, SQL Server checks the data for redundancy: if there are redundant values, then SQL Server cancels the statement Execute and return an error message; ensure that each row of data in the table has a unique value, so that each entity can be uniquely confirmed; unique indexes can only be created on columns that can guarantee the integrity of the entity, for example, You cannot create a unique index on the name column in the personnel table because people can have the same name.
A composite index is an index created on two or more columns. When searching, when two or more columns serve as a key value, it is best to create a composite index on these columns. When creating a composite index, you should consider these rules: Up to 16 columns can be combined into a single composite index, and the total length of the columns that make up the composite index cannot exceed 900 bytes, which means that the length of the composite column cannot be too long; In a composite index, all columns must come from the same table, and composite columns cannot be created across tables; in a composite index, the order of the columns is very important, so the order of the columns must be carefully arranged. In principle, you should first define The most unique column, for example, the index on (COL1, COL2) is not the same as the index on (COL2, COL1) because the order of the columns of the two indexes is different; in order for the query optimizer to use the composite index, the query The WHERE clause in the statement must refer to the first column in the composite index; composite indexes are very useful when there are multiple key columns in the table; using composite indexes can improve query performance and reduce the number of indexes created in a table quantity.
Types of indexes
According to whether the order of the index is the same as the physical order of the data table, the index can be divided into two types. One is a clustered index in which the physical order of the data table is the same as the index order, and the other is a non-clustered index in which the physical order of the data table is different from the index order.
Architecture of clustered index
The structure of the index is similar to a tree structure. The top of the tree is called the leaf level, the other parts of the tree are called non-leaf levels, and the root of the tree is at the non-leaf level. Leaf level. Similarly, in a clustered index, the leaf level and non-leaf level of the clustered index form a tree structure, and the lowest level of the index is the leaf level. In a clustered index, the data page where the data in the table is located is the leaf level, the index page above the leaf level is the non-leaf level, and the index page where the index data is located is the non-leaf level. In a clustered index, the order of data values is always in ascending order.
Clustered indexes should be created on columns in the table that are frequently searched or accessed sequentially.When creating a clustered index, you should consider these factors: Each table can only have one clustered index, because there can only be one physical order of data in the table; the physical order of the rows in the table is the same as the physical order of the rows in the index. Create a clustered index before creating any non-clustered index. This is because the clustered index changes the physical order of the rows in the table. The data rows are arranged in a certain order and this order is automatically maintained; the uniqueness of the key value or use the UNIQUE key Words are maintained explicitly, or by an internal unique identifier that is used by the system itself and cannot be accessed by users; the average size of a clustered index is about five percent of the data table, but, in practice The size of a clustered index often changes according to the size of the index column; during the creation process of the index, SQL Server temporarily uses the disk space of the current database. When creating a clustered index, 1.2 times the size of the table space is required. Therefore, Be sure to ensure there is enough space to create a clustered index.
When the system accesses data in a table, it first determines whether there is an index on the corresponding column and whether the index is meaningful for the data to be retrieved. If an index exists and is meaningful, the system uses the index to access records in the table. The system browses to data starting from the index, and index browsing starts from the root of the tree index. Starting from the root, the search value is compared with each key value to determine whether the search value is greater than or equal to the key value. This step is repeated until a key value is encountered that is larger than the search value, or the search value is greater than or equal to all key values on the index page.
The architecture of non-clustered index
The structure of non-clustered index is also a tree structure, which is very similar to the structure of clustered index, but there are also obvious differences.
In a non-clustered index, the leaf level contains only key values and no data rows. Nonclustered indexes represent the logical order of rows. There are two architectures for non-clustered indexes: one architecture creates a non-clustered index on a table without a clustered index, and the other architecture creates a non-clustered index on a table with a clustered index.
If there is no clustered index in a data table, then the data table is also called a data heap. When a nonclustered index is created at the top of the data heap, the system uses the row identifier in the index page to point to the record in the data page. Row identifiers store information about where the data is located. The data heap is maintained using Index Allocation Map (IAM) pages. The IAM page contains the storage information of the cluster where the data heap is located. In the system table sysindexes, there is a pointer pointing to the first IAM page related to the data heap. The system uses IAM pages to browse the data heap and find space where new rows can be inserted. These data pages and the records within these data pages are not in any order and are not linked together. The only connection between these data pages is the order of records in the IAM. When a nonclustered index is created on the data heap, the leaf level contains row identifiers that point to the data pages. The row identifier specifies the logical sequence of record rows and consists of the file ID, page number, and row ID. The row identifiers remain unique. The order of the leaf-level pages of a nonclustered index differs from the physical order of the data in the table. These key values are maintained in ascending order at the leaf level.
When a non-clustered index is created on a table with a clustered index, the system uses the clustered key in the index page that points to the clustered index. The clustering key stores the location information of the data. If a table has a clustered index, then the leaf level of the nonclustered index contains the clustered key value that maps to the clustered key, rather than mapping to the physical row identifier. When the system accesses data in a table with a non-clustered index, and this non-clustered index is created on a clustered index, it first finds the pointer to the clustered index from the non-clustered index, and then uses the clustered index to find the pointer to the clustered index. Index to find the data.
Non-clustered indexes are very useful when data needs to be retrieved in multiple ways. When creating a non-clustered index, consider these situations: By default, the index created is a non-clustered index; on each table, no more than 249 non-clustered indexes can be created, while clustered indexes There can be at most one.