Home  >  Article  >  Database  >  Detailed interpretation of mysql indexes and transactions

Detailed interpretation of mysql indexes and transactions

不言
不言forward
2018-12-29 11:13:574872browse

This article brings you a detailed interpretation of mysql indexes and transactions. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.

1. What does an index do?

Many times, when your application performs SQL queries very slowly, you should think about whether it can Build index.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Only spatial column type indexes use R-tree, and MEMORY tables also support hash indexes.

The index is a sorted list, which stores the index value and the physical address of the row containing the data. When the data is very large, the index can greatly speed up the query. This This is because after using the index, you do not need to scan the entire table to locate the data of a certain row. Instead, you first find the physical address corresponding to the row of data through the index table and then access the corresponding data.

2. Advantages and Disadvantages of Index

Advantages: It can quickly retrieve, reduce the number of I/Os, and speed up retrieval; group and Sorting can speed up grouping and sorting;

Disadvantages: The index itself is also a table, so it will occupy storage space. Generally speaking, the space occupied by the index table is 1.5 times that of the data table; index Table maintenance and creation require time costs, which increase as the amount of data increases; building indexes will reduce the efficiency of modification operations (deletion, addition, modification) of the data table, because modifications are required while modifying the data table. Index table;

3. Index classification

Common index types are:Primary key index, unique index, ordinary index, full-text index, combined index

1, Primary key index: That is, the primary index, the index is established based on the primary key pk_clolum (length), duplication is not allowed, and null values ​​are not allowed;

ALTER TABLE 'table_name' ADD PRIMARY KEY('id');

2, Unique index: The value of the column used to create the index must be unique, and null values ​​are allowed

ALTER TABLE 'table_name' ADD UNIQUE('email');

3, Ordinary index: Constructed with ordinary columns in the table Index, without any restrictions

ALTER TABLE 'table_name' ADD INDEX index_name('description');

4, Full-text index: Index built with columns of large text objects (will be explained in the next part)

ALTER TABLE 'table_name' ADD FULLTEXT('content');

5, Combined index: An index built by combining multiple columns. The values ​​in these multiple columns are not allowed to have null values

ALTER TABLE 'table_name' ADD INDEX index_name('col1','col2','col3');

Follow the "leftmost prefix" principle and use the most commonly used ones as retrieval or sorting The columns are placed on the far left, in descending order. The combined index is equivalent to establishing three indexes: col1, col1col2, col1col2col3, and indexes cannot be used on col2 or col3.

When using a combined index, the index key may be too large because the column name length is too long, resulting in reduced efficiency. If allowed, you can only take the first few characters of col1 and col2 as the index.

ALTER TABLE 'table_name' ADD INDEX index_name(col1(4),col2(3));

means using the first 4 characters of col1 and the first 3 characters of col2 as the index

4. Index implementation principle

MySQL support There are many storage engines, and various storage engines have different support for indexes. Therefore, MySQL database supports multiple index types, such as BTree index, B Tree index, hash index, full-text index, etc.,

1. Hash index:

Only the memory storage engine supports hash index. The hash index uses the value of the index column to calculate the hashCode of the value, and then stores the row containing the value at the corresponding location of the hashCode. Because the hash algorithm is used to access the physical location of the data, the access speed is very fast. However, a value can only correspond to one hashCode, and it is a hash distribution method, so the hash index does not support range search and sorting functions.

2. Full-text index:

FULLTEXT (full-text) index can only be used for MyISAM and InnoDB. For larger data, generating a full-text index is very time-consuming and space-consuming. For large text objects or larger CHAR type data, if you use a normal index, it is still feasible to match the first few characters of the text, but if you want to match a few words in the middle of the text, you must use LIKE %word% To match, it will take a long time to process, and the response time will be greatly increased. In this case, you can use the FULLTEXT index. When generating the FULLTEXT index, a list of words will be generated for the text, and it will be indexed in time. Index based on this list of words. FULLTEXT can be created when creating the table, or it can be added using ALTER or CREATE INDEX when needed:

//创建表的时候添加FULLTEXT索引
CTREATE TABLE my_table(
id INT(10) PRIMARY KEY,
name VARCHAR(10) NOT NULL,
my_text text CHARACTER SET utf8 COLLATE utf8_general_ci NULL,
FULLTEXT(my_text));

//创建表以后,在需要的时候添加FULLTEXT索引
ALTER my_table ADD FULLTEXT ft_index(my_text);
CREATE INDEX ft_index ON my_table(my_text);

For larger data sets, add the data to a table without a FULLTEXT index, and then add A FULLTEXT index is faster than adding data to a table that already has a FULLTEXT index.

The full-text index that comes with MySQL can only be used with the MyISAM storage engine. If it is other data engines, the full-text index will not take effect.

In MySQL, the full-text index detachment is available in English, but it is not currently supported in Chinese.

In MySQL, if the retrieved string is too short, the expected results cannot be retrieved. The retrieved string length must be at least 4 bytes. In addition, if the retrieved characters include stop words, then the stop words will be ignored.

3. BTree index and B Tree index

BTree index

BTree is a balanced search multi-tree. Let the degree of the tree be d (d>1) and the height be h, then BTree must meet the following conditions:

The height of each leaf node is the same, equal to h;

Each non-leaf node consists of n-1 keys and n pointers, where d

The leaf node pointers are all null;

The keys of non-leaf nodes are all [key, data] tuples, Among them, key represents the key as the index, and data is the data of the row where the key value is located;

The structure of BTree is as follows:

Detailed interpretation of mysql indexes and transactions

## in Under the structure of BTree, you can use the binary search method. The search complexity is h*log(n). Generally speaking, the height of the tree is very small, usually about 3, so BTree is a very efficient search structure. .

B Tree Index

B Tree is a variant of BTree. Let d be the degree of the tree and h be the height of the tree. The main differences between B Tree and BTree are:

B Tree's non-leaf nodes do not store data, only key values;

The leaf nodes of B Tree have no pointers, all key values ​​will appear on the leaf nodes, and the key values ​​stored in the key The physical address of the corresponding data;

The structure of B Tree is as follows:

Detailed interpretation of mysql indexes and transactions## Generally speaking, B Tree is more suitable for implementation than BTree The index structure of external memory is because the storage engine design experts cleverly make use of the storage structure of external memory (disk), that is, a sector of the disk is an integer multiple of page (page), and page is a unit in storage. Usually the default is 4K, so the nodes of the index structure are designed to be the size of a page, and then use the "pre-read" principle of external memory to read the entire node data into the memory each time it is read, and then store it in the memory. Search, it is known that the memory reading speed is hundreds of times the external memory reading I/O speed, so the key to improving the search speed is to use as little disk I/O as possible, then we can know that the key in each node The greater the number, the smaller the height of the tree and the fewer times I/O required. Therefore, generally speaking, B Tree is faster than BTree because data is not stored in non-leaf nodes of B Tree, so more data can be stored. key.

B TREE with sequential index

Many storage engines have been optimized on the basis of B Tree, adding pointers to adjacent leaf nodes, forming B TREE with sequential access pointers. Tree, this is done to improve the efficiency of interval search. As long as the first value is found, subsequent values ​​can be searched sequentially.

The structure of B Tree is as follows:

Detailed interpretation of mysql indexes and transactionsAfter analyzing the implementation principle of MySQL’s index structure, let’s take a look at the specific How does the storage engine implement the index structure? The two most common storage engines in MySQL are MyISAM and InnoDB, which implement non-clustered indexes and clustered indexes respectively.

First of all, we need to introduce a few concepts. In the classification of indexes, we can divide it into "primary index" and "auxiliary index" according to whether the key of the index is the primary key. The index established using the primary key value is called "Primary index", the others are called "auxiliary indexes". Therefore, there can only be one primary index, and there can be many auxiliary indexes.

MyISAM——Non-clustered index

The MyISAM storage engine uses a non-clustered index. The primary index and auxiliary index of the non-clustered index are almost the same, except that the primary index is not allowed to be repeated. , null values ​​are not allowed, and the keys of their leaf nodes store the physical address pointing to the data corresponding to the key value.

The data table and index table of non-clustered index are stored separately.

The data in the non-clustered index is saved according to the insertion order of the data. Therefore, non-clustered indexes are more suitable for queries of single data. Insertion order is not affected by key values.

FULLTEXT index can only be used in MyISAM.

At first I never understood why the auxiliary index is needed since the primary index and auxiliary index of the non-clustered index point to the same content. Later I realized that the index is not used for query. Use In those places, isn't it just after the WHERE and ORDER BY statements? So what if the query condition is not the primary key? At this time, an auxiliary index is needed.

InnoDB——Clustered Index

The leaf nodes of the primary index of the clustered index store the data corresponding to the key value itself, and the leaf nodes of the auxiliary index store the corresponding key value. The primary key value of the data. Therefore, the smaller the value length of the primary key, the better, and the simpler the type, the better.

Clustered index data and primary key index are stored together.

Clustered index data is saved in the order of the primary key. Therefore, it is suitable for interval search by primary key index, which can require less disk I/O and speed up the query. But also for this reason, the insertion order of the clustered index is best to be inserted in the monotonic order of the primary key, otherwise it will frequently cause page splits and seriously affect performance.

In InnoDB, if you only need to search for index columns, try not to add other columns, which will improve query efficiency.

使用主索引的时候,更适合使用聚簇索引,因为聚簇索引只需要查找一次,而非聚簇索引在查到数据的地址后,还要进行一次I/O查找数据。

因为聚簇辅助索引存储的是主键的键值,因此可以在数据行移动或者页分裂的时候降低委会成本,因为这时不用维护辅助索引。但是辅助索引会占用更多的空间。

聚簇索引在插入新数据的时候比非聚簇索引慢很多,因为插入新数据时需要减压主键是否重复,这需要遍历主索引的所有叶节点,而非聚簇索引的叶节点保存的是数据地址,占用空间少,因此分布集中,查询的时候I/O更少,但聚簇索引的主索引中存储的是数据本身,数据占用空间大,分布范围更大,可能占用好多的扇区,因此需要更多次I/O才能遍历完毕。

下图可以形象的说明聚簇索引和非聚簇索引的区别

Detailed interpretation of mysql indexes and transactions

五、索引的使用策略

什么时候要使用索引?

主键自动建立唯一索引;

经常作为查询条件在WHERE或者ORDER BY 语句中出现的列要建立索引;

作为排序的列要建立索引;

查询中与其他表关联的字段,外键关系建立索引

高并发条件下倾向组合索引;

什么时候不要使用索引?

经常增删改的列不要建立索引;

有大量重复的列不建立索引;

表记录太少不要建立索引;

在组合索引中不能有列的值为NULL,如果有,那么这一列对组合索引就是无效的;

在一个SELECT语句中,索引只能使用一次,如果在WHERE中使用了,那么在ORDER BY中就不要用了;

LIKE操作中,'%aaa%'不会使用索引,也就是索引会失效,但是‘aaa%’可以使用索引;

在索引的列上使用表达式或者函数会使索引失效,例如:select from users where YEAR(adddate) from users where adddate

在查询条件中使用正则表达式时,只有在搜索模板的第一个字符不是通配符的情况下才能使用索引。

在查询条件中使用会导致索引失效。

在查询条件中使用IS NULL会导致索引失效。

在查询条件中使用OR连接多个条件会导致索引失效,这时应该改为两次查询,然后用UNION ALL连接起来。

尽量不要包括多列排序,如果一定要,最好为这队列构建组合索引;

只有当数据库里已经有了足够多的测试数据时,它的性能测试结果才有实际参考价值。如果在测试数据库里只有几百条数据记录,它们往往在执行完第一条查询命令之后就被全部加载到内存里,这将使后续的查询命令都执行得非常快--不管有没有使用索引。只有当数据库里的记录超过了1000条、数据总量也超过了MySQL服务器上的内存总量时,数据库的性能测试结果才有意义。

六、索引的优化

1、最左前缀

索引的最左前缀和和B+Tree中的“最左前缀原理”有关,举例来说就是如果设置了组合索引那么以下3中情况可以使用索引:col1,,其它的列,比如,col2,col3等等都是不能使用索引的。

根据最左前缀原则,我们一般把排序分组频率最高的列放在最左边,以此类推。

2、带索引的模糊查询优化

在上面已经提到,使用LIKE进行模糊查询的时候,'%aaa%'不会使用索引,也就是索引会失效。如果是这种情况,只能使用全文索引来进行优化(上文有讲到)。

为检索的条件构建全文索引,然后使用

SELECT * FROM tablename MATCH(index_colum) ANGAINST(‘word’);

事务介绍

首先,什么是事务?事务就是一段sql 语句的批处理,但是这个批处理是一个atom(原子),不可分割,要么都执行,要么回滚(rollback)都不执行。

MySQL 事务主要用于处理操作量大,复杂度高的数据。比如说,在人员管理系统中,你删除一个人员,你即需要删除人员的基本资料,也要删除和该人员相关的信息,如信箱,文章等等,这样,这些数据库操作语句就构成一个事务!

  • 在 MySQL 中只有使用了 Innodb 数据库引擎的数据库或表才支持事务。

  • Transaction processing can be used to maintain the integrity of the database and ensure that batches of SQL statements are either all executed or not executed at all.

  • Transactions are used to manage insert, update, delete statements

Generally speaking, transactions must meet four conditions (ACID): Atomicity (Atomicity), Consistency (stability), Isolation (isolation), Durability (reliability)

  • 1. Atomicity of transactions: a set of transactions either succeeds or is withdrawn .

  • 2. Stability: If there is illegal data (foreign key constraints and the like), the transaction will be withdrawn.

  • 3. Isolation: Transactions run independently. If the result of one transaction affects other transactions, the other transactions will be withdrawn. 100% isolation of transactions requires sacrificing speed.

  • 4. Reliability: After the software or hardware crashes, the InnoDB data table driver will use the log file to reconstruct and modify it. Reliability and high speed are incompatible. The innodb_flush_log_at_trx_commit option determines when to save transactions to the log.

Detailed interpretation of mysql indexes and transactions

Transaction concurrency does not perform dirty reads, phantom reads, and non-repeatable reads caused by transaction isolation

  • Dirty read: Transaction A reads the data modified by uncommitted transaction B. If transaction B fails to roll back midway, then transaction A reads dirty data at this time. For example, transaction A modifies money. At this time, transaction B reads the update results of transaction A. However, if transaction A rolls back later, what transaction B reads is dirty data.

  • Non-repeatable read: In the same transaction, the results of reading the same data are inconsistent. Transaction A reads before transaction B updates the data, then transaction B updates and commits, and transaction A reads again. At this time, the data read twice is different.

  • Phantom reading: (In the same transaction, the same query returns different results multiple times. Transaction B queries the number of records in the table, and then transaction A inserts a record into the table, and then Transaction B queried again and found that the number of records was different. Note that this explanation is incorrect. There are many such explanations on the Internet, including experts who I think are more authoritative, but they were found to be incorrect after experiments. So this is something to pay attention to). You can do an experiment like this. Transaction A queries the number of records, transaction B inserts a record (the primary key value is 6), commits, and then transaction A queries the number of records and finds that the number of records has not changed, but at this time a record with a primary key value of 6 is inserted. The records were found to be conflicting, and it felt like an hallucination.

Difference

1. Dirty read and non-repeatable read: Dirty read means that the transaction reads the updated data of the uncommitted transaction. Non-repeatable read means that the data read several times in the same transaction is different.

2. The difference between non-repeatable reading and phantom reading: they are both in the same transaction. The former is different in reading data several times, and the latter is reading data in different ways.

Isolation level

Detailed interpretation of mysql indexes and transactions

Detailed interpretation of mysql indexes and transactions


  • ##Isolation level changes affect the lock cycle

  • mysql supports the above four isolation levels, and the default is repeatable read

Detailed interpretation of mysql indexes and transactions

Detailed interpretation of mysql indexes and transactions##MySQL has three levels of locks: page level, table level, row level.

The MyISAM and MEMORY storage engines use table-level locking;

The BDB storage engine uses page-level locking, but also supports table-level locking. Lock;

The InnoDB storage engine supports both row-level locking and table-level locking, but by default

uses row-level locking.

The characteristics of these three locks in MySQL can be roughly summarized as follows: 1. Table-level locks: low overhead, fast locking; no deadlocks; large locking granularity, the highest probability of lock conflicts, and high concurrency lowest. Table-level locks allow multiple threads to read data from the data table at the same time, but if another thread wants to write data, it must first obtain exclusive access (exclusive table lock is added by default); (Shared Read Lock (Table Read Lock) ) When updating data, you must wait until the update is completed before other threads can access (read) the table. (Exclusive write lock (Table Write Lock))

2. Row-level lock: high overhead, slow locking ; Deadlock will occur; the locking granularity is the smallest, the probability of lock conflict is the lowest, and the concurrency is the highest.

3、页面锁:开销和加锁时间界于表锁和行锁之间;会出现死锁;锁定粒度界于表锁和行锁之间,并发度一般。

原则上数据表有一个读锁时,其它进程无法对此表进行更新操作,但在一定条件下,MyISAM表也支持查询和插入操作的并发进行。

一般MyISAM引擎的表也支持查询和插入操作的并发进行(原则上数据表有一个读锁时,其它进程无法对此表进行更新操作)

MyISAM引擎有一个系统变量concurrent_insert,专门用以控制其并发插入的行为,其值分别可以为0、1或2:

a、concurrent_insert为0,不允许并发插入。     
b、concurrent_insert为1,如果MyISAM表中没有空洞(即表的中间没有被删除的行),MyISAM允许在一个进程读表的同时,另一个进程从表尾插入记录。这也是MySQL的默认设置。     
c、concurrent_insert为2,无论MyISAM表中有没有空洞,都允许在表尾并发插入记录。

如果有读写请求同时进行的话,MYSQL将会优先执行写操作。这样MyISAM表在进行大量的更新操作时(特别是更新的字段中存在索引的情况下),会造成查询操作很难获得读锁,从而导致查询阻塞。

我们还可以调整MyISAM读写的优先级别:

  a、通过指定启动参数low-priority-updates,使MyISAM引擎默认给予读请求以优先的权利。
  b、通过执行命令SET LOW_PRIORITY_UPDATES=1,使该连接发出的更新请求优先级降低。
  c、通过指定INSERT、UPDATE、DELETE语句的LOW_PRIORITY属性,降低该语句的优先级。

MyISAM使用的是 flock 类的函数,直接就是对整个文件进行锁定(叫做文件锁定),MyISAM的数据表是按照单个文件存储的,可以针对单个表文件进行锁定;

InnoDB使用的是 fcntl 类的函数,可以对文件中局部数据进行锁定(叫做行锁定),InnoDB是一整个文件,把索引、数据、结构全部保存在 ibdata 文件里,所以必须用行锁定。

事物控制语句:

BEGIN或START TRANSACTION;显式地开启一个事务;     
COMMIT;也可以使用COMMIT WORK,不过二者是等价的。
COMMIT会提交事务,并使已对数据库进行的所有修改称为永久性的;      
ROLLBACK;有可以使用ROLLBACK WORK,不过二者是等价的。回滚会结束用户的事务,并撤销正在进行的所有未提交的修改;      
SAVEPOINT identifier;SAVEPOINT允许在事务中创建一个保存点,一个事务中可以有多个SAVEPOINT;     
RELEASE SAVEPOINT identifier;删除一个事务的保存点,当没有指定的保存点时,执行该语句会抛出一个异常;     
ROLLBACK TO identifier;把事务回滚到标记点;     
SET TRANSACTION;用来设置事务的隔离级别。
InnoDB存储引擎提供事务的隔离级别有READ UNCOMMITTED、READ COMMITTED、REPEATABLE READ和SERIALIZABLE。

MYSQL 事务处理主要有两种方法:

1、用 BEGIN, ROLLBACK, COMMIT来实现

BEGIN 开始一个事务     
ROLLBACK 事务回滚    
COMMIT 事务确认

2、直接用 SET 来改变 My

SQL 的自动提交模式:

SET AUTOCOMMIT=0 禁止自动提交     
SET AUTOCOMMIT=1 开启自动提交

注意点

1、如果事务中sql正确运行,后面没有commit,结果是不会更新到数据库的,所以需要手动添加commit。

2、如果事务中部分sql语句出现错误,那么错误语句后面不会执行。而我们可能会认为正确操作会回滚撤销,但是实际上并没有撤销正确的操作,此时如果再无错情况下进行一次commit,之前的正确操作会生效,数据库会进行更新。


The above is the detailed content of Detailed interpretation of mysql indexes and transactions. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:segmentfault.com. If there is any infringement, please contact admin@php.cn delete
Previous article:What is Codd's rule?Next article:What is Codd's rule?