This article brings you relevant knowledge about MySQL, mainly a summary of common database interview questions. Let's take a look together; I hope it will be helpful to everyone.
- First Normal Form (1NF): the columns of a table cannot be divided further. Each column of a database table is an indivisible atomic data item, and a single column cannot hold multiple values;
- Second Normal Form (2NF): based on 1NF, it adds two requirements: first, the table must have a primary key; second, the non-primary-key columns must fully depend on the primary key and cannot depend on only part of it;
- Third Normal Form (3NF): Based on 2NF, the transitive dependence of non-primary key columns on the primary key is eliminated. Non-primary key columns must directly depend on the primary key.
- Boyce-Codd Normal Form (BCNF): based on 3NF, it eliminates the partial and transitive dependence of prime attributes on the candidate keys
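As a concrete illustration (a hypothetical order_item schema, not from the original article), the sketch below shows a typical 2NF violation and its fix:

```sql
-- Hypothetical example: (order_id, product_id) is the primary key,
-- but product_name depends only on product_id, a partial dependency
-- that violates 2NF.
CREATE TABLE order_item_bad (
  order_id     INT NOT NULL,
  product_id   INT NOT NULL,
  product_name VARCHAR(100),   -- depends on product_id alone
  quantity     INT,
  PRIMARY KEY (order_id, product_id)
);

-- 2NF fix: move product_name into its own table so every non-key
-- column depends on the whole key.
CREATE TABLE product (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100)
);
CREATE TABLE order_item (
  order_id   INT NOT NULL,
  product_id INT NOT NULL,
  quantity   INT,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (product_id) REFERENCES product(product_id)
);
```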
2.1. Client database driver and database connection pool:
(1) Before the client communicates with the database, it establishes a connection with MySQL through the database driver, and sends SQL statements after the connection is established. (2) To reduce the performance degradation caused by frequently creating and destroying connections, a database connection pool maintains a certain number of connection threads: when a connection is needed it is taken directly from the pool, and it is returned to the pool after use. Common database connection pools include Druid, C3P0, and DBCP.
2.2. The execution process of the Server layer of the MySQL architecture:
(1) Connector: mainly responsible for establishing connections with clients, obtaining permissions, and maintaining and managing connections.
(2) Query cache: a query first looks in the cache and returns the result directly if it hits; otherwise it queries the database. The MySQL query cache is turned off by default, which means its use is not recommended, and the entire query cache feature was removed in MySQL 8.0. This is mainly due to the limitations of its usage scenarios:
- The cache stores data as key (SQL statement) - value (result set), so if two SQL statements (keys) differ even slightly, the cache is missed and the database is queried directly;
- Since the data in a table is not static and mostly changes frequently, whenever the data in the database changes, all cached results related to that table are invalidated.
(3) Analyzer: performs lexical analysis and syntax analysis on the SQL statement to obtain an abstract syntax tree, and then uses the preprocessor to perform semantic verification on the tree: determine whether the table in the tree exists, and if so, whether the projected select column fields exist in the table, and so on.
(4) Optimizer: takes the syntax tree produced by the lexical and syntax analysis of the SQL and, using the contents of the data dictionary and statistical information, goes through a series of operations to finally produce an execution plan, including the choice of which index to use.
When deciding whether a query should use an index, the optimizer relies on dynamic data sampling and statistical analysis; since this is statistical, analysis errors are possible, so when a SQL statement fails to use an index as expected, this factor must also be taken into account.
(5) Executor: according to the execution plan, calls the API interfaces provided by the storage engine to operate on the data and complete the execution of the SQL.
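On versions before MySQL 8.0 you can inspect the query cache yourself; a small sketch (these variables were removed in 8.0):

```sql
-- Pre-MySQL 8.0 only: inspect the query cache (removed in 8.0).
SHOW VARIABLES LIKE 'query_cache_type';   -- OFF by default on 5.7
SHOW VARIABLES LIKE 'query_cache_size';
SHOW STATUS LIKE 'Qcache%';               -- hit/miss counters
```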
2.3. Execution process of the InnoDB storage engine:
- (1) First, the MySQL executor calls the API of the storage engine to query data according to the execution plan
- (2) The storage engine first looks up the data in the buffer pool; if it is not there, it reads the data from disk and puts it into the buffer pool
- (3) While the data is loaded into the Buffer Pool, the original record of this data will be saved to the undo log file
- (4) InnoDB will perform the update operation in the Buffer Pool
- (5) The updated data will be recorded in the redo log buffer
- (6) When the transaction is committed, the following three things are done at the same time
- (7) (The first thing) Flush the data in the redo log buffer into the redo log file
- (8) (The second thing) Write this operation record into the binlog file
- (9) (The third thing) Record the binlog file name and the position of the updated content within the binlog into the redo log, and at the same time append a commit mark at the end of the redo log
- (10) A background thread will, at an appropriate time, flush the updated data in the Buffer Pool to the MySQL data files on disk, thereby keeping the data in memory and the data in the database consistent
The storage engine is a component that performs actual operations on the underlying physical data and provides various APIs for operating data for the Server service layer. Commonly used storage engines include InnoDB, MyISAM, and Memory. Here we mainly introduce the difference between InnoDB and MyISAM:
(1) Transaction: MyISAM does not support transactions, InnoDB supports transactions
(2) Lock level: MyISAM only supports table-level locks. InnoDB supports both row-level locks and table-level locks; row-level locks are used by default, but row locks are only used when data is queried through an index, otherwise table locks are used. Row-level locks consume more resources than table locks for each lock acquisition and release; deadlocks are possible with row locks but not with table-level locks.
(3) Primary keys and foreign keys: MyISAM allows tables without any index or primary key to exist, and does not support foreign keys. InnoDB's primary key cannot be empty and supports primary key auto-increment; if no primary key or non-null unique index is set, a 6-byte hidden primary key is generated automatically. InnoDB supports foreign key integrity constraints.
(4) Index structure: both MyISAM and InnoDB use B+Tree indexes. The Data fields of MyISAM's primary key index and secondary indexes store the address of the row record; the Data field of InnoDB's primary key index stores not the address of the row record but all of the row's data, while the Data field of its secondary indexes stores the primary key value.
Since InnoDB's secondary indexes store the primary key value, using a secondary index requires two index lookups: first search the secondary index to obtain the primary key, then use the primary key to search the records in the primary index. This is why overly long fields are not recommended as primary keys: since every secondary index contains the primary key column, a long primary key makes all the other secondary indexes larger, so try to keep the primary key as small as possible.
(5) Full-text index: MyISAM supports full-text indexes. InnoDB does not support them before version 5.6; version 5.6 and later do.
(6) Specific number of table rows: MyISAM stores the total number of rows of a table, so select count(*) returns it directly; InnoDB does not store the row count and must scan the table (or an index) to count the rows.
(7) Storage structure:
(8) Storage space:
(9) Applicable scenarios:
Note: starting with MySQL 8.0, the system tables no longer use the MyISAM storage engine, and its use is generally discouraged
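A small sketch of choosing and checking a table's storage engine (the table names are made up for illustration):

```sql
-- Explicitly choose the engine at creation time.
CREATE TABLE account_innodb (id INT PRIMARY KEY, balance DECIMAL(10,2)) ENGINE=InnoDB;
CREATE TABLE log_myisam    (id INT PRIMARY KEY, msg VARCHAR(255))      ENGINE=MyISAM;

-- Check which engines this server supports and what a table uses.
SHOW ENGINES;
SHOW TABLE STATUS LIKE 'account_innodb';
```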
A database transaction is the basic unit of concurrency control: it refers to a logically related set of operations that either all execute or none execute.
4.1. Transaction ACID:
- (1) Atomicity: A transaction is an indivisible unit of work. All operations in the transaction either succeed or fail. If the transaction fails, it needs to be rolled back.
- (2) Isolation: The degree to which the data operated by a transaction is visible to other transactions before it is submitted.
- (3) Durability: once a transaction is committed, its changes to the data in the database are permanent.
- (4) Consistency: Transactions cannot destroy data integrity and business consistency. For example, when transferring money, whether the transaction succeeds or fails, the total amount of money between both parties remains unchanged.
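For example, the transfer scenario above can be written as a single transaction (a sketch assuming a hypothetical account table):

```sql
-- Atomic transfer: both updates succeed, or neither does.
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
-- If anything went wrong above, issue ROLLBACK instead.
COMMIT;
```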
4.2. ACID implementation principle:
4.2.1. Atomicity: atomicity is implemented through MySQL's rollback log, the undo log: when a transaction modifies the database, InnoDB generates the corresponding undo log; if the transaction execution fails or rollback is called, the information in the undo log can be used to roll the data back to the state before the modification.
4.2.2. Isolation:
(1) Transaction isolation level:
To ensure the integrity and consistency of data read in a concurrent environment, the database provides four transaction isolation levels. The higher the isolation level, the better data integrity and consistency are guaranteed, but the greater the impact on concurrent performance and the lower the execution efficiency. (The four isolation levels increase from top to bottom:)
- Read uncommitted: Allows a transaction to read uncommitted data of other transactions during execution;
- Read committed: allows a transaction to read data committed by other transactions during its execution;
- Repeatable read (default level): Within the same transaction, the query results at any time are consistent;
- Serializable: all transactions are executed serially one by one; every read must acquire a table-level shared lock, and reads and writes block each other.
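These levels can be inspected and set with standard MySQL statements, for example:

```sql
-- Check the current level (MySQL 8.0 uses transaction_isolation;
-- older versions use tx_isolation).
SELECT @@transaction_isolation;

-- Change it for the current session, or for the next transaction only.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- next transaction only
```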
(2) Transaction concurrency issues:
If transaction isolation is not considered, the following problems may arise in a concurrent transaction environment:
- Lost update: when two or more transactions select the same row and then update it based on the originally selected value, because each transaction is unaware of the others' existence, a lost update occurs: the last update overwrites the updates made by the other transactions.
- Dirty read: transaction A is accessing data and has modified it (without committing); transaction B then reads this data; afterwards transaction A rolls back and restores the modified data to its original value, so the data B read is inconsistent with the data in the database, i.e. B read dirty data.
- Non-repeatable read: within one transaction, the same data is read multiple times, but because another transaction modified and committed the data in the meantime, the values read before and after are inconsistent;
- Phantom read: within one transaction, the same data is read twice (usually a range query), but because another transaction inserted or deleted data, the two results are inconsistent.
Different transaction isolation levels allow different concurrency problems:

| Isolation level | Dirty read | Non-repeatable read | Phantom read |
| --- | --- | --- | --- |
| Read uncommitted | possible | possible | possible |
| Read committed | not possible | possible | possible |
| Repeatable read | not possible | not possible | possible |
| Serializable | not possible | not possible | not possible |
(3) Implementation principle of transaction isolation:
The isolation levels of InnoDB transactions are implemented through MVCC and the lock mechanism:
① MVCC (Multi-Version Concurrency Control) is the specific way MySQL's InnoDB storage engine implements transaction isolation levels; it is used to implement the read committed and repeatable read levels. The read uncommitted isolation level always reads the latest data row and does not use MVCC. The serializable isolation level requires locking all rows read, which cannot be achieved with MVCC alone.
MVCC is implemented by storing two hidden columns behind each row record: one saves the transaction ID of the row, the other saves the row's rollback segment pointer. Every time a new transaction starts, a new, auto-incremented transaction ID is assigned. When a transaction runs, its transaction ID is placed in the transaction ID field of the rows it affects, and the rollback segment pointer refers to all version data of the row record, organized as a linked list in the undo log; that is, the value actually points to the row's history chain in the undo log.
When the database is accessed concurrently, MVCC performs multi-version management of the data within transactions, preventing write operations from blocking read operations; the phantom read problem of snapshot reads can be solved by comparing versions, but for current reads MVCC cannot solve phantom reads, which must be solved with next-key locks.
② Lock mechanism:
The basic working principle of the MySQL lock mechanism is: before a transaction modifies the database, it must acquire the corresponding lock; only the transaction holding the lock can modify the data. While this transaction is operating, that part of the data is locked; if other transactions need to modify the data, they must wait for the current transaction to commit or roll back and release the lock.
- Exclusive lock solves dirty reads
- Shared lock solves non-repeatable reads
- Next-key locks solve phantom reads
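In SQL, the two lock types correspond to the two locking reads; a sketch assuming the account table from earlier (MySQL 8.0 also accepts FOR SHARE for the shared form):

```sql
START TRANSACTION;
-- Shared (S) lock: other transactions may read but not modify the row.
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;
-- Exclusive (X) lock: others can neither lock-read nor modify it.
SELECT * FROM account WHERE id = 1 FOR UPDATE;
COMMIT;
```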
4.2.3. Durability:
Durability is implemented through the redo log: when SQL is executed, the changes are recorded in the redo log file, but to improve efficiency, data is first written to the in-memory redo log buffer before being written to the redo log file. The write process is as follows: when data is written to the database, the execution flow first writes to the redo log buffer; the modified data in the redo log buffer is periodically flushed to the redo log file on disk. This process is called flushing (i.e. the redo log buffer writes the log into the redo log file on disk).
Using the redo log buffer greatly improves the efficiency of reading and writing data, but it also brings a new problem: if MySQL crashes while modified data in the redo log buffer has not yet been flushed to disk, that data is lost and transaction durability cannot be guaranteed. To guarantee durability, when a transaction commits, the fsync interface is called to flush the redo log. The flush frequency is controlled by the innodb_flush_log_at_trx_commit variable:
- 0: do not flush on commit (the redo log buffer is written and flushed to disk roughly once per second);
- 1: every time a transaction commits, the redo log buffer is flushed to disk;
- 2: when a transaction commits, the redo log buffer is written to the OS cache of the log file instead of directly to disk; it may take about a second before the data in the OS cache is written to the disk file.
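A sketch of inspecting and changing this setting (requires sufficient privileges; 1 is the default and the only fully durable choice):

```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- Trade durability for speed on a non-critical instance:
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```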
4.2.4. Consistency:
Consistency means that the transaction cannot destroy the integrity of the data and the consistency of the business:
Data integrity: entity integrity, column integrity (such as field type, size, length must meet requirements), foreign key constraints, etc.
Business consistency: For example, in a bank transfer, whether the transaction succeeds or fails, the total amount of money between both parties remains unchanged.
When multiple transactions in the database concurrently access the same data, incorrect data may be read and stored if the concurrent operations are not controlled, destroying the consistency of the database. The basic working principle of the MySQL lock mechanism is that before a transaction modifies the database, it must acquire the corresponding lock; only the transaction holding the lock can modify the data. While the transaction is operating, that part of the data is locked; if other transactions need to modify the data, they must wait for the current transaction to commit or roll back and release the lock.
According to different classification methods, the types of locks can be divided into the following types:
- By lock granularity: table-level locks, row-level locks, page-level locks;
- By lock type: shared locks (S locks), exclusive locks (X locks);
- Divided by lock usage strategy: optimistic lock, pessimistic lock;
5.1. Table-level locks, row-level locks, page-level locks:
- Table-level lock: the largest lock granularity, with the highest probability of lock conflict and the lowest concurrency; however, the overhead is small, locking is fast, and deadlocks do not occur;
- Row-level lock: the smallest lock granularity, with the lowest probability of lock conflict and the highest concurrency; however, the overhead is large, locking is slow, and deadlocks can occur;
- Page-level lock: the lock granularity sits between table-level and row-level locks and is a compromise, with average concurrency. Overhead and locking time also sit between those of table locks and row locks, and deadlocks can occur;
Different storage engines support different lock mechanisms:
- InnoDB storage engine supports row-level locks and table-level locks. By default, row-level locks are used, but row-level locks are only used when data is queried through indexes. Otherwise, table-level locks are used.
- MyISAM and MEMORY storage engines use table-level locks;
- The BDB storage engine uses page locks, but also supports table-level locks;
5.2. InnoDB row locks:
InnoDB row locks come in two types: shared locks (S locks) and exclusive locks (X locks).
For update, delete, and insert operations, InnoDB will automatically add exclusive locks to the data rows involved; for ordinary SELECT statements, InnoDB will not add any locks.
5.3. InnoDB table locks and intention locks:
Because the InnoDB engine allows row locks and table locks to coexist, implementing a multi-granularity locking mechanism, table locks and row locks, although different in scope, can conflict with each other. When you want to place a table lock, you would have to traverse all records in the table first to determine whether any row holds an exclusive lock; such a traversal check is obviously inefficient. MySQL therefore introduces intention locks to detect conflicts between table locks and row locks.
Intention locks are also table-level locks and are divided into intention shared locks (IS locks) and intention exclusive locks (IX locks). When a transaction wants to place a row lock on a record, it first places the corresponding intention lock on the table. If another transaction later wants to lock the whole table, it only needs to check whether an intention lock exists: if one exists, the table lock cannot be granted and the transaction must wait. This quick check avoids scanning the rows and improves efficiency.
5.4. Implementation of InnoDB row locks and next-key locks:
InnoDB row locks are implemented by locking the index entries in the index. Row locks can only be used when data is retrieved through an index; otherwise table locks are used.
In InnoDB, to solve the phantom read phenomenon, the next-key lock is introduced. Based on the index, the key space is divided into intervals that are open on the left and closed on the right. When a range query hits the index and retrieves data, the interval containing the record and the next interval are locked. In fact, Next-Key Lock = Record Lock + Gap Lock:
- Gap lock: when data is retrieved with a range query rather than an exact match and a shared or exclusive lock is requested, InnoDB locks the index entries of existing records that satisfy the range condition; key values that fall within the range but do not exist are called gaps (GAP), and the gaps are locked as well.
- Record lock: when an exact-match query on a unique index hits an existing record, a record lock is used.
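A sketch of when each lock is taken, assuming a table t with a primary key id, a secondary index on age, and existing rows with age 10 and 20:

```sql
-- Record lock only: exact match on a unique index that hits a row.
SELECT * FROM t WHERE id = 10 FOR UPDATE;

-- Next-key lock (record + gap): range query on an index. Under
-- repeatable read this locks the matching records and the gaps
-- around them, so a concurrent INSERT of age = 15 blocks, which
-- is exactly how phantom rows are prevented for locking reads.
SELECT * FROM t WHERE age BETWEEN 10 AND 20 FOR UPDATE;
```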
5.5. Use the lock mechanism to solve concurrency problems:
For details on the locking mechanisms of the InnoDB and MyISAM storage engines, you can read: MySQL Database: Locking Mechanism_Zhang Weipeng's Blog - CSDN Blog
- X locks solve dirty reads
- S locks solve non-repeatable reads
- Next-key locks solve phantom reads
An index is essentially a data structure that speeds up queries by reducing the number of rows a query must traverse, preventing the database from doing a full table scan, just like the table of contents of a book, which lets you find content faster. (A single index can include at most 16 columns.)
6.1. Advantages and disadvantages of indexes:
(1) Advantages of indexes:
- Reduce the number of rows a query needs to retrieve, speeding up queries and avoiding full table scans; this is the main reason for creating an index.
- If the index's data structure is a B+Tree, grouping and sorting in queries can use the index, significantly reducing the time spent on grouping and sorting.
- By creating a unique index, the uniqueness of each row of data in the table can be guaranteed.
(2) Disadvantages of indexes:
- When data in the table is added, deleted, or modified, the indexes must also be updated, and maintenance time grows with the amount of data.
- Indexes occupy physical space; a clustered index requires even more space.
6.2. Index usage scenarios:
(1) On which columns should indexes be created?
- Columns that frequently appear in the WHERE clause, to speed up the evaluation of conditions.
- Columns accessed by range, or used in group by or order by: since the index is already sorted, it can speed up range and sorting queries.
- Columns often used in joins, mainly foreign key columns, to speed up joins.
- The primary key column, to enforce the uniqueness of the column and organize the arrangement of data in the table.
(2) On which columns should indexes not be created?
- Columns with low selectivity. Since such columns have very few distinct values (e.g. gender), the matching rows make up a large proportion of the table's rows, so an index does not significantly speed up retrieval.
- Columns rarely used in queries. Since these columns are seldom referenced, an index only slows down system maintenance and increases space requirements.
- Columns where the increased modification cost far outweighs the retrieval improvement. Adding an index improves retrieval performance but reduces modification performance; removing an index improves modification performance but reduces retrieval performance.
- Columns of text, image, and bit data types. These columns are either very large or have very few distinct values.
6.3. Index classification:
(1) Ordinary index, unique index, primary key index, full-text index, combined index.
- Ordinary index: The most basic index, without any restrictions
- Unique index: the values of the index column must be unique; NULL values are allowed, and there can be multiple NULLs. For a composite unique index, the combination of column values must be unique.
- Primary key index: a special unique index that does not allow null values.
- Full-text index: before MySQL 5.6, full-text indexes were only available on MyISAM tables (InnoDB supports them from 5.6 on), and only on CHAR, VARCHAR, or TEXT columns. They replace the less efficient like fuzzy matching, and a combined full-text index over several fields can perform full fuzzy matching on multiple fields at once.
- Combined (composite) index: mainly created to improve MySQL efficiency. When creating a composite index, the columns most frequently used as restrictive conditions should be placed leftmost, in decreasing order of use.
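The index types above map directly to ordinary DDL; a sketch with illustrative table and column names:

```sql
CREATE INDEX idx_name        ON user (name);        -- ordinary index
CREATE UNIQUE INDEX uq_email ON user (email);       -- unique index
ALTER TABLE user ADD PRIMARY KEY (id);              -- primary key index
CREATE FULLTEXT INDEX ft_bio ON user (bio);         -- full-text (InnoDB 5.6+)
CREATE INDEX idx_city_age    ON user (city, age);   -- composite index:
-- the most-used restrictive column (city) is placed leftmost
```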
(2) Clustered index and non-clustered index:
If classified according to the physical order of data storage versus the order of index values, indexes can be divided into two types, clustered and non-clustered:
- Clustered index: the physical order of data storage in the table is consistent with the order of index values. A base table can have at most one clustered index. Updating a clustered index column often changes the physical order of records in the table, which is costly, so clustered indexes should not be built on frequently updated columns.
- Non-clustered index: an index organization in which the physical order of data in the table is inconsistent with the order of index values; a base table can have multiple non-clustered indexes.
6.4. Index data structure:
Common index data structures include: the B+Tree index and the hash index.
(1) Hash index: only the Memory storage engine in MySQL supports hash indexes, and it is the default index type of Memory tables. A hash index organizes data by hash value, so query efficiency is very high: data can be located in a single lookup.
Disadvantages of hash index:
- A hash index can only satisfy equality queries, not range queries or sorting, because after data goes through the hash function its ordering relationships are no longer preserved.
- When creating a composite index, you cannot use only some columns of the composite index for query. Because the hash index combines multiple column data and then calculates the Hash value, it is meaningless to calculate the Hash value for individual column data.
- When a Hash collision occurs, the Hash index cannot avoid scanning of table data. Because it is not enough to simply compare the Hash value, you need to compare the actual value to determine whether it meets the requirements.
(2) B+Tree index: the B+Tree is the most frequently used index data structure in MySQL and is the index type of the InnoDB and MyISAM storage engines. A B+Tree index requires several IO operations going from the root node to a leaf node, so lookup is not as fast as with a hash index, but it is better suited to operations such as sorting.
Advantages of the B+Tree index:
- Internal nodes do not store data, so each page holds more keys, more of the index is read per IO, and the number of disk I/O reads is greatly reduced
- B+Tree with sequential access pointers: all index data of a B+Tree is stored in the leaf nodes, and sequential access pointers are added; each leaf node has a pointer to the adjacent leaf node, which improves the efficiency of range queries.
6.5. Why use the B+Tree as an index:
The index itself is also large and cannot be stored entirely in memory, so indexes are often stored on disk as index files. Searching an index therefore incurs disk I/O, and compared with memory access, disk I/O is several orders of magnitude more expensive, so the key metric for evaluating a data structure as an index is the asymptotic complexity of the number of disk I/O operations during a search. In other words, the index structure should minimize the number of disk I/O accesses during a search.
(1) Locality principle and program pre-reading:
Since the disk itself is much slower to access than main memory, and mechanical movement adds further cost, disk I/O should be minimized to improve efficiency. To achieve this, the disk often does not read strictly on demand but reads ahead every time: even if only one byte is needed, the disk starts from that position and sequentially reads a certain length of data backwards into memory. The theoretical basis for this is the famous locality principle in computer science: when a piece of data is used, nearby data is usually used soon after, and the data required during program execution is usually concentrated.
Since sequential disk reads are very efficient (no seek time, very little rotational delay), read-ahead can improve I/O efficiency for programs with locality. The read-ahead length is generally an integral multiple of a page. When the data a program needs is not in main memory, a page fault is triggered; the system sends a read request to the disk, the disk finds the starting position of the data and reads one or more pages backwards into memory, then the fault handler returns and the program continues to run.
(2) Performance analysis of the B+Tree index: as mentioned above, the number of disk I/Os is generally used to evaluate the quality of an index structure. Start with the B-Tree: one B-Tree lookup accesses at most h nodes. At the same time, the database cleverly exploits the disk read-ahead principle by setting the size of a node equal to one page, i.e. each new node directly allocates one page of space. This guarantees that a node is physically stored in a single page, and since storage allocation is page-aligned, each node can be loaded fully with only one I/O. One lookup in a B-Tree then requires at most h-1 I/Os (the root node is resident in memory), and the time complexity is O(h) = O(log_d N). In practical applications the out-degree d is a very large number, usually more than 100, so h is very small. In summary, using a B-Tree as an index structure is very efficient.
As for the red-black tree, although its time complexity is also O(h), h is obviously much larger, and because logically adjacent nodes may be physically far apart, it cannot take advantage of locality, so its IO efficiency is clearly much worse than that of the B-Tree.
In addition, the B+Tree is even better suited as an index data structure, and the reason relates to the out-degree d of the internal nodes. From the analysis above, the larger d is, the better the index performs, and the upper limit of d depends on the size of the keys and data within a node. Since the B+Tree removes the data field from internal nodes, it can have a larger out-degree, and the number of disk IOs will be smaller.
(3) Comparison between the B+Tree index and the B-Tree index?
From the structures of B-Tree and B+Tree, we can see that B+Tree has more advantages than B-Tree in file systems or database systems, for the following reasons:
- (1) B+Tree is better for scanning: B-Tree improves disk IO performance but does not solve the inefficiency of traversing elements, whereas B+Tree only needs to traverse the leaf nodes to scan all key information, so B+Tree performs better for range queries, sorting, and similar operations.
- (2) The disk IO cost of B+Tree is lower: the internal nodes of B+Tree have no data field, so an internal node is smaller than in a B-Tree. If all keys of one internal node are stored in the same disk block, the block can hold more keys, more keys are read into memory per IO, and the number of IO reads and writes is relatively reduced.
- (3) The query efficiency of B+Tree is more stable: since the internal nodes of B+Tree are only indexes over the keys in the leaf nodes and store no data, every key lookup must take a path from the root node to a leaf node. All key lookups have the same path length, so every lookup has comparable query efficiency.
(4) How is the B+Tree index implemented in MySQL's InnoDB and MyISAM storage engines?
Both MyISAM and InnoDB use B+Tree indexes. The Data fields of MyISAM's primary key index and secondary indexes store the address of the row; InnoDB's primary key index stores not the address of the row but all of the row's data, while the Data field of its secondary indexes stores the primary key value.
Index length limit:
- For an InnoDB composite index, if a column's length exceeds 767 bytes, the columns exceeding 767 bytes take a prefix index; for an InnoDB single-column index, if the column length exceeds 767 bytes, a prefix index is taken (the first 255 characters)
- For a MyISAM composite index, the total length of the created index cannot exceed 1000 bytes, otherwise an error is reported and creation fails; for a MyISAM single-column index, the maximum length also cannot exceed 1000 bytes, otherwise a warning is issued but creation succeeds, resulting in a prefix index (taking the first 333 characters)
(1) SQL optimization and index optimization of MySQL: https://blog.csdn.net/a745233700/article/details/84455241
(2) MySQL table structure optimization: https://blog.csdn.net/a745233700/article/details/84405087
MySQL is an IO-intensive application whose main responsibility is data management and storage. We know that reading from memory takes on the order of microseconds, while one IO from an ordinary hard disk takes on the order of milliseconds: a difference of three orders of magnitude. Therefore, the first thing to optimize in a database is IO: convert disk IO into memory IO as much as possible. When tuning MySQL parameters, we mainly tune parameters that reduce disk IO: for example, query_cache_size adjusts the size of the query cache, and innodb_buffer_pool_size adjusts the size of the buffer pool;
An execution plan is the query plan produced from the abstract syntax tree of a SQL statement and the statistics of the related tables after the statement passes through the query analyzer; it is generated automatically by the query optimizer. Since it is the result of dynamic data sampling and statistical analysis, there may be analysis errors, i.e. the plan may not be optimal. Use the explain keyword to see how MySQL executes a SQL query statement, analyze the performance bottlenecks of a select statement, and improve our queries. The explain output is as follows:
The important columns are id, type, key, key_len, rows, and extra:
(1) id: the id column can be understood as the identifier of SQL execution order; there are as many ids as there are selects.
- Different id values: the larger the id, the higher the priority, and the earlier it is executed;
- Same id values: executed in order from top to bottom;
- id is NULL: indicates a result set that does not need to be used for a query.
(2) select_type: the type of query, mainly used to distinguish ordinary queries, union queries, subqueries, and other complex queries;
(3) table: indicates which table this row of explain output is accessing;
(4) type: access type, i.e. how MySQL decides to find rows in the table. From best to worst: system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL. Except for ALL, every other type can use indexes; except for index_merge, every type can use only one index. Generally, type should reach at least the ref level, and range queries should reach the range level.
(5) possible_keys: indexes that may be used by the query;
(6) key: the index actually used to optimize access to the table;
(7) key_len: the length, in bytes, of the index actually used to optimize the query; from this value you can work out which fields of a multi-column index are actually used;
(8) ref: shows which field or constant is used together with key;
(9) rows: a rough estimate, based on table statistics and index selection, of how many rows the query needs to read; not an exact value;
(10) extra: other additional information.
The type values in detail:
- system: the table has only one matching row (a system table); can be regarded as a special case of const
- const: It is found through the index once, indicating the use Primary key index or unique index
- eq_ref: The fields in the primary key or unique index are used for connection, and only one row of matching data will be returned
- ref: ordinary index scan; may return multiple rows matching the query condition.
- fulltext: Full-text index retrieval, the priority of the full-text index is very high. If the full-text index and the ordinary index exist at the same time, mysql will give priority to the full-text index regardless of the cost.
- ref_or_null: Similar to the ref method, except that the comparison of null values is added.
- index_merge: the query uses two or more indexes (the index merge optimization) and finally takes their intersection or union; commonly seen when and/or conditions use different indexes.
- unique_subquery: used for in-form subquery in where, the subquery returns a unique value without repeating values;
- index_subquery: used for in-form subqueries using a secondary index or an in constant list; the subquery may return duplicate values, and the index can be used to deduplicate the subquery.
- range: Index range scan, commonly used in queries using operators such as >, <, between, in, like, etc.
- index: Index full table scan, scan the index tree from beginning to end;
- all: Traverse the entire table to find matching rows (Although Index and ALL both read the entire table, index is read from the index, while ALL is read from the hard disk)
- NULL: MySQL decomposes the statement during the optimization process, and does not even need to access the table or index during execution
Readers interested in the details of explain execution plans can read this article: https://blog.csdn.net/a745233700/article/details/84335453
The common extra values:
- using index: a covering index is used
- using index condition: the queried columns are not fully covered by the index, and the where filter condition uses the index
- using temporary: a temporary table is used to save intermediate results, often with group by and order by operations; usually because there is no index on the group by columns, or because group by and order by use different columns. Generally it means the query needs to be optimized.
- using filesort: MySQL has two ways to sort query results: using an index, or filesort (an external sort based on quicksort, with poor performance). When the data volume is large this becomes a CPU-intensive process, so sort performance can be improved by building appropriate indexes.
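Putting it together, a sketch of reading a plan in practice (the tables and columns are illustrative):

```sql
EXPLAIN SELECT u.name, o.total
FROM   user u
JOIN   orders o ON o.user_id = u.id
WHERE  u.city = 'Beijing'
ORDER BY o.created_at;
-- Check that: type is at least ref/range, key is not NULL, rows is
-- small, and extra is free of "using temporary" / "using filesort".
```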
10.1. The principle of MySQL master-slave replication:
The Slave obtains the binlog binary log file from the Master, then parses the log file into the corresponding SQL statements and re-executes the master's operations on the slave, thereby ensuring data consistency. Since master-slave replication is asynchronous, the data on the Slave may lag behind the Master, and only eventual consistency of the data can be guaranteed. The whole master-slave replication process is completed mainly by three threads:
- (1) SQL thread: created to read the relay log and execute the updates contained in it; located on the slave side
- (2) I/O thread: reads the content sent by the master's Binlog dump thread and saves it to the slave's relay log; located on the slave side
- (3) Binlog dump thread (also called the IO thread): sends the contents of the binlog to the slave server; located on the master side
Note: if one master serves two slaves, there are two Binlog dump threads on the master, while each slave has its own two threads (one I/O thread and one SQL thread);
10.2. Master-slave replication process:
- (1) After the master server executes a SQL statement, it records the change in the binlog binary file;
- (2) The IO thread on the slave side connects to the master side and requests the next log content to be copied from the specified pos node position of the specified bin log log file (or from the very beginning of the log).
- (3) After receiving the IO thread request from the slave, the master notifies the thread responsible for replication (the Binlog dump thread), which, according to the request of the slave's IO thread, reads the log information in the specified binlog after the specified pos position and returns it to the slave's IO thread. Besides the binlog content, the returned information also includes the binlog file name on the master side and the pos position within that binlog.
- (4) After receiving the information returned by the master's IO thread, the slave's IO thread writes the received binlog content, in order, to the end of the relay log file on the slave side, and records the master-side binlog file name and pos position in the master-info file (stored on the slave side), so that on the next synchronization it can tell the master from which position to start;
- (5) After the SQL thread on the slave detects new content in the relay log file, it immediately parses the content, restores it to the SQL statements actually executed on the master, and then executes these SQL statements in order, thus achieving data consistency between master and slave;
10.3. Benefits of master-slave replication:
- (1) Read and write separation improves database performance by dynamically adding slave servers, performing writes and updates on the master server, and performing read functions on the slave server.
- (2) Improve data security. Because the data has been replicated to the slave, and the slave can pause the replication process, backups can be made on the slave without affecting the master's data.
- (3) Generate real-time data on the master server and analyze the data on the slave server, thereby improving the performance of the master server.
10.4. Replication types supported by MySQL and their advantages and disadvantages:
Binlog files have two formats: Statement-Based (statement-based replication) and Row-Based (row-based replication). The default format is Statement-Based. To change the format, use the --binlog-format option when starting the service, for example:
mysqld_safe --user=mysql --binlog-format=format &
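The format is also a dynamic server variable, so it can be inspected and switched at runtime (requires sufficient privileges; note that binlog_format is deprecated in recent 8.0 releases):

```sql
SHOW VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';   -- STATEMENT | ROW | MIXED
```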
(1) Statement-based replication: a SQL statement executed on the master is executed identically on the slave. Efficiency is relatively high. Once it is found that exact replication is not possible, row-based replication is automatically selected.
Advantages:
- ① Because SQL statements are recorded, less storage space is needed. The binlog contains events describing database operations, but only operations that change the database (insert, update, create, delete, etc.); statements such as select and desc are not recorded.
- ② The binlog log file records all statements that change the database, so this file can be used as the basis for auditing the database.
Disadvantages:
- ① Not safe: not all data-changing statements replicate correctly, and non-deterministic behavior cannot be replicated exactly. For example, a delete or update statement that uses limit without order by is non-deterministic.
- ② For update, insert...select statements without index conditions, more data must be locked, which reduces the performance of the database.
(2) Row-based replication: the changed content itself is copied, rather than executing the commands on the slave server. Supported starting from MySQL 5.1;
Advantages:
- ① All changes will be copied, which is the safest way to copy;
- ② Statements such as update and insert...select lock fewer rows;
Disadvantages:
- ① You cannot see from the binlog file what statements were executed, and you have no way of knowing what statements the slave received; you can only see what data changed.
- ② Because it records data, the storage space occupied by binlog log files is larger than that of Statement-based.
- ③ Operations with large amounts of data will take longer.
(3) Mixed replication: statement-based replication is used by default; once it is found that statement-based replication cannot be exact, row-based replication is used.
For more detailed information about master-slave replication, please read this article: https://blog.csdn.net/a745233700/article/details/85256818
11.1. Implementation principle:
Read-write separation solves the problem of database write operations affecting query efficiency, and is suitable for scenarios where reads far outnumber writes. The basis for read-write separation is master-slave replication: the master database synchronizes its data changes to the slave cluster via master-slave replication; the master then handles write operations (it can also serve reads), while the slaves handle read operations and cannot perform writes. Depending on the load, multiple slave databases can be deployed to speed up reads, reduce pressure on the master, and improve overall system performance.
11.2. Reasons why read-write separation improves performance:
- (1) Physical servers are added and the load is shared;
- (2) The master and slave are only responsible for their own writing and reading, which greatly alleviates the contention of X lock and S lock;
- (3) The slave library can be configured with the MyISAM engine to improve query performance and save system overhead;
- (4) Another major function of master-slave replication is to add redundancy and improve availability: when one database server goes down, service can be restored as quickly as possible by switching to a slave database.
11.3. MySQL read-write separation implementations:
- (1) Internal implementation based on program code: In the code, routing classification is performed based on select and insert. The advantage is that the performance is better, because the program is implemented in the code and does not require additional hardware expenses. The disadvantage is that it requires developers to implement it, and operation and maintenance personnel have no way to start.
- (2) Implementation based on an intermediate proxy layer: the proxy sits between the application server and the database server; after receiving a request from the application server, it decides where to route it and forwards it to a backend database. Representative implementations work at the proxy layer.
Read-write separation relieves the pressure of database read and write operations, but it does not distribute the database's storage pressure. Sub-databases and sub-tables (sharding) solve the database's storage bottleneck and improve its query efficiency.
12.1. Vertical split:
(1) Vertical table splitting: divide one table into multiple tables by fields, each table storing a subset of the fields. Generally, commonly used fields are placed in one table and less commonly used fields in another.
Advantages:
(1) Avoid IO competition and reduce the probability of locking the table. Large fields are less efficient: first, they take more space, so fewer rows fit in a single page and IO operations increase; second, with a larger data volume, reading takes longer.
(2) It can better improve the query efficiency of popular data.
(2) Vertical database splitting: split the tables into different databases by business module, suitable for systems where the coupling between businesses is very low and the business logic is clear.
Advantages:
- Reduce the coupling in the business and facilitate hierarchical management of different businesses
- Increases the number of IO channels and database connections, and solves the bottleneck of single-machine hardware storage resources
(3) Disadvantages of vertical splitting (sub-database, sub-table):
- Redundant primary keys; redundant columns need to be managed
- The processing of transactions becomes complicated
- There is still the problem of excessive data volume in a single table
12.2. Horizontal split:
(1) Horizontal table splitting: In the same database, split the data of the same table into multiple tables according to certain rules.
Advantages:
- Solve the problem of excessive data volume in a single table
- Avoid IO competition and reduce the probability of table lock
(2) Horizontal database splitting: Split the data of the same table into different databases according to certain rules, and different databases can be placed on different servers.
Advantages:
- Solve the bottleneck problem of large data volume in a single database
- Reduce IO conflicts and lock competition; a problem in one database does not affect the others, improving the stability and availability of the system
(3) Disadvantages of horizontal splitting (sub-tables, sub-databases):
- The consistency of sharded transactions is difficult to solve
- Cross-node JOIN performance is poor and the logic will become complicated
- Data expansion is difficult and difficult to maintain
12.3. Solutions to problems existing in sub-databases and sub-tables:
(1) Transaction problems:
① Solution 1: Use distributed transactions :
- Advantages: Managed by database, simple and effective.
- Disadvantages: The performance cost is high, especially as there are more and more shards.
② Option 2: the program and the database jointly control the implementation. The principle is to decompose a distributed transaction spanning multiple databases into multiple small transactions, each confined to a single database, and let the application program coordinate the small transactions as a whole.
- Advantages: Advantages in performance;
- Disadvantages: Flexible control of transactions is required in the application. If you use spring's transaction management, you will face certain difficulties in making changes.
(2) Cross-node Join problem:
The common way to solve this problem is with two queries: the first query finds the IDs of the associated data, and a second request is issued based on those IDs to obtain the associated data.
(3) Cross-node count, order by, group by, paging and aggregation function issues:
Because such problems require computation over the whole data set, and most proxies do not handle the merging automatically, the solution is similar to the cross-node join problem: obtain results on each node and merge them on the application side. Unlike join, the per-node queries can be executed in parallel, so it is much faster than a single large table; but if the result set is large, application memory consumption becomes a problem.
12.4. After the database is divided into tables, how to deal with the ID key?
After the database is divided into tables, the ID of each table cannot start from 1, so a global ID is needed. There are mainly the following methods to set the global ID:
(1) UUID:
- Advantages: Locally generated ID, no remote call required, globally unique and not repeated.
- Disadvantages: It takes up a lot of space and is not suitable as an index.
(2) Database auto-increment ID: after sharding, use a database dedicated to generating primary keys. Each time the service receives a request, it first inserts a meaningless row into this database, obtains the auto-increment ID generated by the database, and then uses that ID to write the data into the shards.
- Advantages: Simple and easy to implement.
- Disadvantages: There is a bottleneck under high concurrency.
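A minimal sketch of this dedicated ID database approach, using the classic one-row REPLACE INTO trick (table and column names are illustrative):

```sql
-- One auto-increment table whose only job is handing out IDs.
CREATE TABLE sequence_id (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  stub CHAR(1) NOT NULL,
  UNIQUE KEY uq_stub (stub)
) ENGINE=InnoDB;

-- Each call replaces the single row and bumps the counter.
REPLACE INTO sequence_id (stub) VALUES ('a');
SELECT LAST_INSERT_ID();   -- the new global ID
```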
(3) Redis generates ID:
- Advantages: Does not rely on the database, and has better performance.
- Disadvantages: The introduction of new components will increase the complexity of the system
(4) Twitter's snowflake algorithm: a 64-bit long ID, of which 1 bit is unused, 41 bits hold a millisecond timestamp, 10 bits the worker machine ID, and 12 bits a sequence number.
- 1 bit: the first bit defaults to 0, because if the first bit of a binary number is 1 it is negative, and IDs cannot be negative.
- 41 bits: the timestamp, in milliseconds.
- 10bit: Record the working machine ID, of which 5 bits represent the computer room ID and 5 bits represent the machine ID.
- 12bit: used to record different IDs generated within the same millisecond.
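The bit layout can be reproduced with shift operations; purely as an illustration, here is the packing expressed in MySQL SQL (the datacenter ID, machine ID, and sequence number are hypothetical values an application would supply):

```sql
-- 41-bit ms timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence.
-- Real implementations subtract a custom epoch so the 41 bits last longer.
SET @ts = UNIX_TIMESTAMP(NOW(3)) * 1000;   -- current time in milliseconds
SET @dc = 1, @machine = 7, @seq = 42;
SELECT (CAST(@ts AS UNSIGNED) << 22)
     | (@dc << 17) | (@machine << 12) | @seq AS snowflake_id;
```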
(5) Meituan-Dianping's Leaf distributed ID generation system.
Partitioning stores a table's data in different areas according to specific rules; that is, the table's data file is divided into multiple small blocks. When querying, as long as you know which areas the data is stored in, you can query the corresponding areas directly instead of all the table data, which improves query performance. At the same time, if the table data is especially large and cannot fit on one disk, the data can be allocated to different disks to solve the storage bottleneck; using multiple disks also improves disk IO efficiency and database performance. When using a partitioned table, note that the partitioning field must be included in the primary key or a unique index, and each table can have at most 1024 partitions. Common partition types are: Range partitioning, List partitioning, Hash partitioning, and Key partitioning:
- (1) Range partitioning: partition according to the continuous interval range
- (2) List partitioning: select the partition according to the value in the given set.
- (3) Hash partitioning: Partitioning based on the return value of a user-defined expression that is calculated using the column values of the rows to be inserted into the table. This function can contain any valid expression in MySQL that produces a non-negative integer value.
- (4) Key partitioning: Similar to HASH partitioning, the difference is that Key partitioning only supports calculation of one or more columns, and the hash function of key partitioning is provided by the MySQL server.
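A range-partitioning sketch (an illustrative orders table partitioned by year; note the partitioning column must appear in the primary key):

```sql
CREATE TABLE orders (
  id         BIGINT NOT NULL,
  created_at DATE   NOT NULL,
  amount     DECIMAL(10,2),
  PRIMARY KEY (id, created_at)          -- partition column must be in the PK
)
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2021 VALUES LESS THAN (2022),
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Dropping expired data is just dropping a partition:
ALTER TABLE orders DROP PARTITION p2021;
```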
(1) Advantages of table partitioning:
① Scalability:
- Placing partitions on different disks can solve the single-disk capacity bottleneck, store more data, and also solve the single-disk IO bottleneck.
② Improve database performance:
- Reduce the amount of data that needs to be traversed during database retrieval. When querying, you only need to query the partition corresponding to the data.
- Avoid the access contention limits of a single InnoDB index
- Aggregate functions such as sum() and count() can be processed in parallel in each partition; in the end only the results obtained from all partitions need to be summarized
③ Convenient operation and maintenance management of data:
- Convenient management: data that no longer needs to be kept can be removed quickly by deleting the corresponding partition. For example, to delete historical data for a given period, simply truncate or drop the whole partition, which is far more efficient than delete;
- In some scenarios, backing up and restoring a single partition is more efficient.
(1) Auto-increment ID:
Benefits of using auto-increment ID:
- The field length will be much smaller than UUID.
- The database is automatically numbered and stored in order, which is convenient for retrieval
- No need to worry about duplication of primary keys
Disadvantages of using auto-increment IDs:
- Because the ID auto-increments, in certain business scenarios it is easy for others to infer the business volume.
- Data migration or table merging becomes very troublesome
- In high-concurrency scenarios, contention for the auto-increment lock reduces database throughput
(2) UUID: Universal unique identification code. UUID is calculated and generated based on the current time, counter, hardware identification and other data.
Advantages of using UUID:
- Unique identification, no need to consider duplication issues, global uniqueness can be achieved when data is split and merged.
- can be generated at the application layer to improve the throughput of the database.
- No need to worry about business volume leakage.
Disadvantages of using UUID:
- Because UUIDs are generated randomly, inserts cause random IO, which slows insertion and leads to poor disk utilization.
- UUID takes up a lot of space, and the more indexes are created, the greater the impact.
- Comparing sizes between UUIDs is much slower than self-increasing IDs, which affects query speed.
In general, MySQL recommends auto-increment IDs, because in MySQL's InnoDB storage engine the primary key index is a clustered index: the leaf nodes of the primary key B+Tree store the primary key values and the row data in order. With an auto-increment ID, new rows only need to be appended in order; with a UUID, IDs arrive in random order, which causes a large amount of data movement during insertion and heavy page fragmentation, degrading insert performance.
A view is a table derived from one or more tables (or views), and its content is defined by a query. A view is a virtual table: only the definition of the view is stored in the database, not the corresponding data. When the data of a view is operated on, the system operates on the corresponding base tables according to the view's definition. A view can be seen as a table built on top of base tables: its structure and content come from the base tables, and it exists only because the base tables exist. A view can correspond to one base table or several; views are abstractions over base tables and new relations established in a logical sense.
(1) Advantages of views:
- Simplifies operations and defines frequently used data as views
- Security: users can only query and modify the data they are allowed to see
- Logical independence, shielding the impact of the structure of the real table
(2) Disadvantages of the view:
- Poor performance: the database must convert a query on the view into a query on the base tables. If the view is defined by a complex multi-table query, then even a simple query on the view becomes a complex combination that takes time.
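For example (hypothetical user table):

```sql
-- Expose only non-sensitive columns of active users as a view.
CREATE VIEW active_user_v AS
SELECT id, name, city
FROM   user
WHERE  status = 'active';

-- Queried like an ordinary table; MySQL rewrites it against the base table.
SELECT * FROM active_user_v WHERE city = 'Beijing';
```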
SQL statements must be compiled before they are executed. A stored procedure is a set of SQL statements that accomplishes a specific function; it is compiled once and stored in the database, and users invoke it by specifying the stored procedure's name and supplying parameters.
The complex logic of operating the database could also be implemented in application code, so why do we need stored procedures? The main reason is that calling through the application API is comparatively slow: the application must hand SQL statements to the MySQL engine for execution, so it is better to let MySQL directly take charge of the work it is most proficient and capable of completing.
Advantages of stored procedures:
- (1) Component-style programming: once a stored procedure is created, it can be called any number of times from the program without rewriting its SQL statements, and a DBA can modify the stored procedure at any time without affecting the application source code.
- (2) Faster execution: if an operation contains a large amount of Transact-SQL code or is executed many times, a stored procedure runs much faster than a batch. Stored procedures are precompiled: the first time one is run, the optimizer analyzes and optimizes it and produces an execution plan stored in the system tables, whereas a batch of Transact-SQL statements must be compiled and optimized on every run, so it is comparatively slower.
- (3) Enhance the function and flexibility of SQL language: Stored procedures can be written with control statements, have strong flexibility, and can complete complex judgments and complex operations.
- (4) Reduce network traffic: for operations on the same database object (such as queries and modifications), if the Transact-SQL statements involved are organized into a stored procedure, then when the client calls the stored procedure, only the call statement travels over the network, greatly reducing network traffic and load.
- (5) Can be fully used as a security mechanism: by restricting the permission to execute a given stored procedure, access to the corresponding data can be limited, preventing unauthorized users from accessing data and ensuring data security.
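A minimal stored procedure sketch, reusing the hypothetical transfer logic from earlier:

```sql
DELIMITER //
CREATE PROCEDURE transfer(IN from_id INT, IN to_id INT, IN amt DECIMAL(10,2))
BEGIN
  START TRANSACTION;
  UPDATE account SET balance = balance - amt WHERE id = from_id;
  UPDATE account SET balance = balance + amt WHERE id = to_id;
  COMMIT;
END //
DELIMITER ;

-- Invoked by name, with parameters.
CALL transfer(1, 2, 100.00);
```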
A trigger is a database object associated with a table: when a specified event occurs on the table and the defined conditions are met, the set of statements defined in the trigger is executed. Triggers can be used on the database side to help ensure data integrity. A trigger is a special kind of stored procedure: a stored procedure must be invoked with call, whereas a trigger needs no manual invocation; it fires automatically when data in the table is inserted, deleted, or modified. It provides finer-grained and more complex data control than the database's standard features.
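A sketch of an audit trigger (the account_log audit table is hypothetical):

```sql
-- Fires automatically after every UPDATE on account; no CALL needed.
CREATE TRIGGER account_audit
AFTER UPDATE ON account
FOR EACH ROW
INSERT INTO account_log (account_id, old_balance, new_balance, changed_at)
VALUES (NEW.id, OLD.balance, NEW.balance, NOW());
```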
A cursor is a movable identifier that acts as a pointer: it can be used to traverse all records in a query result set, but it fetches only one record at a time; that is, it points to and retrieves one row at a time so the corresponding operation can be performed. Without a cursor, it is as if someone hands you everything at once; with a cursor, items are handed to you one by one, so you can examine each one and decide what to do with it.
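In MySQL, cursors exist only inside stored programs; a minimal sketch that walks a result set one row at a time (hypothetical account table):

```sql
DELIMITER //
CREATE PROCEDURE sum_balances(OUT total DECIMAL(12,2))
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE b DECIMAL(10,2);
  DECLARE cur CURSOR FOR SELECT balance FROM account;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  SET total = 0;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO b;                  -- one row at a time
    IF done THEN LEAVE read_loop; END IF;
    SET total = total + b;
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;
```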