This article brings you relevant knowledge about MySQL. It mainly compiles frequently asked interview questions, covering database architecture, indexing, SQL optimization, and more. I hope it is helpful to everyone.
Tell the interviewer about the logical architecture of MySQL. If a whiteboard is available, you can sketch the architecture diagram while you explain it.
The MySQL logical architecture is divided into three layers:
(1) The first layer handles connection management, authorization and authentication, security, and so on.
(2) The second layer is responsible for compiling and optimizing SQL
(3) The third layer is the storage engine.
When a statement arrives, MySQL first checks whether the current user has permission to execute it. If not, an error message is returned directly. If the permission check passes, the query cache is consulted first (only in versions before MySQL 8.0, which still have a query cache).
If there is no cache hit, the parser performs lexical analysis, extracting key elements such as select from the SQL statement, and then checks the statement for syntax errors, for example whether the keywords are correct.
Finally, the optimizer determines the execution plan and permission verification is performed. If the check fails, an error message is returned; if it passes, the storage engine interface is called and the execution result is returned.
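As a quick way to confirm on a given server whether the query cache still exists (it was removed in MySQL 8.0) and whether the current user has the needed privileges, a minimal check might look like this; both statements are standard MySQL commands:

SHOW VARIABLES LIKE 'query_cache%';   -- non-empty result only before MySQL 8.0
SHOW GRANTS FOR CURRENT_USER();       -- lists the privileges of the connected user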
For questions about how to design tables and fields for better performance, you can answer from these dimensions:
(1) Try to use numeric fields
Fields that contain only numeric information should not be designed as character types; doing so reduces the performance of queries and joins and increases storage overhead. This is because the engine compares a string character by character when processing queries and joins, whereas a single comparison is enough for numeric types.
(2) Use varchar instead of char as much as possible
Variable-length fields occupy only the space they actually need, which saves storage.
(3) When an indexed column contains a large amount of duplicate data, the index can be dropped
For example, a gender column holding little more than male, female, and unknown gains almost nothing from an index.
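As an illustration of these points, here is a minimal sketch of a table definition that follows them; the table and column names are made up for the example:

CREATE TABLE user_profile (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  status   TINYINT NOT NULL DEFAULT 0,   -- numeric type instead of a character flag
  nickname VARCHAR(64) NOT NULL,         -- variable-length instead of CHAR(64)
  gender   TINYINT NOT NULL DEFAULT 0,   -- low-cardinality column, usually not worth indexing
  PRIMARY KEY (id)
) ENGINE = InnoDB;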
Try to avoid using the != or <> operators in the where clause
Try to avoid using or to join conditions in the where clause
Do not use select * in any query
Avoid NULL checks on fields in the where clause
Create indexes on fields used as query conditions and in order by
Avoid creating too many indexes; prefer composite indexes (a short sketch follows below)
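A minimal sketch of these points; the orders table, its columns, and the index name are hypothetical:

-- Composite index covering the filter and sort columns
CREATE INDEX idx_user_status_created ON orders (user_id, status, created_at);

-- Explicit columns instead of select *, and positive equality instead of != on the indexed column
SELECT order_id, status, created_at
FROM orders
WHERE user_id = 42
  AND status = 1
ORDER BY created_at DESC;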
Add the explain keyword before the select statement to return execution plan information.
(1) id column: It is the serial number of the select statement. MySQL divides select queries into simple queries and complex queries.
(2) select_type column: Indicates whether the corresponding row is a simple or complex query.
(3) table column: Indicates which table a row of explain is accessing.
(4) type column: one of the most important columns. Represents the type of association or access type that MySQL determines how to find rows in the table. From best to worst: system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL
(5) possible_keys column: shows which indexes the query may use to find rows.
(6) key column: This column shows which index mysql actually uses to optimize access to the table.
(7) key_len column: shows the number of bytes used by mysql in the index. This value can be used to calculate which columns in the index are used.
(8) ref column: shows the columns or constants that are compared against the index named in the key column when selecting rows. Common values are const (constant), func, NULL, and field names.
(9) rows column: the number of rows MySQL estimates it must read and examine. Note that this is not the number of rows in the result set.
(10) Extra column: Display additional information. For example, there are Using index, Using where, Using temporary, etc.
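For instance, a minimal sketch of running explain on a hypothetical query (the orders table and its index are assumptions):

EXPLAIN SELECT order_id, status
FROM orders
WHERE user_id = 42;
-- Things to look at in the output:
-- type : ideally ref or range rather than ALL (a full table scan)
-- key  : which index was actually chosen
-- rows : the estimated number of rows examined
-- Extra: watch for Using filesort or Using temporary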
When writing SQL, we should develop the habit of analyzing it with explain. As for slow queries, the operations team usually provides us with regular slow-query statistics.
Optimization ideas for slow queries:
Analyze statements to see if unnecessary fields/data are loaded
Analyze the SQL statement and its execution plan, for example whether the indexes are hit
If the SQL is very complex, optimize the SQL structure
If the amount of table data is too large, consider splitting the table
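To find slow queries in the first place, the slow query log can be enabled; a minimal sketch, with an illustrative one-second threshold:

SET GLOBAL slow_query_log = ON;                -- enable the slow query log
SET GLOBAL long_query_time = 1;                -- log statements slower than 1 second
SHOW VARIABLES LIKE 'slow_query_log_file';     -- where the log is written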
You can answer from the following four dimensions:
(1) A table can only have one clustered index, while a table can have multiple non-clustered indexes.
(2) With a clustered index, the logical order of the key values determines the physical order of the corresponding rows in the table; with a non-clustered index, the logical order of the index entries differs from the physical storage order of the rows on disk.
(3) Indexes are usually described in terms of a tree data structure. A clustered index can be understood this way: the leaf nodes of the index are the data nodes themselves. The leaf nodes of a non-clustered index are still index nodes, but they hold a pointer to the corresponding data block.
(4) Clustered index: the physical storage is sorted according to the index. Non-clustered index: the physical storage is not sorted according to the index.
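In InnoDB the primary key is the clustered index and every other index is a secondary (non-clustered) index; a minimal sketch, assuming a hypothetical orders table:

CREATE TABLE orders (
  order_id BIGINT UNSIGNED NOT NULL,   -- primary key = clustered index: rows are stored in this order
  user_id  BIGINT UNSIGNED NOT NULL,
  status   TINYINT NOT NULL,
  PRIMARY KEY (order_id),
  KEY idx_user (user_id)               -- secondary index: its leaf nodes store the primary key value
) ENGINE = InnoDB;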
We can look at this problem from several dimensions: whether the query is fast enough, whether the efficiency is stable, how much data can be stored, and how many disk reads are needed. Why is it not an ordinary binary tree, why not a balanced binary tree, why not a B-tree, but a B+ tree?
If the binary tree is specialized into a linked list, it is equivalent to a full table scan. Compared with binary search trees, balanced binary trees have more stable search efficiency and faster overall search speed.
We know that querying data in memory is much faster than querying it on disk. If a tree-like data structure is used as an index, every time we search for data we need to read a node from disk, which is what we call a disk block, but a balanced binary tree stores only one key value and one data entry per node. A B-tree node can store more data, so the height of the tree is reduced, the number of disk reads drops, and queries become faster.
A B+ tree does not store data in non-leaf nodes, only key values, whereas B-tree nodes store both key values and data. The default size of a page in InnoDB is 16KB. If no data is stored in a node, more key values fit into it, the order of the tree (the number of children per node) becomes larger, and the tree becomes shorter and fatter. The number of disk I/Os needed to find data is therefore reduced again, and queries become faster.
In a B+ tree index all data is stored in the leaf nodes, and the leaves are kept in order and linked together in a linked list. This makes range queries, sorted queries, grouped queries, and deduplication extremely simple for the B+ tree.
A B+ tree index supports range queries; a hash index does not.
A B+ tree index supports the leftmost-prefix rule of composite indexes; a hash index does not.
A B+ tree index supports order by sorting; a hash index does not.
A hash index is more efficient than a B+ tree for equality queries.
With like fuzzy queries, a B+ tree index can still help when the wildcard is not at the beginning of the pattern (for example like 'abc%'), while a hash index cannot support fuzzy queries at all.
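A small sketch of the practical difference, assuming a hypothetical users table with an index on the name column:

SELECT * FROM users WHERE name = 'alice';       -- equality: either a B+ tree or a hash index works
SELECT * FROM users WHERE name LIKE 'ali%';     -- prefix match: a B+ tree index can be used
SELECT * FROM users WHERE name LIKE '%ice';     -- leading wildcard: neither index type helps
SELECT * FROM users WHERE name BETWEEN 'a' AND 'c' ORDER BY name;  -- range plus sort: B+ tree only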
The leftmost prefix principle is leftmost priority. When creating a multi-column index, according to business needs, the most frequently used column in the where clause is placed on the leftmost side.
When we create a composite index such as (a1, a2, a3), it is equivalent to creating the three indexes (a1), (a1, a2), and (a1, a2, a3). This is the leftmost matching principle.
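A minimal sketch of which queries can use such an index, assuming a hypothetical table t with a composite index on (a1, a2, a3):

SELECT * FROM t WHERE a1 = 1;                        -- uses the index (leftmost column present)
SELECT * FROM t WHERE a1 = 1 AND a2 = 2;             -- uses the index
SELECT * FROM t WHERE a1 = 1 AND a2 = 2 AND a3 = 3;  -- uses the index
SELECT * FROM t WHERE a2 = 2 AND a3 = 3;             -- normally cannot use the index: a1 is missing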
Tables with a small amount of data are not suitable for indexing
Fields that are updated frequently are not suitable for indexing
Fields with low discrimination are not suitable for indexing (such as gender)
(1) Advantages:
The unique index can ensure the uniqueness of each row of data in the database table
Indexes can speed up data queries and reduce query time
(2) Disadvantages:
It takes time to create and maintain indexes
Indexes need to occupy physical space. In addition to the data space occupied by the data table, each index also occupies a certain amount of physical space.
When data in the table is added, deleted, or modified, the indexes must also be maintained dynamically.
Yes, I have. My general steps for troubleshooting a deadlock are as follows:
(1) Check the deadlock log with show engine innodb status;
(2) Identify the SQL statements involved in the deadlock
(3) Analyze the locks taken by those SQL statements
(4) Simulate the deadlock case
(5) Analyze deadlock log
(6) Analyze deadlock results
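A minimal two-session sketch that can produce a deadlock, assuming a hypothetical table t with a primary key on id and rows id = 1 and id = 2:

-- Session 1
BEGIN;
UPDATE t SET v = v + 1 WHERE id = 1;
-- Session 2
BEGIN;
UPDATE t SET v = v + 1 WHERE id = 2;
-- Session 1: blocks, waiting for the lock on id = 2
UPDATE t SET v = v + 1 WHERE id = 2;
-- Session 2: requests the lock on id = 1; InnoDB detects the deadlock and rolls one transaction back
UPDATE t SET v = v + 1 WHERE id = 1;

-- Afterwards, inspect the LATEST DETECTED DEADLOCK section of:
SHOW ENGINE INNODB STATUS;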
(1) Pessimistic lock:
A pessimistic lock assumes the worst: it always worries that the data it holds may be modified by other transactions. Therefore, once a transaction acquires a pessimistic lock, no other transaction can modify the data; they must wait until the lock is released before executing.
(2) Optimistic lock:
The "optimism" of optimistic lock is reflected in the fact that it believes that the data will not change too frequently. Therefore, it allows multiple transactions to make changes to the data simultaneously.
Implementation method: Optimistic locking is generally implemented using the version number mechanism or CAS algorithm.
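A minimal sketch of both approaches, using a hypothetical product table with a version column; the concrete values are illustrative:

-- Pessimistic lock: lock the row up front inside a transaction
SELECT stock FROM product WHERE id = 1 FOR UPDATE;

-- Optimistic lock: check a version column at update time
UPDATE product
SET stock = stock - 1, version = version + 1
WHERE id = 1 AND version = 5;   -- 5 is the version read earlier; 0 affected rows means another transaction got there first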
MVCC (Multiversion Concurrency Control), that is, multi-version concurrency control technology.
MySQL InnoDB implements MVCC mainly to improve concurrency and to handle read-write conflicts in a better way, so that even when read-write conflicts occur, non-blocking concurrent reads can be achieved without locking.
Atomicity: a transaction is executed as an indivisible whole; either all of the operations it contains are applied to the database, or none of them are.
Consistency: the data is not corrupted before the transaction starts or after it ends. If account A transfers 10 yuan to account B, then regardless of whether the transfer succeeds or fails, the total amount held by A and B remains unchanged.
Isolation: when multiple transactions run concurrently, they are isolated from each other; one transaction does not affect the results of another. In short, transactions do not interfere with each other.
Persistence: Indicates that after the transaction is completed, the operational changes made by the transaction to the database will be permanently saved in the database.
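A minimal transaction sketch for the transfer scenario above; the account table and its rows are hypothetical:

START TRANSACTION;
UPDATE account SET balance = balance - 10 WHERE name = 'A';
UPDATE account SET balance = balance + 10 WHERE name = 'B';
COMMIT;   -- either both updates persist, or a ROLLBACK undoes both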
Read Uncommitted
Read Committed
Repeatable Read
Serializable
The default transaction isolation level of MySQL is Repeatable Read.
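To check or change the isolation level (the variable is named transaction_isolation in MySQL 8.0 and tx_isolation in older versions):

SELECT @@transaction_isolation;                           -- show the current level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- change it for the current session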
Transactions A and B execute alternately, and transaction A reads data that transaction B has not yet committed; transaction A is thus interfered with by transaction B. This is a dirty read.
Within the scope of one transaction, two identical queries read the same record but return different data. This is a non-repeatable read.
Transaction A queries a range of rows, another concurrent transaction B inserts or deletes rows in that range and quietly commits, and when transaction A queries the same range again the two reads return different result sets. This is a phantom read.
Troubleshooting process:
(1) Use the top command to observe and determine whether it is caused by mysqld or other reasons.
(2) If it is caused by mysqld, show processlist, check the session status, and determine whether there is any resource-consuming SQL running.
(3) Find out the SQL with high consumption and see whether the execution plan is accurate, whether the index is missing, and whether the amount of data is too large.
Processing:
(1) Kill these threads (and observe whether the CPU usage decreases)
(2) Make corresponding adjustments (such as adding indexes, changing sql, change memory parameters)
(3) Rerun these SQLs.
Other situations:
It is also possible that no single SQL statement consumes many resources, but a large number of sessions suddenly connect and drive the CPU up. In that case, work with the application side to analyze why the number of connections surged, and then make the corresponding adjustments, such as limiting the number of connections.
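A minimal sketch of the commands involved; the thread id is illustrative:

SHOW FULL PROCESSLIST;   -- find long-running or resource-heavy sessions
KILL 12345;              -- kill a runaway session by its Id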
Master-slave replication is divided into five steps:
Step 1: Update events (update, insert, delete) on the master are written to the binlog.
Step 2: The slave initiates a connection to the master.
Step 3: The master creates a binlog dump thread and sends the contents of the binlog to the slave.
Step 4: After the slave starts, it creates an I/O thread that reads the binlog content sent by the master and writes it to the relay log.
Step 5: The slave also creates a SQL thread that reads the relay log, executes the update events starting from the Exec_Master_Log_Pos position, and applies the changes to the slave database.
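On the slave, the replication threads are typically configured and started with commands like the following; the host, user, password, and binlog coordinates are illustrative, and MySQL 8.0.22+ prefers the CHANGE REPLICATION SOURCE TO / START REPLICA syntax:

CHANGE MASTER TO
  MASTER_HOST = '10.0.0.1',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'binlog.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G   -- check Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master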
Cause of master-slave synchronization delay
The master serves N client connections, so it can handle a large number of concurrent update operations, but the slave has only one SQL thread replaying the binlog. If a statement takes a long time to execute on the slave, or a statement has to lock a table, a backlog of the master's SQL builds up on the slave and is not applied in time. This causes master-slave inconsistency, that is, master-slave delay.
Solution to master-slave synchronization delay
The master handles the update operations and has higher safety requirements than the slave, so parameters such as sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 are needed on the master, while the slave can relax them to speed up replay.
Choose a better hardware device as a slave.
If a slave server is used as a backup without providing queries, its load will be reduced, and the efficiency of executing the SQL in the relay log will naturally be higher.
The purpose of adding slave servers is to disperse the reading pressure and thereby reduce the server load.
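A sketch of how these parameters might differ between master and slave; whether to relax durability on a slave is a trade-off, not a rule:

-- On the master: maximum durability
SET GLOBAL sync_binlog = 1;
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- On a slave used only for reads or backup: trade durability for replay speed
SET GLOBAL sync_binlog = 0;
SET GLOBAL innodb_flush_log_at_trx_commit = 2;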
Sub-database and table scheme:
Horizontal database splitting: based on a field and a certain strategy (hash, range, etc.), the data of one database is split across multiple databases.
Horizontal table splitting: based on a field and a certain strategy (hash, range, etc.), the data of one table is split across multiple tables.
Vertical database splitting: based on tables, different tables are moved into different databases according to their business ownership.
Vertical table splitting: based on fields and how actively they are used, the fields of a table are split into different tables (a main table and extension tables).
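A minimal sketch of horizontal table splitting by hash, assuming a hypothetical order table split into four shards order_0 … order_3 on user_id:

-- Route a row to a shard table by hashing the sharding key:
-- shard index = user_id % 4, e.g. user_id = 42 -> 42 % 4 = 2 -> order_2
INSERT INTO order_2 (order_id, user_id, amount) VALUES (1001, 42, 99.00);
SELECT order_id, amount FROM order_2 WHERE user_id = 42;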
It is also worth knowing the commonly used sharding middleware and the problems that sharded databases and tables may introduce.
The above is the detailed content of this MySQL interview Q&A collection.