An article to help you understand the underlying principles of MySQL
An update travels the same path as a query, from the client all the way to the execution engine: you must first find the data and then update it. To understand the UPDATE process, let’s first take a look at InnoDB’s architectural model.
Here is the official MySQL InnoDB architecture diagram:
Connector (JDBC, ODBC, etc.) =>
[MySQL Internal
[Connection Pool] (authorization, thread reuse, connection limits, memory checks, etc.) => [SQL Interface] (DML, DDL, Views, etc.) [Parser] (Query Translation, Object Privilege) [Optimizer] (Access Paths, statistical analysis) [Caches & Buffers] => [Pluggable Storage Engines]
]
=> [File]
There is a key point here. When we query data, InnoDB first checks whether the page being queried is already in the buffer pool. If it is, the page is read directly from memory.
For an update, the value in the buffer pool is modified directly. At that point the data in the buffer pool is inconsistent with the data actually stored on disk, and the page is called a dirty page. Every so often, the InnoDB storage engine flushes dirty pages to disk. Generally speaking, updating a piece of data means reading it into the buffer, modifying it there, and then writing it back to disk, which completes one disk I/O operation.
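The read, modify, flush cycle described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the `BufferPool` class, its counters, and the dictionary standing in for the data files are hypothetical names, not InnoDB internals.

```python
class BufferPool:
    """Toy page cache that tracks dirty pages (not InnoDB's real structure)."""

    def __init__(self, disk):
        self.disk = disk        # page_id -> value, stands in for the data files
        self.pages = {}         # pages currently cached in memory
        self.dirty = set()      # pages modified in memory but not yet flushed
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, page_id):
        # Load the page from disk only if it is not cached yet.
        if page_id not in self.pages:
            self.pages[page_id] = self.disk[page_id]
            self.disk_reads += 1
        return self.pages[page_id]

    def update(self, page_id, value):
        # Modify the cached copy only; the disk copy is now stale.
        self.read(page_id)
        self.pages[page_id] = value
        self.dirty.add(page_id)

    def flush(self):
        # Write every dirty page back to disk, as a periodic flush would.
        for page_id in sorted(self.dirty):
            self.disk[page_id] = self.pages[page_id]
            self.disk_writes += 1
        self.dirty.clear()


disk = {1: "a", 2: "b"}
pool = BufferPool(disk)
pool.update(1, "A")
pool.update(1, "AA")               # second update hits the cached page
dirty_before_flush = set(pool.dirty)
disk_before_flush = disk[1]        # still the old value: the page is dirty
pool.flush()
```

Two updates to the same page cost one disk read and one disk write in this model, which is the point of modifying the cached page and flushing later.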
To speed up updates, MySQL adds an in-memory optimization. In the architecture diagram, the buffer pool contains an area called the change buffer. As the name suggests, it buffers changed data. When the data being updated has no unique index, the modification is placed directly in the change buffer, and the update is completed later through a merge operation, which reduces the disk I/O needed to persist it.
Why can only data without a unique index be placed directly into the change buffer? If the field has a unique constraint, the updated value might duplicate an existing value, so the data must be read from disk and compared to verify uniqueness.
The share of the buffer pool that the change buffer may occupy is set by innodb_change_buffer_max_size. The default is 25 (i.e. 25%), and it can be increased.
The redo log uses the WAL approach (Write-Ahead Logging: record the log before writing the data). This way, when the database crashes, the data can be recovered directly from the redo log to ensure data correctness.
By default the redo log is stored in two files, ib_logfile0 and ib_logfile1, and both files are of fixed size. Why fixed size? This follows from how the redo log works: it must be a contiguous storage space so that it can be written sequentially.
2. Random read/write and sequential read/write
Generally our data is scattered across the disk.
The read/write sequence of a mechanical hard disk is:
Locate the track
In fact, whether the disk is mechanical or solid-state, storage always goes through the file system, and there are two ways of doing so: random read/write and sequential read/write.
Data is stored in blocks (by default 1 block = 8 sectors = 4 KB). Sequential read/write works on a series of consecutive blocks, so the read speed is greatly improved.
See the Log Buffer in the buffer pool: it is the buffer that log records are written to before they reach the redo log.
Here, there are three specific execution strategies for the redo log:
0: A commit only writes to the Log Buffer, and the redo log is written to disk once per second. Performance is high, but up to 1 second of committed data can be lost. Suitable for scenarios that need strong real-time performance and tolerate weak consistency, for example comments in a comment section.
1: Every commit writes to the Log Buffer and to disk at the same time. Performance is the worst and consistency is the highest. Suitable for weak real-time, strong consistency scenarios, such as payment.
2: Every commit writes to the Log Buffer and then to the OS buffer (fsync is called every second to flush the data to disk). Performance is good and safety is high. Suitable for moderate real-time, moderate consistency scenarios, such as orders.
We can set the execution strategy through innodb_flush_log_at_trx_commit. The default is 1.
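The trade-off between the three settings can be sketched with a small simulation. It assumes, purely for illustration, that the once-per-second background flush happens exactly on whole seconds; the function name and the `os_crash` flag are hypothetical and not part of MySQL.

```python
def surviving_commits(policy, commit_times, crash_time, os_crash):
    """Return the commits that survive a crash under a simplified model.

    policy      : value of innodb_flush_log_at_trx_commit (0, 1 or 2)
    commit_times: commit timestamps in seconds
    crash_time  : moment of the crash in seconds
    os_crash    : True if the whole OS/host dies, False if only mysqld dies
    """
    last_flush = int(crash_time)  # assumption: background flush on whole seconds
    survivors = []
    for t in commit_times:
        if t > crash_time:
            continue  # never committed
        if policy == 1:
            durable = True                               # flushed at commit
        elif policy == 2:
            durable = (not os_crash) or t <= last_flush  # OS cache survives a mysqld crash
        elif policy == 0:
            durable = t <= last_flush                    # only the log buffer held it
        else:
            raise ValueError("policy must be 0, 1 or 2")
        if durable:
            survivors.append(t)
    return survivors


commits = [0.2, 0.9, 1.4, 1.8]
lost_with_0 = surviving_commits(0, commits, 1.9, os_crash=False)
kept_with_1 = surviving_commits(1, commits, 1.9, os_crash=True)
kept_with_2 = surviving_commits(2, commits, 1.9, os_crash=False)
lost_with_2 = surviving_commits(2, commits, 1.9, os_crash=True)
```

In this model, setting 2 only loses data when the operating system itself goes down, which is why the article places it between the other two.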
The adaptive hash index is mainly used to speed up lookups of pages. When querying, InnoDB monitors index searches to decide whether the current query can use the hash index. Queries that use the LIKE operator with the % wildcard, for example, cannot use it.
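The system tablespace is stored in a file called ibdata1, which contains:
The doublewrite buffer: when InnoDB writes a data page, it does not write it directly to the data file. Instead, the page is written to this area first. The advantage is that if the operating system, the file system, or MySQL crashes in the middle of a page write, an intact copy of the page can be obtained directly from this buffer.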
With file-per-table tablespaces, each table has its own .ibd file to store its data and indexes.
With file-per-table tablespaces, the performance of ALTER TABLE and TRUNCATE TABLE can be greatly improved. For example, an ALTER TABLE on a table residing in a shared tablespace performs a table copy operation, which may increase the amount of disk space the tablespace occupies. Such operations may require as much additional space as the data in the table plus its indexes, and unlike with a file-per-table tablespace, this space is not released back to the operating system.
Drop table
(unless you manage the fragmentation yourself)
fsync works on one file at a time, so data for several tables cannot be flushed into the files in one operation
A file handle must be kept open for each table file, to provide continuous access to the files
A shared tablespace can store data from multiple tables
Tablespace per table
Temporary tablespace data is stored in a file called ibtmp1. Under normal circumstances, MySQL creates the temporary tablespace when it starts and deletes it when it stops. It can also expand automatically.
The undo log guarantees the atomicity of modification operations: when an exception occurs in the middle of a modification, it can be rolled back through the undo log. Undo logs are kept in the system tablespace, the undo tablespaces, and the temporary tablespace, as shown in the architecture diagram.
As mentioned before, an update goes roughly through these steps:
The original (origin) data is returned to the executor
The executor makes the modification
The modification is written into memory, into the Change Buffer of the Buffer Pool
Having covered Undo and Redo, let's also talk about the Bin log by the way.
The two logs mentioned earlier both belong to the InnoDB engine layer, whereas the Bin log lives in the server layer, so it can be used by every storage engine.
The Bin log records each DDL and DML statement in the form of events. It is a log in the logical sense. It has two main uses:
Master-slave replication: the slave server fetches the bin log of the master server and then executes it.
Data recovery: take the logs for a certain period of time and execute them again.
Index
If you want to fully understand what an index in InnoDB is, you must understand its file storage levels: Pages, Extents, Segments, and Tablespaces.
Their relationship is:
By default an extent is 1 MB, which is 64 pages of 16 KB each. The page size our file system usually refers to is 4 KB, which contains 8 sectors of 512 bytes.
So sometimes we are asked why the primary key should be ordered. The reason is that if we create an index on an ordered field and then insert data, InnoDB stores the rows on pages one after another in order. When one page is full, it allocates a new page and continues storing there.
But if the field is unordered, the rows land on different pages. When a row has to be stored on a page that is already full, it causes a page split, which in turn creates fragmentation.
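The difference between ordered and unordered inserts can be illustrated with a deliberately tiny model. It is a sketch under stated assumptions: pages are plain sorted lists, `PAGE_CAPACITY` is a hypothetical page size of 4 keys, and a full page is split in half, which is a simplification of what InnoDB really does.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical: a real 16 KB page holds far more rows


def insert_all(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one and count how many page splits occur."""
    pages = [[]]   # ordered list of pages, each page is a sorted list of keys
    splits = 0
    for key in keys:
        # Pick the first page whose largest key is >= key, else the last page.
        idx = len(pages) - 1
        for i, page in enumerate(pages):
            if page and key <= page[-1]:
                idx = i
                break
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])            # ordered growth: just start a new page
        else:
            bisect.insort(page, key)       # key belongs inside a full page
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            splits += 1
    return pages, splits


unordered = [10, 20, 30, 40, 25, 5, 35, 15]
ordered_pages, ordered_splits = insert_all(sorted(unordered))
unordered_pages, unordered_splits = insert_all(unordered)
```

With the ordered sequence every full page is followed by a fresh one and nothing is split; with the unordered sequence the key 25 has to squeeze into a full page and forces a split.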
As in the B+ tree diagram above, the rows of data are stored on the leaf nodes. If the order in which the rows are arranged is consistent with the order of the index key values, the index is a clustered index. The primary key index is a clustered index. All indexes other than the primary key index are auxiliary indexes.
For an auxiliary index, the leaf nodes store only the indexed value itself and the value of the primary key. This means that if we query full rows through the auxiliary index, we first find the primary key value in the auxiliary index and then go to the primary key index to find the actual data. This process is called going back to the table.
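The two-step lookup described above can be sketched with two dictionaries. This is a minimal illustration, not how InnoDB stores data: the table contents, the `secondary_name` structure, and the function name are all made up for the example.

```python
# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    1: {"id": 1, "name": "lzw", "class_name": "math"},
    2: {"id": 2, "name": "tom", "class_name": "art"},
    3: {"id": 3, "name": "tom", "class_name": "math"},
}

# Auxiliary index on name: indexed value -> primary keys only.
secondary_name = {"lzw": [1], "tom": [2, 3]}


def find_by_name(name):
    """Look up full rows via the auxiliary index, counting trips back to the table."""
    back_to_table = 0
    rows = []
    for pk in secondary_name.get(name, []):   # step 1: auxiliary index gives primary keys
        rows.append(clustered[pk])            # step 2: primary key index gives the row
        back_to_table += 1
    return rows, back_to_table


tom_rows, tom_lookups = find_by_name("tom")
```

Every matching entry in the auxiliary index costs one extra lookup in the primary key index, which is why reducing trips back to the table matters for performance.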
rowid
What if there is no primary key index?
If the table has a unique key whose columns are all NOT NULL, the clustered index is created based on this key.
Otherwise, InnoDB generates a hidden rowid and creates a clustered index based on this id.
Now that we know what an index is and how it is structured, let’s take a look at when indexes should be used. Understanding this helps us create correct and efficient indexes.
If the dispersion is low, do not build an index: when the values differ very little from each other, an index is not needed. (With such an index, most of the rows match any given value, so going through the index is no better than reading the whole table, and InnoDB will simply do a full table scan.) The gender field is an example. Such an index only wastes storage space.
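Dispersion can be measured as the number of distinct values divided by the number of rows. The sketch below shows the calculation on made-up data; the function name and the sample columns are assumptions for illustration only.

```python
def dispersion(values):
    """Distinct values divided by total rows; close to 1.0 means highly selective."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


genders = ["M", "F"] * 50          # 100 rows, only 2 distinct values
student_ids = list(range(100))     # 100 rows, all distinct

gender_dispersion = dispersion(genders)
id_dispersion = dispersion(student_ids)
```

A column like gender scores close to 0, so an index on it filters out very little, while a unique column scores 1.0.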
Joint (composite) index, such as idx(name, class_name)
When querying select * from stu where class_name = 'xx' and name = 'lzw', the index idx can still be used, because the optimizer rewrites the condition as name = 'lzw' and class_name = 'xx'.
For select ··· where name = 'lzw', you do not need to create a separate index on name; the query goes directly through idx.
Covering index
If all the columns we query are contained in the index, there is no need to go back to the table. For example: select class_name from stu where name = 'lzw'
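The leftmost-prefix behaviour of a joint index and the covering-index check can be expressed as two small helper functions. This is a simplified sketch that only considers equality conditions; the helper names, the primary key column `id`, and the column `age` are hypothetical.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality conditions can use."""
    used = []
    for col in index_columns:
        if col in equality_columns:
            used.append(col)
        else:
            break  # a gap in the prefix stops index use for later columns
    return used


def is_covering(index_columns, primary_key, selected_columns):
    """True if every selected column is available in the index entry itself."""
    available = set(index_columns) | {primary_key}
    return set(selected_columns) <= available


idx = ("name", "class_name")

both = usable_prefix(idx, {"class_name", "name"})   # order in WHERE does not matter
name_only = usable_prefix(idx, {"name"})
class_only = usable_prefix(idx, {"class_name"})     # leading column is missing

covered = is_covering(idx, "id", ["class_name"])
not_covered = is_covering(idx, "id", ["class_name", "age"])
```

The order of conditions in the WHERE clause is irrelevant, but a condition on class_name alone cannot use idx because the leading column name is missing.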
Index condition pushdown (index_condition_pushdown)
select * from stu where name = 'lzw' and class_name like '%xx'
Without index condition pushdown: because the condition on class_name is like '%xx', the query first uses the idx joint index on name only. After finding the matching entries, it goes back to the table for the full rows and then applies the like filter in the server layer to find the data.
With index condition pushdown: the like filter is applied to the index entries in the engine layer, which is equivalent to pushing the server layer's filtering down to the engine layer. As shown in the figure:
Do not use frequently updated values as a primary key or index, because they cause page splits (the index is stored in order; if a page is full, inserting into it again causes a page split).
An index is not used when functions such as replace, sum, count, etc. are applied to the indexed column.
A joint index already serves queries on its leading columns, so there is no need to build additional single-column indexes for them.
For very long fields, index only a prefix (use select count(distinct left(name, 10))/count(*) to look at the degree of dispersion and decide how many leading characters to take).
Whether an index is used in the end is decided by the Cost Base Optimizer (cost-based optimizer): it uses whichever plan has the lowest cost.
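Returning to index condition pushdown, its effect can be sketched by counting how often the full row has to be fetched. The data, the index layout, and the function below are illustrative assumptions; the real decision is made inside MySQL and is not exposed like this.

```python
# Hypothetical table, keyed by primary key.
rows = {
    1: {"id": 1, "name": "lzw", "class_name": "1xx"},
    2: {"id": 2, "name": "lzw", "class_name": "2yy"},
    3: {"id": 3, "name": "lzw", "class_name": "3xx"},
    4: {"id": 4, "name": "tom", "class_name": "4xx"},
}

# Joint index idx(name, class_name): each entry also carries the primary key.
idx = sorted((r["name"], r["class_name"], pk) for pk, r in rows.items())


def query(name, suffix, pushdown):
    """Emulate: where name = ? and class_name like '%suffix'."""
    table_lookups = 0
    result = []
    for idx_name, idx_class, pk in idx:
        if idx_name != name:
            continue
        if pushdown and not idx_class.endswith(suffix):
            continue                      # filtered in the engine, using the index entry only
        row = rows[pk]                    # go back to the table for the full row
        table_lookups += 1
        if row["class_name"].endswith(suffix):   # server-layer filter
            result.append(row["id"])
    return result, table_lookups


without_icp = query("lzw", "xx", pushdown=False)
with_icp = query("lzw", "xx", pushdown=True)
```

Both variants return the same rows, but with pushdown the row whose class_name does not match is rejected from the index entry alone and is never fetched from the table.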
Prerequisite, in a transaction:
SQL92 standard regulations: (Concurrency decreases from left to right)
InnoDB uses these two solutions, MVCC and LBCC, together. Here is a brief explanation of the MVCC implementation of RR. The initial value of the rollback id in the figure should be NULL rather than 0; it is written as 0 for convenience.
RC's MVCC implementation creates a new version (snapshot) for each read within the same transaction, while RR creates one version at the first read and reuses it for every read in the transaction.
Through the combination of MVCC and LBCC, InnoDB can solve the phantom read problem without locking plain reads, unlike Serializable, where transactions must be executed serially without any concurrency.
Let’s take an in-depth look at how InnoDB locks implement the RR transaction isolation level.
Table level => (IS, IX)
The above four locks are the most basic types of locks.
These three locks can be understood as three algorithms implemented on top of the four locks above. We will temporarily call them high-order locks here.
The above three are additional extended locks.
Shared lock: added explicitly with lock in share mode after the statement.
Exclusive lock: used by Insert, Update, and Delete by default; added explicitly with for update after the statement.
Intention locks (used to record whether the table is locked): without such a lock, when another transaction wants to lock the table, it would have to scan the entire table for locks, which is too inefficient. That is why intention locks exist.
If a table has no index at all, InnoDB generates a hidden rowid and creates a clustered index based on this id. But when you run a locking query on a table for which you have not explicitly created an index, the database does not know which rows it needs to check; the whole table may be involved. So it simply locks the entire table.
If the query uses an auxiliary index, for example select * from ··· where name = 'xxx' for update, it finally goes back to the table to look up the row through the primary key, so in addition to locking the auxiliary index, we also need to lock the primary key index.
Take an index with the values 1, 3, 6, 9. The gaps are (-∞,1), (1,3), (3,6), (6,9), (9,+∞).
When locking, what is locked is (-∞,1], (1,3], (3,6], (6,9], (9,+∞): intervals that are open on the left and closed on the right.
Next-key lock (sometimes translated as temporary key lock) = record lock + gap lock.
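The interval arithmetic can be made concrete with a short sketch. It assumes the index holds the keys 1, 3, 6 and 9, as the intervals in the text suggest, and only models which left-open, right-closed range a value falls into; it does not model real lock conflicts.

```python
import math


def next_key_intervals(keys):
    """Return the (low, high] ranges delimited by the existing index keys."""
    bounds = [-math.inf] + sorted(keys) + [math.inf]
    return list(zip(bounds[:-1], bounds[1:]))


def locking_interval(keys, value):
    """Find the left-open, right-closed interval that contains value."""
    for low, high in next_key_intervals(keys):
        if low < value <= high:
            return (low, high)
    return None


keys = [1, 3, 6, 9]
intervals = next_key_intervals(keys)
range_for_5 = locking_interval(keys, 5)    # 5 does not exist: it falls in the gap before 6
range_for_6 = locking_interval(keys, 6)    # 6 exists: the same interval, closed at 6
```

A value that does not exist, such as 5, maps to the gap between its neighbours, which is the range a gap lock protects against inserts.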
When select * from xxx where id = 5 for update is executed, the gap that id = 5 falls into is locked, which prevents other transactions from inserting into it.
Gap Lock and Record Lock combine to form the Next-Key Lock, and together they solve the phantom read problem at the RR level when writing data.
When it comes to locks, there is no avoiding the topic of deadlock.
Innodb_row_lock_current_waits: how many locks are currently being waited for
show full processlist: shows which user is connected from which host and port, which database is connected, what command is being executed, and the status and time
Deadlock prevention
If you want better query performance, you can start from the query execution process.
Client connection pool
Add a connection pool to avoid creating and destroying a connection every time.
Only a CPU core can actually execute threads. Because the operating system uses time slicing, it makes us think that one CPU core executes multiple threads at once. In fact, one CPU core can only execute one thread at any given moment, so no matter how much we increase concurrency, the CPU can still only process so much data in that time period.
Even if the CPU cannot process more data, why does it become slower? Because of time slicing: when multiple threads appear to be "executing simultaneously", the context switching between them is actually very time-consuming.
When a thread is blocked on I/O operations, the CPU can give its time slice to other threads to improve processing efficiency and speed. But if the waiting time is very short, we cannot add too many connections.
A common rule of thumb is number of cores * 2 + number of disks. For example, an i7 machine with 4 cores and 1 hard disk gives 4 * 2 + 1 = 9.
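The sizing rule from the example can be written as a one-line helper. This is only a rule of thumb to start from, not a guarantee; the function and parameter names are made up for the illustration.

```python
def suggested_pool_size(cpu_cores, effective_disks):
    """Rule-of-thumb starting point: cores * 2 + number of disks."""
    if cpu_cores < 1 or effective_disks < 0:
        raise ValueError("cpu_cores must be >= 1 and effective_disks >= 0")
    return cpu_cores * 2 + effective_disks


i7_example = suggested_pool_size(cpu_cores=4, effective_disks=1)
```

The result is best treated as an initial value to be adjusted after measuring the real workload.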
For example: setting the maximum number of threads, etc.
Redis
Read-write separation
MySQL master-slave replication is asynchronous.
After the slave writes the received bin log to the relay log, it records the latest Binary Log Position it has read in master info, and next time it fetches directly from this position.
The drawback of asynchronous master-slave replication is that updates are not timely. When a piece of data is written and then immediately read by a user, the data read may still be the old data, which means there is a delay.
To solve the delay problem, transactions need to be introduced. That way, if the master node goes down and a slave node is elected, data loss can be avoided quickly and automatically.
Lock operations affect performance.
After slow_query_log is turned on, SQL statements whose execution time exceeds the variable long_query_time are recorded.
You can analyze the log with mysqldumpslow /var/lib/mysql/mysql-slow.log. There are many tools that provide more elegant analysis than this, so I won’t go into details here.
After writing any SQL, you should run explain on it.
left/right join leads to low performance
left/right join directly specifies the driving table. In MySQL, Nested Loop Join is used by default for table joins (that is, the result set of the driving table is used as the base data of the loop, each row in that set is used to filter the next joined table, and finally the results are merged into what we often call the temporary table). If the result set of the driving table is in the millions or tens of millions of rows, you can imagine how slow this join will be. On the other hand, if a small table is used as the driving table, the query can become very fast with the help of the index on the table with tens of millions of rows.
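Why the choice of driving table matters can be sketched by counting index probes in a simplified nested loop join. The table contents and the helper names below are hypothetical, and a Python dictionary stands in for the index on the inner table.

```python
from collections import defaultdict


def build_index(rows, column):
    """Build a lookup structure that plays the role of an index on column."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


def nested_loop_join(driving_rows, inner_index, column):
    """Loop over the driving table and probe the inner table's index per row."""
    probes = 0
    joined = []
    for outer in driving_rows:
        probes += 1                                    # one index lookup per driving row
        for inner in inner_index.get(outer[column], []):
            joined.append((outer, inner))
    return joined, probes


small = [{"class_id": i} for i in range(3)]
large = [{"class_id": i % 3, "student": i} for i in range(1000)]

small_drives, small_probes = nested_loop_join(small, build_index(large, "class_id"), "class_id")
large_drives, large_probes = nested_loop_join(large, build_index(small, "class_id"), "class_id")
```

Both orders produce the same number of joined rows, but driving from the small table needs 3 index probes instead of 1000.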
If you are not sure which table should be the driving table, then let the optimizer decide, for example:
select xxx from table1, table2, table3 where ···. The optimizer will use the table with the fewest matching rows as the driving table.
If you want to specify the driving table yourself, then pick up the Explain weapon. In the Explain output, the first row is the base driving table.
Try to sort on the driving table rather than on the temporary table, which is the merged result set. That is, when using temporary appears in the execution plan, the query needs to be optimized.
select_type: distinguishes simple queries and complex queries (union queries, subqueries, etc.)
SIMPLE: the query does not contain a subquery or UNION
PRIMARY: the query contains a complex substructure, and this is its outermost query
SUBQUERY: a subquery contained in select or where
DERIVED: a subquery contained in from
UNION RESULT: the result retrieved from the union table
type:
const: constant-level scan, the fastest way to query a table; system is a special case of const (the table contains only one row)
eq_ref: unique index scan
ref: non-unique index scan
range: index range scan, such as between and other range queries
index: (full index) scans the entire index tree
ALL: scans the entire table
NULL: no need to access the table or index
key: which index is actually used
key_len: the number of bytes of the index that are used
rows: the estimated number of rows scanned in total
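Extra:
Using where: the rows are returned to the server layer, which then uses where to filter the result set
Using temporary: a temporary table is used, for example for DISTINCT, sorting, and grouping
Using index condition: the server layer's filtering operation is pushed down to the engine layer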
Storage Engine
When you only need temporary data, you can use memory
When insert, update, query
Summary
SQL and index