Inserting into large MySQL table without auto-increment primary key is very slow

Question

I've recently noticed a significant increase in the difference in time required to complete a simple INSERT statement. While these statements take about 11 milliseconds on average, they can sometimes take 10-30 seconds, and I've even noticed them taking more than 5 minutes to execute. MySQL version is 8.0.24, running on WindowsServer2016. As far as I know, the server's resources have never been overloaded. The server has ample CPU overhead available and 32GB of RAM allocated to it. This is the table I'm using: CREATETABLE`saved_segment

P粉022140576 · Answer

I'll go out on a limb with this answer.

Assumption

innodb_buffer_pool_size has a value slightly less than 20MB, and
1K selections per second arrive at random parts of the table, then

Systems have become I/O bound lately, as the "next" block required for the next Select is more and more often not cached in the buffer_pool.

The simple solution is to get more RAM and increase the setting of this tunable. But the table will only grow to the next limit you purchase.

Instead, here are some partial solutions.

If the numbers are not too large, the first two columns may be INT UNSIGNED (4 bytes instead of 8), or even MEDIUMINT UNSIGNED (3 bytes) . Note that ALTER TABLE will lock the table for a long time.
These start and end times look like timestamps with fractional seconds and are always ".000". DATETIME and TIMESTAMP take 5 bytes (instead of 8 bytes).
Your example shows an elapsed time of 0. If (end-start) is typically very small, storing the elapsed time instead of the end time will further shrink the data. (But using an end time can make things confusing).
The sample data you provided looks "continuous". This is about as efficient as auto-increment. Is this the norm? If not, the INSERT may be part of the I/O thrashing.
You suggest adding artificial intelligence as well as secondary indexes, which doubles the work of inserting; so I don't recommend it.

More

Yes, that is the case.

Putting this as INDEX, or better yet, as the beginning of PRIMARY KEY will give you the best help with both of your queries:

(recording_id, index)

reply:

SELECT  TRUE
FROM    saved_segment
WHERE   recording_id = ? AND `index` = ?

If it is used to control some other SQL, consider adding it to the other SQL:

... EXISTS ( SELECT 1
        FROM    saved_segment
        WHERE   recording_id = ? AND `index` = ? ) ...

This query (in either form) requires content you already have

PRIMARY KEY(recording_id, index)

Your other inquiry needs

INDEX(recording_id, start_filetime)

So, add index, or ...

Better...This combination is better for bothSELECT:

PRIMARY KEY(recording_id, start_filetime, index).
INDEX(recording_id, index)

With this combination,

Single row existence check will be performed "using index" because it is "covered".
Another query will find all related rows clustered together on the PK.
(PK has these 3 columns because it needs to be unique. Having them in this order helps your second query. And it's a PK, not just an INDEX, so it doesn't need to be in the index's Bounce between BTree and data of BTree.)
"Cluster" can improve performance by reducing the number of disk blocks required for such queries. This reduces "thrashing" in the buffer_pool, thereby reducing the need to increase RAM.
My index suggestions are mostly orthogonal to my data type suggestions.

Inserting into large MySQL table without auto-increment primary key is very slow

reply all(1)I'll reply