Is there any benefit to establishing multiple database connections for SQL inserts?

Question

I'm writing a project related to massive data acquisition. Currently I'm using .NETFramework4.8 and the Mysql package to initiate connections and insert data to the database server. I will be inserting around 400,000 rows/second. I'm worried that the SQL connection may become the bottleneck of my program. I want to know if I use sql to create multi-threaded connection and use consumer queue to insert data, will it be faster and worth it (pros and cons)? In my gut it would be faster, but I'm not sure how much performance it would provide in terms of thread overhead. I'm not a SQL expert, so if anyone

P粉373596828 · Answer

Rumors, opinions, hearsay, facts, version-related benchmarks, some personal experience, etc...

Multiple threads can improve throughput, but there are limitations:

The throughput upper limit is approximately half of the theoretical limit. (your "percentage") (This is a benchmark based on the multithreading package; I forget the name; that was ten years ago.)
Multiple threads will compete with each other on mutexes and other necessary locking mechanisms.
Starting around 5.7, 64 threads is the multi-threading limit for MySQL; beyond this value, throughput will stagnate or even decrease. (Source: Many Oracle benchmarks boast that one version is significantly better than the previous one.) (Meanwhile, per-thread latency is through the roof.)
If possible, each thread should process data in batches.

Batch processing:

LOAD DATA is the fastest way to INSERT a large number of rows at once from a single thread. However, if you include the cost of writing the file to LOAD, it may make it slower than a bulk insert.
BULK INSERT follows. But it's capped at "hundreds" of rows when a certain limit or "diminishing returns" is reached.
Batch inserts are 10 times faster than inserting one row per INSERT query. Therefore, it (or LOAD DATA) is worth using for high-speed ingestion. (Source: Many different timed tests.)

Data Sources:

Some data sources must transmit only one row at a time (e.g., sensor data from a vehicle every N seconds). This requires some middle layer to batch process the data.
Discussion of collecting data: http://mysql.rjweb.org/doc.php /staging_table

What happens after loading the data? Of course, this is not a write-only table.

Normalization is useful for shrinking the disk footprint; best done in batches. SeeStandardization
PARTITIONing Rarely useful, except to eventually clear out old data. SeePartition
Huge "fact" tables are difficult to search; consider building summary data when ingesting: Summary table
You can even do the above processing and then throw away the original data. It sounds like you might be getting a terabyte of data per day.

Is there any benefit to establishing multiple database connections for SQL inserts?

reply all(1)I'll reply