search

Home  >  Q&A  >  body text

Is there any benefit to establishing multiple database connections for SQL inserts?

I am writing a project related to massive data acquisition. Currently I'm using .NET Framework 4.8 and the Mysql package to initiate connections and insert data into the database server.

I will be inserting approximately 400,000 rows/second. I'm worried that the SQL connection might become a bottleneck for my program. I want to know if I use sql to create multi-threaded connection and use consumer queue to insert data, will it be faster and worth it (pros and cons)?

In my gut it would be faster, but I'm not sure how much performance it would provide in terms of thread overhead. I'm no SQL expert, so it would be great if someone could explain the pros and cons of opening multiple connections to SQL on multiple threads.

P粉585541766P粉585541766243 days ago336

reply all(1)I'll reply

  • P粉373596828

    P粉3735968282024-03-31 00:42:10

    Rumors, opinions, hearsay, facts, version-related benchmarks, some personal experience, etc...

    Multiple threads can improve throughput, but there are limitations:

    • The throughput upper limit is approximately half of the theoretical limit. (your "percentage") (This is a benchmark based on the multithreading package; I forget the name; that was ten years ago.)
    • Multiple threads will compete with each other on mutexes and other necessary locking mechanisms.
    • Starting around 5.7, 64 threads is the multi-threading limit for MySQL; beyond this value, throughput will stagnate or even decrease. (Source: Many Oracle benchmarks boast that one version is significantly better than the previous one.) (Meanwhile, per-thread latency is through the roof.)
    • If possible, each thread should process data in batches.

    Batch processing:

    • LOAD DATA is the fastest way to INSERT a large number of rows at once from a single thread. However, if you include the cost of writing the file to LOAD, it may make it slower than a bulk insert.
    • BULK INSERT follows. But it's capped at "hundreds" of rows when a certain limit or "diminishing returns" is reached.
    • Batch inserts are 10 times faster than inserting one row per INSERT query. Therefore, it (or LOAD DATA) is worth using for high-speed ingestion. (Source: Many different timed tests.)

    Data Sources:

    • Some data sources must transmit only one row at a time (e.g., sensor data from a vehicle every N seconds). This requires some middle layer to batch process the data.
    • Discussion of collecting data: http://mysql.rjweb.org/doc.php /staging_table

    What happens after loading the data? Of course, this is not a write-only table.

    • Normalization is useful for shrinking the disk footprint; best done in batches. SeeStandardization
    • PARTITIONing Rarely useful, except to eventually clear out old data. SeePartition
    • Huge "fact" tables are difficult to search; consider building summary data when ingesting: Summary table
    • You can even do the above processing and then throw away the original data. It sounds like you might be getting a terabyte of data per day.

    reply
    0
  • Cancelreply