Home  >  Article  >  Backend Development  >  How to Optimize Bulk Insert Speed to MS SQL Server Using Pyodbc?

How to Optimize Bulk Insert Speed to MS SQL Server Using Pyodbc?

Barbara Streisand
Barbara StreisandOriginal
2024-11-02 09:12:29969browse

How to Optimize Bulk Insert Speed to MS SQL Server Using Pyodbc?

Speeding Up Bulk Insert to MS SQL Server Using Pyodbc

In this code, the author aims to optimize the insertion of over 1.3 million rows into an MS SQL Server database. Currently, the process takes around 40 minutes to insert 300,000 rows. Based on the code provided, the following approaches are recommended to improve the insertion speed:

Utilizing Bulk Insert T-SQL Command

The T-SQL BULK INSERT command is specifically designed for efficient bulk data loading. However, it requires the source file to be located on the same machine as the SQL Server instance or in a network location accessible through SMB/CIFS.

Leveraging Pyodbc's fast_executemany Feature

Pyodbc 4.0.19 introduced the fast_executemany feature in its Cursor class. When enabled, this feature optimizes the execution of executemany queries, which involve inserting multiple rows of data.

The following code demonstrates how to use fast_executemany:

<code class="python">import pyodbc
import time

conn_str = 'connection string'

cnxn = pyodbc.connect(conn_str, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("TRUNCATE TABLE fast_executemany_test")

sql = "INSERT INTO fast_executemany_test (txtcol) VALUES (?)"
params = [(f'txt{i:06d}',) for i in range(1000)]
t0 = time.perf_counter()
crsr.executemany(sql, params)
print(f'{time.perf_counter() - t0:.1f} seconds')

crsr.fast_executemany = True
t0 = time.perf_counter()
crsr.executemany(sql, params)
print(f'{time.perf_counter() - t0:.1f} seconds')</code>

In the code above, enabling fast_executemany significantly reduces the execution time.

Optimizing Iteration Over Rows

Instead of iterating over the rows one by one, consider using a list or NumPy array to store the data and then insert the entire collection in a single executemany call. This approach eliminates the overhead of repeated cursor execution.

By implementing these optimizations, it is possible to drastically enhance the performance of bulk insert operations in MS SQL Server using pyodbc.

The above is the detailed content of How to Optimize Bulk Insert Speed to MS SQL Server Using Pyodbc?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn