Avoiding Memory Errors When Importing Large SQL Queries into Pandas DataFrames
Working with massive SQL databases often presents challenges when retrieving large datasets. Attempting to load entire tables exceeding a million rows directly into a Pandas DataFrame can easily lead to memory errors. The following code illustrates this problem:
<code class="language-python">import pandas.io.sql as psql sql = "SELECT TOP 2000000 * FROM MyTable" data = psql.read_frame(sql, cnxn)</code>
This approach is prone to failure: if the resulting DataFrame exceeds available RAM, pandas raises a MemoryError.
Since version 0.15, Pandas has offered a robust solution: the chunksize parameter, which lets you read and process the results of a SQL query in smaller, manageable portions.
Here's how to implement this solution:
<code class="language-python">sql = "SELECT * FROM My_Table" for chunk in pd.read_sql_query(sql , engine, chunksize=5): print(chunk)</code>
By specifying chunksize, Pandas retrieves the data in increments and each chunk is processed individually, preventing memory overload. The example above simply prints each chunk; you can adapt the loop body to perform whatever per-chunk operation you need.
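As one illustration, here is a minimal sketch of computing running aggregates over chunks without ever materializing the full result set. The connection string, table name, and the numeric column "amount" are placeholders, and the use of a SQLAlchemy engine is an assumption:

<code class="language-python">import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, table, and column names -- substitute your own.
engine = create_engine("mssql+pyodbc://user:password@my_dsn")

sql = "SELECT * FROM My_Table"
total_rows = 0
amount_sum = 0.0

# Stream the result set 50,000 rows at a time and fold each chunk into
# running aggregates, so only one chunk is ever held in memory.
for chunk in pd.read_sql_query(sql, engine, chunksize=50000):
    total_rows += len(chunk)
    amount_sum += chunk["amount"].sum()  # "amount" is a placeholder numeric column

print(f"Processed {total_rows} rows; total amount: {amount_sum}")</code>

Because each chunk is an ordinary DataFrame, the same pattern works for filtering, transforming, or writing each portion out to disk before the next one is fetched.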
Reading in chunks is a practical, memory-efficient way to handle large SQL queries, keeping data processing smooth even with very large result sets.