Python downloads large files: which method is faster?
We usually use the requests library for downloads, because it is so convenient.
Method one is to stream the response with iter_content. With the following streaming code, Python's memory usage does not grow, regardless of the size of the downloaded file:

    import requests

    def download_file(url):
        local_filename = url.split('/')[-1]
        # Note: pass stream=True so the body is not loaded into memory all at once
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'wb') as f:
                # Write the body to disk 8 KiB at a time
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        return local_filename

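If the response uses chunked transfer encoding, do not pass a fixed chunk_size: pass chunk_size=None so that data is written in whatever sizes it arrives, and skip empty chunks with an if check. Calling iter_content() with no argument at all uses the default chunk_size of 1, which reads one byte at a time and is very slow. If you also decode the bytes to text, use an incremental decoder, because a multi-byte character can be split across two chunks.

    import codecs
    import requests

    def download_file(url):
        local_filename = url.split('/')[-1]
        # The incremental decoder keeps partial multi-byte characters between chunks
        decoder = codecs.getincrementaldecoder("utf-8")()
        # Note: pass stream=True
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'w', encoding="utf-8", newline="") as f:
                # chunk_size=None yields data in whatever sizes it arrives
                for chunk in r.iter_content(chunk_size=None):
                    if chunk:  # skip empty chunks
                        f.write(decoder.decode(chunk))
                # Flush any bytes still buffered in the decoder
                f.write(decoder.decode(b"", final=True))
        return local_filename
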
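Decoding each chunk on its own with chunk.decode("utf-8") can fail when a multi-byte character happens to straddle a chunk boundary. The following is a minimal, standard-library-only sketch of that problem and of one way around it; the helper name decode_chunks and the sample text are made up for illustration and are not part of requests.

```python
import codecs

def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks, tolerating characters split across chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for chunk in chunks:
        if chunk:  # skip empty chunks
            parts.append(decoder.decode(chunk))
    # final=True raises if the stream ended in the middle of a character
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

data = "€uro".encode("utf-8")      # the euro sign takes 3 bytes in UTF-8
chunks = [data[:2], data[2:]]      # the euro sign is split across two chunks

try:
    chunks[0].decode("utf-8")      # decoding the first chunk alone
except UnicodeDecodeError:
    print("per-chunk decode failed")

print(decode_chunks(chunks))
```

The same idea applies to the chunks produced by iter_content: the decoder object carries the incomplete bytes over to the next chunk instead of raising.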
The iter_content[1] function can also do the decoding itself: just pass the parameter decode_unicode=True.
Note that the number of bytes iter_content returns per iteration is not guaranteed to be exactly chunk_size. It can vary from one iteration to the next and may be larger, for example when a compressed response is decompressed on the fly.
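One plausible reason for the size mismatch is on-the-fly decompression: a fixed number of compressed bytes is read, but the decompressed output can be much larger. The sketch below illustrates that general mechanism with the standard library only. It is not the actual requests or urllib3 implementation, whose behavior may differ between versions, and the function name iter_decompressed is hypothetical.

```python
import gzip
import io
import zlib

def iter_decompressed(raw, chunk_size=8192):
    """Read gzip-compressed bytes in chunk_size pieces and yield the decompressed data."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        compressed = raw.read(chunk_size)
        if not compressed:
            break
        data = decompressor.decompress(compressed)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail

payload = b"abc" * 100_000                  # highly compressible sample data
wire = io.BytesIO(gzip.compress(payload))   # stands in for the compressed body
sizes = [len(chunk) for chunk in iter_decompressed(wire, chunk_size=64)]
print(sizes)                                # decompressed sizes per 64-byte read
```

Each read takes at most 64 compressed bytes, yet the yielded pieces are generally not 64 bytes long.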
Method two is to use Response.raw[2] and shutil.copyfileobj[3]:

    import requests
    import shutil

    def download_file(url):
        local_filename = url.split('/')[-1]
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'wb') as f:
                # Copy the raw response stream straight into the file
                shutil.copyfileobj(r.raw, f)
        return local_filename

This streams the file to disk without using too much memory, and the code is simpler.
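Memory stays bounded because shutil.copyfileobj copies block by block rather than reading everything at once, and its optional length argument sets the block size. The sketch below shows this with in-memory streams: io.BytesIO stands in for r.raw and for the output file, and the 1 MiB buffer size and the name copy_stream are arbitrary choices for illustration.

```python
import io
import shutil

def copy_stream(src, dst, buffer_size=1024 * 1024):
    """Copy src to dst in blocks of at most buffer_size bytes."""
    shutil.copyfileobj(src, dst, length=buffer_size)

src = io.BytesIO(b"x" * (3 * 1024 * 1024))   # 3 MiB of sample data
dst = io.BytesIO()
copy_stream(src, dst)
print(len(dst.getvalue()))                   # number of bytes copied
```

In a real download, src would be r.raw and dst the file opened in 'wb' mode. Whether a larger buffer helps depends on the network and the disk, so it is worth measuring.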
Note: according to the documentation, Response.raw does not decode the response body (for example, gzip or deflate content encoding), so if you need decoded content you can manually replace the r.raw.read method:

    import functools
    r.raw.read = functools.partial(r.raw.read, decode_content=True)

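Putting the two pieces together, here is a minimal sketch of a helper that patches raw.read and then copies the stream to disk. The name save_decoded is hypothetical. It assumes that response is a requests.Response obtained with stream=True and that the read method of response.raw accepts a decode_content keyword.

```python
import functools
import shutil

def save_decoded(response, path):
    """Write response.raw to path, asking raw.read to decode the content encoding."""
    # Every read issued by copyfileobj now passes decode_content=True
    response.raw.read = functools.partial(response.raw.read, decode_content=True)
    with open(path, "wb") as f:
        shutil.copyfileobj(response.raw, f)
    return path

# Intended usage (not run here, requires the requests package and network access):
#
#     with requests.get(url, stream=True) as r:
#         r.raise_for_status()
#         save_decoded(r, "output.bin")
```

Without the patch, a gzip-encoded response would be written to disk still compressed.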
Method two (Response.raw with shutil.copyfileobj) is faster: where method one (iter_content) reaches 2-3 MB/s, method two can reach nearly 40 MB/s.
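Speeds like these depend heavily on the network, the server and the disk, so it is worth measuring on your own setup. The following is a rough timing sketch, assuming a download function that takes a URL and returns the local file path, as the download_file functions in this article do. The helper name measure_throughput is made up for illustration.

```python
import os
import time

def measure_throughput(download, url):
    """Run download(url) and return the average speed in MB/s (1 MB = 1024 * 1024 bytes)."""
    start = time.perf_counter()
    path = download(url)                    # must return the path of the saved file
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb / elapsed if elapsed > 0 else float("inf")

# Intended usage (not run here, requires network access):
#
#     print(measure_throughput(download_file, url))
```

Run each method several times against the same URL and compare the averages, since a single run can be skewed by caching or network jitter.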
[1] iter_content: https://requests.readthedocs.io/en/latest/api/#requests.Response.iter_content
[2] Response.raw: https://requests.readthedocs.io/en/latest/api/#requests.Response.raw
[3] shutil.copyfileobj: https://docs.python.org/3/library/shutil.html#shutil.copyfileobj