
Asynchronous Coroutine Development Practice: Optimizing the Speed of Uploading and Downloading Large Files

PHPz (original), 2023-12-17 12:50:28

As the Internet has spread, transferring files has become routine. As files grow larger, however, traditional upload and download methods become slow. To speed up large-file transfers and improve the user experience, we can use asynchronous coroutines. This article shows how to apply asynchronous coroutine techniques to large-file uploads and downloads, along with a multi-threaded alternative, and provides concrete code examples.

1. Introduction to asynchronous coroutine technology

An asynchronous coroutine is a programming model in which a task that reaches a blocking operation, such as waiting for network I/O, immediately gives up control so that other tasks can run, and resumes once that operation has finished. Switching among tasks this way keeps the thread doing useful work instead of sitting idle, which makes I/O-heavy programs more efficient.

Common implementations include asyncio in Python and callbacks, Promises, and async/await in Node.js. The details differ between languages and libraries, but the goal is the same: to make better use of computing resources and improve concurrency, especially for I/O-bound work such as network transfers.
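To make the model concrete, here is a minimal sketch using only the standard library. `asyncio.sleep()` stands in for a network wait, and the task names and delays are arbitrary values chosen for illustration. Three coroutines that each wait 0.2 s finish in roughly 0.2 s in total, because the event loop switches to another task whenever one is waiting.

```python
import asyncio
import time


async def fake_io(name: str, delay: float) -> str:
    # asyncio.sleep() stands in for waiting on the network; while this
    # coroutine is suspended, the event loop runs the other tasks.
    await asyncio.sleep(delay)
    return name


async def run_concurrently() -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather() runs all three coroutines on one event loop, in a single thread.
    results = await asyncio.gather(
        fake_io("a", 0.2), fake_io("b", 0.2), fake_io("c", 0.2)
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_concurrently())
print(results, f"{elapsed:.2f}s")
```

Awaiting the three calls one after another instead would take about 0.6 s, since each wait would only start after the previous one finished.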

2. Optimize the speed of large file uploads

Use chunked upload

Sending an entire large file to the server in a single request can be slow, because the whole transfer depends on one connection. Instead, the file can be split into multiple chunks. Each chunk is sent as an independent request, so several chunks can be uploaded in parallel to speed up the upload, provided the server can accept chunks in any order and reassemble them.

With asynchronous coroutines, chunked upload is straightforward to implement: multiple chunks are sent concurrently for a more efficient upload. The following is the specific code implementation.
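The chunk boundaries are just arithmetic on the file size. The sketch below computes inclusive byte ranges and the matching `Content-Range` value in the `bytes start-end/total` form that the upload code in this article sends. The helper names `chunk_ranges` and `content_range` are illustrative only, not part of any library. Note that the last chunk is usually shorter than the rest.

```python
def chunk_ranges(file_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) byte ranges covering the whole file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    for start in range(0, file_size, chunk_size):
        # The final chunk stops at the last byte instead of overrunning the file.
        end = min(start + chunk_size, file_size) - 1
        ranges.append((start, end))
    return ranges


def content_range(start: int, end: int, total: int) -> str:
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes {start}-{end}/{total}"


# A 12-byte "file" split into 5-byte chunks.
ranges = chunk_ranges(12, 5)
print(ranges)  # [(0, 4), (5, 9), (10, 11)]
print([content_range(s, e, 12) for s, e in ranges])
# ['bytes 0-4/12', 'bytes 5-9/12', 'bytes 10-11/12']
```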

import asyncio
import os

import aiohttp

CHUNK_SIZE = 5 * 1024 * 1024  # each chunk is 5 MB

async def upload_chunk(session, url, file, offset, size, file_size):
    # Seek explicitly so each task reads its own range regardless of scheduling order.
    # Note: each chunk is read into memory as soon as its task starts.
    file.seek(offset)
    data = file.read(size)
    # Assumes the server accepts PUT requests with a Content-Range header
    # and reassembles the chunks; aiohttp sets Content-Length itself.
    headers = {'Content-Range': f'bytes {offset}-{offset + size - 1}/{file_size}'}
    async with session.put(url, headers=headers, data=data) as resp:
        resp.raise_for_status()
        return await resp.json()  # assumes the server replies with JSON

async def upload_file_with_chunks(session, url, file):
    file_size = os.path.getsize(file.name)
    tasks = []
    for offset in range(0, file_size, CHUNK_SIZE):
        size = min(CHUNK_SIZE, file_size - offset)  # last chunk may be shorter
        tasks.append(upload_chunk(session, url, file, offset, size, file_size))
    return await asyncio.gather(*tasks)

async def main():
    async with aiohttp.ClientSession() as session:
        url = 'http://example.com/upload'
        with open('large_file.mp4', 'rb') as file:
            result = await upload_file_with_chunks(session, url, file)
        print(result)

asyncio.run(main())

In this code, the file is split into 5 MB chunks, and asyncio.gather() runs the upload tasks for all chunks concurrently, which can shorten the total upload time when a single connection does not use all the available bandwidth. The same chunking idea applies to downloads, as described in Section 3.
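`asyncio.gather()` starts every chunk task at once, so for a multi-gigabyte file the code above reads every chunk into memory almost immediately. One common refinement is to cap the number of chunks in flight with `asyncio.Semaphore`. In this sketch the hypothetical `fake_send()` coroutine replaces the HTTP request so the example runs offline, and the code records the highest number of simultaneous uploads to show the limit holding.

```python
import asyncio


async def upload_all(chunks: list[bytes], limit: int) -> tuple[list[int], int]:
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_send(data: bytes) -> int:
        # Hypothetical stand-in for session.put(); it only simulates latency.
        await asyncio.sleep(0.01)
        return len(data)

    async def send_one(data: bytes) -> int:
        nonlocal in_flight, peak
        # At most `limit` coroutines are inside this block at the same time.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_send(data)
            finally:
                in_flight -= 1

    sizes = await asyncio.gather(*(send_one(c) for c in chunks))
    return sizes, peak


chunks = [b"x" * 5 for _ in range(10)]
sizes, peak = asyncio.run(upload_all(chunks, limit=3))
print(sizes, peak)
```

Applied to the upload code above, the `file.seek()`/`file.read()` and `session.put()` calls would move inside the `async with semaphore:` block, so that only `limit` chunks are held in memory at any moment.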

Multi-threaded upload

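Besides asynchronous coroutines, you can also use multiple threads to upload a large file. Uploading is I/O-bound, and CPython releases the global interpreter lock while a thread waits on the network, so several threads can keep several transfers in flight at once. The benefit comes from overlapping network waits rather than from using extra CPU cores. The following is the specific code implementation.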

import os
import threading

import requests

class MultiPartUpload(object):
    def __init__(self, url, file_path, num_thread=4):
        self.url = url
        self.file_path = file_path
        self.num_thread = num_thread
        self.file_size = os.path.getsize(self.file_path)
        self.chunk_size = self.file_size // num_thread
        self.threads = []
        self.lock = threading.Lock()  # serializes console output only

    def upload(self, i):
        start = i * self.chunk_size
        # The last part runs to the end of the file so the remainder is not dropped.
        if i == self.num_thread - 1:
            end = self.file_size - 1
        else:
            end = start + self.chunk_size - 1
        # Assumes the server accepts Content-Range on PUT; requests sets Content-Length.
        headers = {"Content-Range": "bytes %s-%s/%s" % (start, end, self.file_size)}
        # Each thread opens (and closes) its own file handle.
        with open(self.file_path, 'rb') as f:
            f.seek(start)
            data = f.read(end - start + 1)
        resp = requests.put(self.url, headers=headers, data=data)
        with self.lock:
            print("Part %d status: %s" % (i, resp.status_code))

    def run(self):
        for i in range(self.num_thread):
            t = threading.Thread(target=self.upload, args=(i,))
            self.threads.append(t)
        for t in self.threads:
            t.start()

        for t in self.threads:
            t.join()

if __name__ == '__main__':
    url = 'http://example.com/upload'
    file = 'large_file.mp4'
    uploader = MultiPartUpload(url, file)
    uploader.run()

In this code, the threading module from the Python standard library implements the multi-threaded upload. The file is divided into multiple parts, and each thread uploads one part, so the parts are sent concurrently. The lock only ensures that status messages from different threads are printed one at a time; the threads share no other mutable state.
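The standard library's `concurrent.futures.ThreadPoolExecutor` can manage the threads instead of creating and joining `threading.Thread` objects by hand. In this sketch the hypothetical `send_part()` function stands in for the `requests.put()` call and stores each part in a dictionary, so the example runs without a server. The point it illustrates is that the last part must absorb the remainder when the data size is not a multiple of the thread count.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

received: dict[int, bytes] = {}
received_lock = threading.Lock()


def send_part(start: int, payload: bytes) -> int:
    # Hypothetical stand-in for requests.put(); records the part by its offset.
    with received_lock:
        received[start] = payload
    return len(payload)


def upload_in_threads(data: bytes, num_threads: int = 4) -> list[int]:
    part_size = len(data) // num_threads
    ranges = []
    for i in range(num_threads):
        start = i * part_size
        # The last part runs to the end so no trailing bytes are lost.
        end = len(data) if i == num_threads - 1 else start + part_size
        ranges.append((start, end))
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(send_part, s, data[s:e]) for s, e in ranges]
        # result() re-raises any exception that occurred in a worker thread.
        return [f.result() for f in futures]


data = bytes(range(10))
sizes = upload_in_threads(data, num_threads=4)
print(sizes)  # [2, 2, 2, 4]
reassembled = b"".join(received[k] for k in sorted(received))
print(reassembled == data)  # True
```

A practical advantage over bare threads is that `f.result()` surfaces a failed upload as an exception in the calling thread instead of only printing it.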

3. Optimize the speed of large file downloads

Downloading large files is just as common as uploading them, and asynchronous coroutines can speed it up in the same way.

Chunked download

Like chunked upload, chunked download divides the file into several chunks, requests each chunk independently with an HTTP Range header, and transfers multiple chunks in parallel, which can speed up the download. The specific code implementation is as follows:

import asyncio

import aiohttp

CHUNK_SIZE = 5 * 1024 * 1024  # each chunk is 5 MB

async def download_chunk(session, url, file, offset, size):
    headers = {'Range': f'bytes={offset}-{offset + size - 1}'}
    async with session.get(url, headers=headers) as resp:
        # 206 Partial Content means the server honoured the Range header.
        if resp.status != 206:
            raise RuntimeError(f'range request failed with status {resp.status}')
        data = await resp.read()
    # No await between seek and write, so chunks cannot interleave their writes.
    file.seek(offset)
    file.write(data)
    return len(data)

async def download_file_with_chunks(session, url, file):
    # Assumes the server reports Content-Length in its HEAD response.
    async with session.head(url, allow_redirects=True) as resp:
        resp.raise_for_status()
        file_size = int(resp.headers['Content-Length'])
    tasks = []
    for offset in range(0, file_size, CHUNK_SIZE):
        size = min(CHUNK_SIZE, file_size - offset)  # last chunk may be shorter
        tasks.append(download_chunk(session, url, file, offset, size))
    return await asyncio.gather(*tasks)

async def main():
    async with aiohttp.ClientSession() as session:
        url = 'http://example.com/download/large_file.mp4'
        with open('large_file.mp4', 'wb') as file:
            await download_file_with_chunks(session, url, file)

asyncio.run(main())

In this code, the aiohttp library performs the chunked download with asynchronous coroutines. As with the upload, the file is divided into 5 MB chunks, and asyncio.gather() runs the download task for every chunk concurrently. This only helps if the server supports HTTP range requests; a server that ignores the Range header returns the whole file with status 200 instead of 206 Partial Content.
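Because each task calls `file.seek()` and `file.write()` with no `await` in between, two chunks cannot interleave their writes even though they finish in an unpredictable order. The offline sketch below demonstrates this. The hypothetical `fake_fetch_range()` coroutine plays the role of a server answering a `Range` request from an in-memory byte string, with random delays so the chunks complete out of order.

```python
import asyncio
import random
import tempfile

SOURCE = bytes(range(256)) * 40  # 10,240 bytes of test data standing in for the remote file


async def fake_fetch_range(start: int, end: int) -> bytes:
    # Hypothetical stand-in for session.get(url, headers={"Range": f"bytes={start}-{end}"}).
    await asyncio.sleep(random.uniform(0, 0.02))
    return SOURCE[start:end + 1]  # Range ends are inclusive, slices are not


async def download_chunk(file, start: int, end: int) -> int:
    data = await fake_fetch_range(start, end)
    # seek + write run without yielding to the event loop, so they stay paired.
    file.seek(start)
    file.write(data)
    return len(data)


async def download_all(file, total: int, chunk_size: int) -> int:
    tasks = [
        download_chunk(file, start, min(start + chunk_size, total) - 1)
        for start in range(0, total, chunk_size)
    ]
    return sum(await asyncio.gather(*tasks))


with tempfile.TemporaryFile() as f:
    written = asyncio.run(download_all(f, len(SOURCE), chunk_size=1000))
    f.seek(0)
    ok = f.read() == SOURCE
print(written, ok)  # 10240 True
```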

Multi-threaded download

Besides coroutines, you can also download a large file with multiple threads, each fetching one range of the file. The specific code implementation is as follows:

import threading

import requests

class MultiPartDownload(object):
    def __init__(self, url, file_path, num_thread=4):
        self.url = url
        self.file_path = file_path
        self.num_thread = num_thread
        # A HEAD request fetches the size without downloading the body.
        head = requests.head(self.url, allow_redirects=True)
        head.raise_for_status()
        self.file_size = int(head.headers['Content-Length'])
        self.chunk_size = self.file_size // self.num_thread
        self.threads = []
        self.lock = threading.Lock()  # serializes console output only

    def download(self, i):
        start = i * self.chunk_size
        # The last part uses an open-ended range so it runs to the end of the file.
        end = start + self.chunk_size - 1 if i != self.num_thread - 1 else ''
        headers = {"Range": "bytes=%s-%s" % (start, end)}
        resp = requests.get(self.url, headers=headers)
        if resp.status_code != 206:
            raise RuntimeError("Range not honoured: status %d" % resp.status_code)
        # Each thread opens its own handle and writes only its own byte range.
        with open(self.file_path, 'rb+') as f:
            f.seek(start)
            f.write(resp.content)
        with self.lock:
            print("Part %d Downloaded." % i)

    def run(self):
        # 'rb+' requires an existing file, so create it at full size first.
        with open(self.file_path, 'wb') as f:
            f.truncate(self.file_size)
        for i in range(self.num_thread):
            t = threading.Thread(target=self.download, args=(i,))
            self.threads.append(t)
        for t in self.threads:
            t.start()

        for t in self.threads:
            t.join()

if __name__ == '__main__':
    url = 'http://example.com/download/large_file.mp4'
    file = 'large_file.mp4'
    downloader = MultiPartDownload(url, file)
    downloader.run()

In this code, the threading module from the Python standard library again provides the parallelism. The file is divided into multiple parts, and each thread downloads one part and writes it at that part's offset in the output file. As in the upload example, the lock only keeps the threads' progress messages from interleaving.
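Opening the output file in `'rb+'` mode requires it to exist already, so the file is usually created at its final size before the threads start; each thread then opens its own handle and writes only its own byte range. The sketch below shows that pattern offline. The hypothetical `fetch_part()` function slices an in-memory byte string in place of `requests.get()` with a `Range` header.

```python
import os
import tempfile
import threading

SOURCE = os.urandom(10_000)  # random test data standing in for the remote file


def fetch_part(start: int, end: int) -> bytes:
    # Hypothetical stand-in for requests.get(url, headers={"Range": f"bytes={start}-{end}"}).
    return SOURCE[start:end + 1]


def write_part(path: str, start: int, end: int) -> None:
    data = fetch_part(start, end)
    # Each thread uses its own file object, so seek positions are not shared.
    with open(path, "rb+") as f:
        f.seek(start)
        f.write(data)


def threaded_download(path: str, total: int, num_threads: int = 4) -> None:
    # Create the file at its final size so "rb+" can open it and every offset exists.
    with open(path, "wb") as f:
        f.truncate(total)
    part = total // num_threads
    threads = []
    for i in range(num_threads):
        start = i * part
        end = total - 1 if i == num_threads - 1 else start + part - 1
        threads.append(threading.Thread(target=write_part, args=(path, start, end)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "large_file.bin")
    threaded_download(out, len(SOURCE))
    with open(out, "rb") as f:
        ok = f.read() == SOURCE
print(ok)  # True
```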

4. Summary

This article showed how to speed up large-file uploads and downloads with asynchronous coroutines, and with multiple threads as an alternative. Splitting a file into chunks and transferring them in parallel can noticeably improve transfer efficiency, provided the server supports chunked uploads or HTTP range requests; the actual gain depends on network conditions and server limits. The same chunk-and-parallelize idea is widely used with asynchronous coroutines, multi-threading, and distributed systems. I hope this article helps you!
