Why is downloading in memory slower than downloading to the file system from AWS S3?

王林
2024-02-09 11:57:29

Downloading an S3 object into memory looks like it should be faster than downloading it to the local file system, since no disk I/O is involved. With the AWS SDK for Go, however, the in-memory download can turn out to be several times slower. The cause is not memory itself but how the buffer is used: a buffer that starts out empty has to be reallocated repeatedly as the data arrives, and each reallocation copies what has already been received, whereas a file can be written at any offset without that cost. Memory limits matter too: for very large objects, holding the whole object in RAM may not be practical. The question and answer below walk through a concrete case and its fix.

Question content

I am using the AWS SDK for Go to download an object from a bucket. Here are two implementations of the download:

  1. Download to file
func (a *AwsClient) DownloadToFile(ctx context.Context, objectKey string) (string, error) {
    params := &awsS3.GetObjectInput{
        Bucket: aws.String(a.bucket),
        Key:    aws.String(objectKey),
    }

    downloadPath := "some/valid/path"
    f, err := os.Create(downloadPath)
    if err != nil {
        return "", err
    }
    defer f.Close() // deferred only after Create has succeeded

    // *os.File implements io.WriterAt, so each part is written at its own offset.
    _, err = a.downloader.Download(ctx, f, params)
    return downloadPath, err
}
  2. Download to memory
func (a *AwsClient) DownloadToMemory(ctx context.Context, objectKey string) ([]byte, error) {
    params := &awsS3.GetObjectInput{
        Bucket: aws.String(a.bucket),
        Key:    aws.String(objectKey),
    }

    // The buffer starts from an empty slice, so it has to grow as data arrives.
    buffer := manager.NewWriteAtBuffer([]byte{})
    _, err := a.downloader.Download(ctx, buffer, params)
    return buffer.Bytes(), err
}

For a 100 MB file, downloading to memory takes 30 seconds, while downloading to the file system takes only 8 seconds. I expected the in-memory download to be much faster. My system (Apple M1, macOS Ventura, 8 GB RAM) has enough free RAM, so that should not be the problem. Can anyone help me understand this behavior?

Solution

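Downloading a large S3 object into a dynamically growing buffer is very inefficient. To hold 100 MB of data arriving from multiple concurrent download goroutines, the buffer is reallocated many times, and every reallocation copies the data already received into the new allocation. All that reallocating and copying costs a lot of CPU time.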

Try allocating the full 100 MB up front instead of starting from an empty byte slice.
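To make the cost concrete, here is a minimal sketch that uses only the standard library. The `growBuffer` type and the `fill` helper are hypothetical names invented for this illustration. They are a simplified model of a buffer that grows to exactly the size each write needs, not the SDK's actual code, and the 1 MiB and 4 KiB sizes are arbitrary. The sketch counts how many reallocations and copied bytes a buffer that starts empty incurs, compared with one that is preallocated.

```go
package main

import "fmt"

// growBuffer is a hypothetical stand-in for a dynamic io.WriterAt buffer.
// It is NOT the SDK's implementation; it only models "grow to exactly the
// size needed and copy the old contents", and counts the resulting work.
type growBuffer struct {
	buf         []byte
	reallocs    int   // number of times a new backing array was allocated
	bytesCopied int64 // bytes moved from old backing arrays into new ones
}

// WriteAt stores p at offset pos, growing the buffer when it is too small.
func (b *growBuffer) WriteAt(p []byte, pos int64) (int, error) {
	end := pos + int64(len(p))
	if int64(len(b.buf)) < end {
		if int64(cap(b.buf)) < end {
			// Not enough capacity: allocate a new array and copy everything over.
			newBuf := make([]byte, end)
			b.bytesCopied += int64(copy(newBuf, b.buf))
			b.reallocs++
			b.buf = newBuf
		} else {
			// Enough capacity: just extend the length, no copy needed.
			b.buf = b.buf[:end]
		}
	}
	copy(b.buf[pos:], p)
	return len(p), nil
}

// fill writes total bytes into b as sequential chunks of chunkSize bytes.
func fill(b *growBuffer, total, chunkSize int) {
	chunk := make([]byte, chunkSize)
	for off := 0; off < total; off += chunkSize {
		b.WriteAt(chunk, int64(off))
	}
}

func main() {
	const total, chunkSize = 1 << 20, 4 << 10 // 1 MiB in 4 KiB writes (illustrative)

	dynamic := &growBuffer{} // starts empty, like NewWriteAtBuffer([]byte{})
	fill(dynamic, total, chunkSize)

	prealloc := &growBuffer{buf: make([]byte, 0, total)} // capacity reserved up front
	fill(prealloc, total, chunkSize)

	fmt.Printf("dynamic:      %d reallocations, %d bytes copied\n", dynamic.reallocs, dynamic.bytesCopied)
	fmt.Printf("preallocated: %d reallocations, %d bytes copied\n", prealloc.reallocs, prealloc.bytesCopied)
}
```

In this model the dynamic buffer reallocates on every one of the 256 writes and copies about 127.5 MiB in order to store 1 MiB, while the preallocated buffer never reallocates. A real buffer may grow by a different policy, so treat the numbers as illustrative of the trend rather than as a measurement of the SDK.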

If the object size is not known in advance, you can call S3 HeadObject to get the object's length at run time, before starting the download.

Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it removed.