🐛S3: when copying files there is no callback if the multipart threshold is not reached #6305

Merged

sanderegg
Member

@sanderegg sanderegg commented Sep 4, 2024

What do these changes do?

#6272 did not completely fix the progress reporting while duplicating studies or instantiating templates:

  • we use boto3, the official AWS SDK for Python,
  • the copy() function of boto3 automatically switches between copy_object and multipart copying based on a threshold, which we had overridden to 5GB instead of the default 8MB,
  • this was done because copy_object seemed much faster and, unlike its multipart counterpart, did not seem to download/upload data,
  • the problem is that copy_object does not provide any feedback on the copy progress,
  • the progress callback was therefore only called for files larger than 5GB, which is not ideal.
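To make the threshold behaviour described above concrete, here is a minimal sketch and not the code from this PR. The helper names `uses_multipart_copy` and `build_transfer_config` are hypothetical; the only boto3 API assumed is `boto3.s3.transfer.TransferConfig` with its `multipart_threshold` parameter, imported lazily so the size arithmetic can be checked without boto3 installed.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

BOTO3_DEFAULT_THRESHOLD = 8 * MiB  # boto3 default, as stated in the description
PREVIOUS_OVERRIDE = 5 * GiB        # value that had been configured before this PR


def uses_multipart_copy(object_size: int, threshold: int) -> bool:
    """Hypothetical helper: True when a managed copy would go multipart.

    Objects below the threshold are copied with a single copy_object request.
    """
    return object_size >= threshold


def build_transfer_config(threshold: int):
    """Hypothetical helper: build the config object passed as Config= to copy()."""
    # Imported here so the rest of the sketch runs without boto3 installed.
    from boto3.s3.transfer import TransferConfig

    return TransferConfig(multipart_threshold=threshold)


if __name__ == "__main__":
    one_gib = 1 * GiB
    # With the 5GB override, a 1 GiB object is copied in one copy_object call.
    print(uses_multipart_copy(one_gib, PREVIOUS_OVERRIDE))
    # With the default threshold, the same object would be copied in parts.
    print(uses_multipart_copy(one_gib, BOTO3_DEFAULT_THRESHOLD))
```

With the 5GB override, almost every file falls on the copy_object side of the switch, which is why the progress callback was rarely invoked.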

This PR therefore adds:

  1. an explicit call to the progress callback after the copy, to ensure that at least the final feedback comes in:
     • when copying a folder, the size of each file is already known,
     • when copying a single file, an additional call to the S3 API is made to get the size of the object,
  2. a reduction of the multipart threshold to 100MB for now (@pcrespov I would need your measurements here to see if this is reasonable; if 5GB is copied in 2 seconds or a reasonable time, then we can revert that part).
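The first item can be sketched roughly as follows for the single-file case. This is an illustration under assumptions, not the implementation merged here: `copy_with_final_progress` and its parameters are hypothetical names, and `s3_client` is assumed to be a synchronous boto3 S3 client (or any object exposing `head_object` and `copy` with the same keyword arguments). The sketch counts the bytes already reported, so it only reports the remainder and does not double count if the copy did invoke the callback.

```python
from typing import Any, Callable, Optional


def copy_with_final_progress(
    s3_client: Any,
    src_bucket: str,
    src_key: str,
    dst_bucket: str,
    dst_key: str,
    on_progress: Callable[[int], None],
    transfer_config: Optional[Any] = None,
) -> int:
    """Copy one object and make sure the full size is eventually reported."""
    # Extra S3 call to learn the object size (single-file case).
    size = s3_client.head_object(Bucket=src_bucket, Key=src_key)["ContentLength"]

    reported = 0

    def _track(bytes_transferred: int) -> None:
        # Forward progress to the caller and remember how much was reported.
        nonlocal reported
        reported += bytes_transferred
        on_progress(bytes_transferred)

    s3_client.copy(
        CopySource={"Bucket": src_bucket, "Key": src_key},
        Bucket=dst_bucket,
        Key=dst_key,
        Callback=_track,
        Config=transfer_config,
    )

    # Explicit final call: report whatever the copy itself did not report.
    if reported < size:
        on_progress(size - reported)
    return size
```

For folder copies the `head_object` call would be unnecessary, since, as noted above, the size of each file is already known from the listing.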

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg self-assigned this Sep 4, 2024
@sanderegg sanderegg added this to the Eisbock milestone Sep 4, 2024
@sanderegg sanderegg added the a:storage issue related to storage service label Sep 4, 2024
codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.8%. Comparing base (cafbf96) to head (24a75ab).
Report is 505 commits behind head on master.

Files with missing lines Patch % Lines
packages/aws-library/src/aws_library/s3/_client.py 80.0% 0 Missing and 1 partial ⚠️
...es/storage/src/simcore_service_storage/s3_utils.py 75.0% 1 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff            @@
##           master   #6305      +/-   ##
=========================================
+ Coverage    84.5%   88.8%    +4.2%     
=========================================
  Files          10    1056    +1046     
  Lines         214   47103   +46889     
  Branches       25     408     +383     
=========================================
+ Hits          181   41843   +41662     
- Misses         23    5189    +5166     
- Partials       10      71      +61     
Flag Coverage Δ
integrationtests 64.6% <ø> (?)
unittests 86.2% <80.0%> (+1.6%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...kages/aws-library/src/aws_library/s3/_constants.py 100.0% <100.0%> (ø)
packages/aws-library/src/aws_library/s3/_client.py 95.3% <80.0%> (ø)
...es/storage/src/simcore_service_storage/s3_utils.py 86.1% <75.0%> (ø)

... and 1050 files with indirect coverage changes

sonarqubecloud bot commented Sep 4, 2024

Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

Thanks!

Member

@odeimaiz odeimaiz left a comment

Thanks 🥇!

@sanderegg sanderegg merged commit 08ec595 into ITISFoundation:master Sep 5, 2024
57 checks passed
@sanderegg sanderegg deleted the bugfix/duplicating-templates2 branch September 5, 2024 07:34
Labels
a:storage issue related to storage service
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants