Skip to content

Nodeports/add file checksum #2011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 74 commits into from
Dec 14, 2020

Conversation

sanderegg
Copy link
Member

@sanderegg sanderegg commented Nov 30, 2020

What do these changes do?

This PR will add the S3 ETag in the payload of project/comp_tasks such that:

Nodeports package v2

  • based on pydantic
  • on upload sets the S3 ETag in comp_tasks table
  • this will trigger an automatic /retrieve call on the next service even if the filename does not change
  • sidecar uses the node_ports v2

NOTE: nodeports v1 is deprecated, stop using it!

if a file with the same name is uploaded, the ETag will change, triggering a new download
should allow the ControlCore use case with any dynamic service

NOTE: download is not yet prevented if the ETag is the same. this will come later.

Bonus:

  • timing the download/upload of files through node_ports_v2 using tqdm (available in the logs, but would be nice in the frontend log as well)
  • .5 fix of the frontend progressbar

Related issue number

refering to PR #1999

How to test

Checklist

  • Did you change any service's API? Then make sure to bundle document and upgrade version (make openapi-specs, git commit ... and then make version-*)
  • Unit tests for the changes exist
  • Runs in the swarm
  • Documentation reflects the changes
  • New module? Add your github username to .github/CODEOWNERS

@sanderegg sanderegg added a:webserver issue related to the webserver service a:sidecar issue related with the sidecar worker service a:pipeline-services issues on non-core services in user's pipeline (use osparc-services repo for specifics) a:api framework api, data schemas, labels Nov 30, 2020
@sanderegg sanderegg added this to the Alfred_Büchi milestone Nov 30, 2020
@sanderegg sanderegg self-assigned this Nov 30, 2020
@codecov
Copy link

codecov bot commented Nov 30, 2020

Codecov Report

Merging #2011 (904ca55) into master (c7d97b5) will increase coverage by 0.7%.
The diff coverage is 94.6%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #2011     +/-   ##
========================================
+ Coverage    72.6%   73.4%   +0.7%     
========================================
  Files         405     413      +8     
  Lines       14846   15097    +251     
  Branches     1503    1526     +23     
========================================
+ Hits        10785   11084    +299     
+ Misses       3664    3620     -44     
+ Partials      397     393      -4     
Flag Coverage Δ
integrationtests 64.4% <86.0%> (+1.0%) ⬆️
unittests 67.6% <85.6%> (+0.9%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
packages/simcore-sdk/src/simcore_sdk/config/db.py 0.0% <0.0%> (ø)
.../simcore-sdk/src/simcore_sdk/node_data/__init__.py 100.0% <ø> (ø)
...s/simcore-sdk/src/simcore_sdk/node_ports/config.py 100.0% <ø> (ø)
...re-sdk/src/simcore_sdk/node_ports/serialization.py 81.5% <ø> (ø)
...es/sidecar/src/simcore_service_sidecar/executor.py 79.6% <69.5%> (ø)
...core-sdk/src/simcore_sdk/node_ports/filemanager.py 80.2% <77.7%> (+2.5%) ⬆️
...re-sdk/src/simcore_sdk/node_ports_v2/port_utils.py 94.7% <94.7%> (ø)
...ls-library/src/models_library/projects_nodes_io.py 100.0% <100.0%> (ø)
...ages/models-library/src/models_library/services.py 97.2% <100.0%> (ø)
packages/simcore-sdk/src/simcore_sdk/__init__.py 100.0% <100.0%> (ø)
... and 49 more

@sanderegg sanderegg force-pushed the nodeports/add_file_checksum branch 7 times, most recently from ad26f5b to eeb26d0 Compare December 8, 2020 14:05
@sanderegg sanderegg changed the title WIP: Nodeports/add file checksum Nodeports/add file checksum Dec 8, 2020
@sanderegg sanderegg marked this pull request as ready for review December 8, 2020 14:10
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally very nice work in v2!! Added some requests for your consideration.

Regarding your note in the description: Should add some

warnings.warn(
              
                category=DeprecationWarning,
            )

in simcore_skd to avoid

@@ -151,7 +161,7 @@ class ServiceProperty(BaseModel):
description="Place the data associated with the named keys in files",
examples=[{"dir/input1.txt": "key_1", "dir33/input2.txt": "key2"}],
)
default_value: Optional[Union[str, float, bool, int]] = Field(
default_value: Optional[Union[StrictBool, StrictInt, StrictFloat, str]] = Field(
None, alias="defaultValue", examples=["Dog", True]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with the stricts you ensure that "3" or "True" -> str but 3 -> int and True -> bool??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so first:
very important is to use the ordering in the Union: go for most restricted values to less restricted (e.g. bool -> int -> float -> str).
Then the strict means that 0 is not False (which could be an issue in some cases but not here yet), also 3 is an int but 3.0 isn't.

await file_pointer.write(chunk)
file_size = int(response.headers.get("content-length", 0)) or None

with tqdm(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice! I find it very fancy for consoles but how good is to send log info between machines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I get what you mean?


file_path = data_items_utils.create_file_path(key, file_name)
if value == file_path:
# this can happen in case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case ... what? :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. that was a copy paste from node_ports_v1.. which I will correct as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so I cannot change in the node_ports_v1 cause that is the very file that makes black hook fail with error 123... and I don't know how to make black skip it but by changing the pre-commit-config.yml file.

Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually my comments before are not so critical as to request changes. They do not actually need re-review.

@sanderegg sanderegg force-pushed the nodeports/add_file_checksum branch from af65f98 to 8743ae8 Compare December 9, 2020 08:30
Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job.

I have expressed some concerns regarding network errors (which can happen for different reasons like: timeouts, errors during file upload and download etc...)
Please try to figure out if they are valid and might have an impact and if it is possible to handle them gracefully to avoid future errors or bugs.

chunk = await response.content.read(CHUNK_SIZE)
while chunk:
await file_pointer.write(chunk)
file_size = int(response.headers.get("content-length", 0)) or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically only when content-length==0 because when undefined(missing) is always 0. I would just remove the or None part.

Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thank you!

@sanderegg sanderegg force-pushed the nodeports/add_file_checksum branch from 288798b to cfd5817 Compare December 11, 2020 08:24
@sanderegg sanderegg force-pushed the nodeports/add_file_checksum branch from cfd5817 to 898c9c5 Compare December 14, 2020 14:05
@sanderegg sanderegg merged commit 888cf9e into ITISFoundation:master Dec 14, 2020
@sanderegg sanderegg deleted the nodeports/add_file_checksum branch December 14, 2020 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:api framework api, data schemas, a:pipeline-services issues on non-core services in user's pipeline (use osparc-services repo for specifics) a:sidecar issue related with the sidecar worker service a:webserver issue related to the webserver service
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants