
Add pushChunks #176

Open · wants to merge 6 commits into base: main
Conversation

@vaidikcode (Contributor) commented Mar 1, 2025

Signed-off-by: vaidikcode [email protected]

Description

  • Added pushChunks

Testing done

Submitter checklist

  • I have read and understood the CONTRIBUTING guide
  • I have run mvn license:update-file-header, mvn spotless:apply, pre-commit run -a, mvn clean install before opening the PR

Signed-off-by: vaidikcode <[email protected]>
@vaidikcode (Contributor, Author) commented Mar 1, 2025

To fix the tests, one way is to add an alias method:

public Layer pushBlobStream(ContainerRef containerRef, InputStream input, long size) {
    return pushChunks(containerRef, input, size);
}

or update the tests. @jonesbusy, wdyt?

@jonesbusy (Collaborator)

Thanks for your PR. I will take a look.

Please update the tests. It's fine to break the existing API for now (renaming pushBlobStream to pushChunks).

// Phase 1: Initialize upload session
URI uri = URI.create("%s://%s".formatted(getScheme(), containerRef.getBlobsUploadPath()));
OrasHttpClient.ResponseWrapper<String> response =
client.post(uri, new byte[0], Map.of("Content-Length", "0"));
Collaborator: Please use constants from Const for all headers.

throw new OrasException("Failed to initiate blob upload: " + response.statusCode());
}

String location = response.headers().get(Const.LOCATION_HEADER.toLowerCase());
Collaborator: Not sure I understand this part. Why do we need to update the location?

@vaidikcode (Contributor, Author) replied Mar 1, 2025:

According to the spec section on "Pushing a blob in chunks":

Each successful chunk upload MUST have a 202 Accepted response code, and MUST have the following headers:

Location: <location>

Range: 0-<end-of-range>

Each consecutive chunk upload SHOULD use the <location> provided in the response to the previous chunk upload.

The code correctly implements this part of the specification: after each chunk upload (PATCH request), the registry may update the Location header to point to a different URL for the next chunk.
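As a sketch of the behavior described above (hypothetical helper, not the actual oras-java API; assumes the HTTP client exposes lower-cased header names):

```java
import java.util.Map;

// Illustrative sketch: after each chunk PATCH, the registry MAY return a
// new Location header that the next chunk upload SHOULD use. Names here
// are hypothetical, not the PR's actual code.
public class ChunkLocation {

    // Returns the upload URL for the next chunk: the Location header from
    // the last response if present, otherwise the current location.
    public static String nextLocation(String current, Map<String, String> headers) {
        String location = headers.get("location"); // assumes lower-cased header keys
        return (location != null && !location.isEmpty()) ? location : current;
    }
}
```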

endRange = startRange + bytesRead - 1;

Map<String, String> headers = Map.of(
"Content-Type",
Collaborator: Constants here.

}

// Updating range for next chunk
String rangeHeader = response.headers().get("range");
Collaborator: Constant.
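For illustration, the "0-<end-of-range>" value returned by the registry can be parsed to derive where the next chunk starts (hypothetical helper, not the PR's actual implementation):

```java
// Illustrative sketch: parse the Range response value ("0-<end-of-range>")
// and derive the start offset of the next chunk. Hypothetical helper.
public class ChunkRange {

    // "0-1023" -> 1024: the byte offset where the next chunk begins.
    public static long nextStart(String rangeValue) {
        int dash = rangeValue.lastIndexOf('-');
        return Long.parseLong(rangeValue.substring(dash + 1)) + 1;
    }
}
```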


// Phase 3: Complete upload
Map<String, String> headers = Map.of(
"Content-Type", "application/octet-stream",
Collaborator: Constant.

// We initialize the MessageDigest first
MessageDigest digest;
try {
digest = MessageDigest.getInstance(
Collaborator: Logic should not be here. Use

public static boolean isSupported(String prefix)

or implement the logic in a private method.

private static final int CHUNK_SIZE = 5 * 1024 * 1024;

/**
* The digest calculation limit
Collaborator: Why 16? Is it from the spec?

@vaidikcode (Contributor, Author): No, it's not from the specification; it's just a general practical value. We can change it as per our requirements, though. What should it be set to?

/**
* The chunk size for uploading blobs
*/
private static final int CHUNK_SIZE = 5 * 1024 * 1024;
Collaborator: Why 5? Is it from the spec? If yes, it would be nice to link it in the Javadoc, or explain this value.

@vaidikcode (Contributor, Author): Same as the digest calculation limit.

Collaborator: I think this must be honored:

If the registry has a minimum chunk size, the POST response SHOULD include the following header, where <size> is the size in bytes (see the blob PATCH definition for usage)

https://github.com/opencontainers/distribution-spec/blob/main/spec.md

Also, I think by default we should use the monolithic two-step push unless the file is large (to limit the number of HTTP requests when chunking is not needed).
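A minimal sketch of honoring that minimum, assuming the registry advertises it via the distribution spec's OCI-Chunk-Min-Length header (hypothetical helper; the 5 MiB default mirrors the PR's CHUNK_SIZE constant):

```java
// Illustrative sketch: choose an effective chunk size, raising the client
// default to the registry's advertised minimum when one is present and
// larger. Hypothetical helper, not the PR's actual code.
public class ChunkSizing {

    static final long DEFAULT_CHUNK_SIZE = 5L * 1024 * 1024;

    // minLengthHeader is the raw OCI-Chunk-Min-Length value, or null if absent.
    public static long effectiveChunkSize(String minLengthHeader) {
        if (minLengthHeader == null || minLengthHeader.isEmpty()) {
            return DEFAULT_CHUNK_SIZE;
        }
        return Math.max(DEFAULT_CHUNK_SIZE, Long.parseLong(minLengthHeader));
    }
}
```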

// Upload first chunk
if (totalBytesRead > 0) {
Map<String, String> headers = Map.of(
"Content-Type",
Collaborator: Constants.

@jonesbusy (Collaborator)

See

https://github.com/oras-project/oras-py/blob/1790ad7df3c83884f24aabd2ab4b2d97e4c2f3b0/oras/provider.py#L283-L293

Perhaps we should keep using pushBlob internally and just provide pushChunks as a public method. If users want to push large blobs, they can use it.

@vaidikcode (Contributor, Author) commented Mar 3, 2025

I agree. One way could be to:

  • Introduce a threshold constant.
  • Modify the pushLayers function used in pushArtifacts to select between pushBlob and pushChunks based on the file size.
  • Implement the pushChunks function to check the response header for the minimum chunk size.

Or just implement pushChunks and, if users want it, they can use it (as you said).

@jonesbusy
wdyt?
Thanks!
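The threshold idea above could look like this (hypothetical helper; the 16 MiB threshold is a placeholder, not a value from the PR or the spec):

```java
// Illustrative sketch: pick the monolithic pushBlob path for small layers
// and the chunked path for large ones. Threshold value is a placeholder.
public class PushStrategy {

    static final long CHUNKED_THRESHOLD = 16L * 1024 * 1024;

    // True when a blob of the given size should go through pushChunks.
    public static boolean useChunked(long sizeBytes) {
        return sizeBytes > CHUNKED_THRESHOLD;
    }
}
```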

@jonesbusy (Collaborator), replying to the suggestion above:

"Or just implement pushChunks and, if users want it, they can use it." Yes, correct. Very large artifacts can be handled by pushing blobs/layers via chunks and then creating a manifest.

@vaidikcode vaidikcode marked this pull request as ready for review March 13, 2025 09:21
@vaidikcode vaidikcode marked this pull request as draft March 13, 2025 11:43
@vaidikcode vaidikcode marked this pull request as ready for review March 13, 2025 11:47
* @param algorithm The supported algorithm
* @return The algorithm string for MessageDigest
*/
private String getMessageDigestAlgorithm(SupportedAlgorithm algorithm) {
Collaborator: Such a util already exists.

}

// Helper method to convert bytes to hex
private static String bytesToHex(byte[] bytes) {
Collaborator: Pretty sure this already exists. If not, it must be moved to DigestUtils.

@vaidikcode (Contributor, Author) replied Mar 14, 2025:

I can't simply move bytesToHex to DigestUtils and use it because the DigestUtils class appears to be designed as a proper utility class with static methods, yet its package-private access limits its usability.

What was the intended scope for DigestUtils? If it is meant to be a shared utility, making it public would be a cleaner solution than duplicating the functionality in the Registry. What do you think? Is there any other way to achieve the same behavior as bytesToHex in the createDigestString method? @jonesbusy
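For reference, a generic bytesToHex helper like the one under discussion is a few lines of standard Java (a sketch of the common pattern, not the project's actual DigestUtils code):

```java
// Generic sketch of a byte-array-to-hex helper; equivalent logic usually
// lives in a digest utility class.
public class HexUtil {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Converts a byte array to its lowercase hexadecimal representation.
    public static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }
}
```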

Collaborator: I suggest checking the SupportedAlgorithm class; I think your answer is there.

Since this PR is starting to get quite large, I think it's worth splitting it into several smaller PRs that can support this one. That would make it easier to review, and smaller parts of the code could be merged faster.

@vaidikcode vaidikcode mentioned this pull request Mar 20, 2025