Core decompress body #18581
Conversation
This pull request is protected by Check Enforcer.

What is Check Enforcer? Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass, Check Enforcer itself will pass.

Why am I getting this message? You are getting this message because Check Enforcer did not detect any check-runs associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines, and so Check Enforcer is correctly blocking the pull request from being merged.

What should I do now? If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla), then you can try telling Check Enforcer to evaluate your pull request again by adding a comment to this pull request.

What if I am onboarding a new service? Often, new services do not have validation pipelines associated with them; to bootstrap pipelines for a new service, you can issue a command as a pull request comment.
/azp run python - translation - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run python - translation - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run python - search - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run python - keyvault - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run python - tables - tests
/azp run python - translation - tests
/azp run python - search - tests
Azure Pipelines successfully started running 1 pipeline(s).
Azure Pipelines successfully started running 1 pipeline(s).
I suspect that @iscai-msft is (or should be) interested in this discussion.
try:
    decoded = content.decode('utf-8')
    assert False
except UnicodeDecodeError:
So this raises because we couldn't decompress, since no header was found - is that right?
Because there is no encoding header, we do not try to decompress the body.
The error is raised here because decoding a still-compressed stream as UTF-8 fails.
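For context, here is a minimal, self-contained sketch of that failure mode (the content variable and the gzip payload are illustrative, not taken from the test itself): without a Content-Encoding header the body stays compressed, and decoding those bytes as UTF-8 raises UnicodeDecodeError.

import gzip

content = gzip.compress(b"hello world")   # simulated wire bytes, still compressed
try:
    content.decode('utf-8')               # no encoding header -> body is not decompressed
    assert False, "expected the decode to fail"
except UnicodeDecodeError:
    pass                                  # this is the outcome the test asserts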
request = client.get(url)
pipeline_response = await client._pipeline.run(request, stream=True)
response = pipeline_response.http_response
data = response.stream_download(client._pipeline, decompress=True)
What's the header that's being returned here?
Is it raising because the header says it's gzip, but the content itself doesn't match?
Right.
In this scenario, there is an encoding header and we pass in decompress=True, so we try to decompress a stream that is not in the expected format, and the decompression fails.
Is this failing because the decompression algorithm mismatches the header? Or because the content itself mismatches the header?
I'm wondering because it's a zlib error, but the test name indicates a 'plain' content header.
With the encoding header set to "gzip", we try to use the gzip algorithm to decompress the stream.
But the content of the stream itself is not compressed (it is plain text).
What happens here is that we try to decompress an uncompressed stream, so the decompression fails.
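As a standalone illustration of that mismatch (a sketch, not code from the PR; the plain-text payload is made up), feeding uncompressed bytes to a gzip decompressor raises zlib.error because the gzip header check fails:

import zlib

enc = "gzip"                                        # what the Content-Encoding header claims
zlib_mode = 16 + zlib.MAX_WBITS if enc == "gzip" else zlib.MAX_WBITS
decompressor = zlib.decompressobj(wbits=zlib_mode)
try:
    decompressor.decompress(b"plain text body")    # content is not actually compressed
    assert False, "expected decompression to fail"
except zlib.error:
    pass                                            # the header/content mismatch surfaces here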
Gotcha - perfect!
The python - autorest - pr failure is known.
/azp run python - autorest - pr
Azure Pipelines successfully started running 1 pipeline(s).
try:
    auto_decompress = self.session.auto_decompress  # type: ignore
except AttributeError:
    auto_decompress = True
I would add a comment as to why we need this. I know I would be confused unless I knew the history...
Added.
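For readers of this thread, a sketch of the kind of comment being discussed (the exact wording added in the PR is not shown here, and the reason stated in the comment is my assumption, namely that some aiohttp versions do not expose ClientSession.auto_decompress):

try:
    # Assumption: not all aiohttp versions expose ClientSession.auto_decompress,
    # so fall back to True, which matches aiohttp's default behaviour.
    auto_decompress = self.session.auto_decompress  # type: ignore
except AttributeError:
    auto_decompress = True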
zlib_mode = 16 + zlib.MAX_WBITS if enc == "gzip" else zlib.MAX_WBITS
decompressor = zlib.decompressobj(wbits=zlib_mode)
body = decompressor.decompress(self._body)
return body
I do have some concerns about us not caching the decompressed body. Because we only need it once, right? Do we have any other access to self._body that requires us to keep the compressed data?
I don't expect (at least I did not see) that users need to get the body twice.
If you want, we can update the code like:
if enc in ("gzip", "deflate"):
    if self._decompressed_body:
        return self._decompressed_body
    import zlib
    zlib_mode = 16 + zlib.MAX_WBITS if enc == "gzip" else zlib.MAX_WBITS
    decompressor = zlib.decompressobj(wbits=zlib_mode)
    self._decompressed_body = decompressor.decompress(self._body)
    return self._decompressed_body
return self._body
But to be honest, I don't see a lot of value in this.
They don't need to get the body more than once. And it would not be clear to me that getting the body and then the text would decompress the body twice.
I don't think we need to keep the compressed data around once it has been decompressed, right?
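To make the concern concrete, here is a minimal stand-in (FakeResponse, body, and text are illustrative names, not the actual azure-core classes): if text() is built on body() and body() decompresses on every call, reading the body and then the text decompresses the same payload twice.

import gzip
import zlib

class FakeResponse:
    def __init__(self, compressed):
        self._body = compressed

    def body(self):
        # No caching: every call decompresses the stored bytes again.
        decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        return decompressor.decompress(self._body)

    def text(self):
        return self.body().decode('utf-8')

response = FakeResponse(gzip.compress(b"hello"))
response.body()   # first decompression
response.text()   # second decompression, invisible to the caller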
Sounds fair. Updated. :)
zlib_mode = 16 + zlib.MAX_WBITS if enc == "gzip" else zlib.MAX_WBITS
decompressor = zlib.decompressobj(wbits=zlib_mode)
self._decompressed_body = decompressor.decompress(self._body)
return self._decompressed_body
I would suggest keeping a single copy of the body around. Unless you still need the compressed version for some reason...
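A hedged sketch of what keeping a single copy could look like (the self.headers lookup and the _decompressed flag are illustrative assumptions, not the code that was merged): overwrite self._body with the decompressed bytes instead of holding both payloads.

import zlib

def body(self):
    enc = self.headers.get('Content-Encoding')          # assumed header access
    if enc in ("gzip", "deflate") and not getattr(self, "_decompressed", False):
        zlib_mode = 16 + zlib.MAX_WBITS if enc == "gzip" else zlib.MAX_WBITS
        decompressor = zlib.decompressobj(wbits=zlib_mode)
        self._body = decompressor.decompress(self._body)  # replace the compressed copy in place
        self._decompressed = True
    return self._body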