Skip to content

Increase Azure client timeout on tests #67210

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 13, 2021

Conversation

fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Jan 8, 2021

Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on deletion failures.

Closes #67119

Instead of executing all the delete request in parallel
this commits introduces a change that allows the execution
of delete requests in batches of 100 parallel deletions.
The reason for this change is to avoid timeout failures
when large number of files should be deleted as if we
execute all of them in parallel a few slow requests could
make the rest to fail due to timeouts, as there is an effective
limit at the connection pool level.
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes elastic#67119
@fcofdez fcofdez added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v7.12.0 labels Jan 8, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 12, 2021

@original-brownbear would you mind to take a look into this when you have the time? Thanks!

Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Francisco this makes sense I think.
My understanding is that the problem we're facing is a client side request timeout because the requests have to wait for so long to actually go out over the limited connections we have.
I wonder if we couldn't just way increase the timeout to work around this instead?
Would keep the code simpler for one but also would overall run faster I guess since we get more parallelism from deleting.

The risk of every now and then failing a bulk delete because of a bunch of slow running deletes isn't so bad IMO, all our repo operations will clean the left-overs up during subsequent delete operations anyway. In the real world it seems very unlikely we'd be seeing what failed the test here anyway since the test timeouts are so absurdly short.
Also, in practice we don't even have a timeout by default anyway for Azure do we so this is a test-only issue for the most part it seems?

@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 12, 2021

Thanks for the review Armin!

I wonder if we couldn't just way increase the timeout to work around this instead?

Yes, that should solve the issue. I was a bit worried about the consequences of a bunch of failed delete
requests, but as you point that should be solved during subsequent delete operations.

I'll increase the timeout and keep the improvements around the error messages

@fcofdez fcofdez changed the title Execute azure blob deletions in batches Increase Azure client timeout on tests Jan 12, 2021
@fcofdez fcofdez added >test Issues or PRs that are addressing/adding tests and removed >bug labels Jan 12, 2021
Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks Francisco!

@fcofdez fcofdez merged commit 4b9f2e9 into elastic:master Jan 13, 2021
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Jan 13, 2021
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes elastic#67119
Backport of elastic#67210
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Jan 13, 2021
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes elastic#67119
Backport of elastic#67210
fcofdez added a commit that referenced this pull request Jan 13, 2021
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes #67119
Backport of #67210
fcofdez added a commit that referenced this pull request Jan 13, 2021
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes #67119
Backport of #67210
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v7.11.0 v7.12.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EncryptedAzureBlobStoreRepositoryIntegTests.testLargeBlobCountDeletion failed to delete directory
4 participants