Improve the retry strategy #208

Open
1 task done
rhamzeh opened this issue Mar 7, 2025 · 0 comments · May be fixed by openfga/sdk-generator#504
Labels
enhancement New feature or request


rhamzeh commented Mar 7, 2025

Checklist

Describe the problem you'd like to have solved

Retry-After is a standard header used by APIs to indicate when the SDK can retry.

The SDKs should:

  • Honor this header on 429s
  • Expose this header value in the error when received
  • Fall back to exponential backoff when this header is not available (e.g. the server did not send it on a 429, or the response is a 500)
  • Drop support for retrying based on X-Rate-Limit-Reset (currently only the .NET SDK supports that), though still expose it in the logs
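
As a rough illustration of the "expose this header value in the error" requirement, the sketch below attaches the raw header values to an error object. The class name `RetryableHttpError` and its fields are hypothetical and are not taken from any SDK; header names are matched case-insensitively, as HTTP requires.

```typescript
// Hypothetical error type: the name and fields are illustrative only.
class RetryableHttpError extends Error {
  readonly status: number;
  // Raw Retry-After value as sent by the server, if any.
  readonly retryAfter: string | undefined;
  // Kept only so it can be logged; it is not used to schedule retries.
  readonly rateLimitReset: string | undefined;

  constructor(message: string, status: number, headers: Record<string, string>) {
    super(message);
    this.name = "RetryableHttpError";
    this.status = status;

    // HTTP header names are case-insensitive, so normalize before lookup.
    const normalized: Record<string, string> = {};
    for (const key of Object.keys(headers)) {
      const value = headers[key];
      if (value !== undefined) {
        normalized[key.toLowerCase()] = value;
      }
    }
    this.retryAfter = normalized["retry-after"];
    this.rateLimitReset = normalized["x-rate-limit-reset"];
  }
}
```

A caller could then read `err.retryAfter` to decide how long to wait, or log `err.rateLimitReset` for diagnostics.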

Current State

SDK | Retries on | Default Num Retries | Max Num Retries | State
JS | 429s, 500s | 3 | 15 | Does not consider headers. Implements exponential backoff, waiting a random time between 2^loopCount * 100ms and 2^(loopCount + 1) * 100ms

Describe the ideal solution

Retry On

  • Retry on 429s, falling back to exponential backoff
  • Retry on 5xxs (except 501 not implemented), falling back to exponential backoff
  • Retry on network errors, falling back to exponential backoff
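
The three rules above can be sketched as a single predicate. This is a minimal sketch, not an SDK API: the `RequestOutcome` type and the `shouldRetry` name are assumptions made for illustration.

```typescript
// Hypothetical types and names, for illustration only.
type RequestOutcome =
  | { kind: "response"; status: number }
  | { kind: "networkError" };

function shouldRetry(outcome: RequestOutcome): boolean {
  // Network errors (no HTTP response at all) are retried.
  if (outcome.kind === "networkError") {
    return true;
  }
  const status = outcome.status;
  // 429 Too Many Requests is retried.
  if (status === 429) {
    return true;
  }
  // Any 5xx is retried, except 501 Not Implemented.
  return status >= 500 && status <= 599 && status !== 501;
}
```

For example, `shouldRetry({ kind: "response", status: 503 })` is `true`, while a 501 or a 400 is not retried.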

Max Allowable Retries

15

Default Number of Retries

SDKs: 3

Retry Parameters

  1. If Retry-After header is found, use it

    1. if it is an integer, treat it as the number of seconds from now to retry; if it is <1s from now or >1800s from now (i.e. >30 min), assume it is invalid and continue
    2. if it is a date, parse it; if it is <1s from now or >1800s from now (i.e. >30 min), assume it is invalid and continue
  2. If no valid Retry-After header is found, use exponential backoff with some jitter, so the retry delay is a random number between

    1. 2^loopCount * 500ms and 2^(loopCount + 1) * 500ms
    2. if the result of (a) is > 120s, cap it at 120s which should happen between the 8th and 9th retry
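
The retry parameters above could be sketched as follows. This is one possible reading of the rules under stated assumptions, not the SDK's implementation: the function names are hypothetical, the base delay is passed in as a parameter, and the injectable `random` argument exists only to make the jitter testable.

```typescript
// Hypothetical names; the bounds are the ones stated in the issue.
const MIN_RETRY_AFTER_MS = 1000;    // values under 1s are treated as invalid
const MAX_RETRY_AFTER_MS = 1800000; // values over 1800s (30 min) are treated as invalid
const MAX_BACKOFF_MS = 120000;      // backoff is capped at 120s

// Returns the delay in ms requested by Retry-After, or undefined if the
// header is missing, unparseable, or outside the accepted range.
function parseRetryAfterMs(header: string | undefined, nowMs: number = Date.now()): number | undefined {
  if (header === undefined) {
    return undefined;
  }
  const value = header.trim();
  let delayMs: number;
  if (/^\d+$/.test(value)) {
    // Integer form: number of seconds from now.
    delayMs = Number(value) * 1000;
  } else {
    // Date form: an HTTP date, converted to a delay relative to now.
    const dateMs = Date.parse(value);
    if (Number.isNaN(dateMs)) {
      return undefined;
    }
    delayMs = dateMs - nowMs;
  }
  if (delayMs < MIN_RETRY_AFTER_MS || delayMs > MAX_RETRY_AFTER_MS) {
    return undefined;
  }
  return delayMs;
}

// Random delay between 2^loopCount * baseMs and 2^(loopCount + 1) * baseMs,
// capped at 120s.
function backoffDelayMs(loopCount: number, baseMs: number, random: () => number = Math.random): number {
  const lower = Math.pow(2, loopCount) * baseMs;
  const upper = Math.pow(2, loopCount + 1) * baseMs;
  const jittered = lower + random() * (upper - lower);
  return Math.min(jittered, MAX_BACKOFF_MS);
}

// A valid Retry-After wins; otherwise fall back to jittered backoff.
function nextDelayMs(header: string | undefined, loopCount: number, baseMs: number): number {
  const fromHeader = parseRetryAfterMs(header);
  return fromHeader !== undefined ? fromHeader : backoffDelayMs(loopCount, baseMs);
}
```

With a 500ms base, `backoffDelayMs(8, 500)` always returns the 120s cap, because the lower bound of that window (128s) already exceeds it.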

That means:

  • if the Retry-After header was returned and is valid, we'll use it; for example, if it says to retry in 4 min, we wait 4 min
  • if retry-after header was not returned, we will retry at:
    • 100ms
    • 200ms
    • 400ms
    • 800ms
    • 1.6s
    • 3.2s
    • 6.4s
    • 12.8s
    • 25.6s
    • 51.2s
    • 102.4s
    • 120s ← at this point it is >4min since the initial call
    • 120s
    • 120s
    • 120s
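
The schedule listed above matches the lower bound of each backoff window with a 100ms base and a 120s cap, over the maximum of 15 retries. The sketch below reproduces it; the function name is hypothetical, and the base is a parameter because the formula under Retry Parameters states 500ms rather than 100ms.

```typescript
// Hypothetical helper: lower bound of each retry window, ignoring jitter.
function lowerBoundSchedule(retries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let loopCount = 0; loopCount < retries; loopCount++) {
    delays.push(Math.min(Math.pow(2, loopCount) * baseMs, capMs));
  }
  return delays;
}

const schedule = lowerBoundSchedule(15, 100, 120000);
// First eleven waits double from 100ms to 102400ms; the last four are capped at 120000ms.

// Total time waited once the first capped retry fires (the 12th entry).
const elapsedAtFirstCapMs = schedule.slice(0, 12).reduce((sum, d) => sum + d, 0);
// 204700ms of doubling waits plus one 120000ms wait = 324700ms, i.e. more than 4 minutes.
```

Calling `lowerBoundSchedule(15, 500, 120000)` instead shows the cap being reached at the ninth entry (loopCount 8).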

Alternatives and current workarounds

No response

References

No response

Additional context

No response

@rhamzeh rhamzeh added the enhancement New feature or request label Mar 7, 2025
@rhamzeh rhamzeh linked a pull request Mar 7, 2025 that will close this issue
4 tasks
@rhamzeh rhamzeh moved this from Backlog to In review in SDKs and Tooling Mar 7, 2025