| status | title | creation-date | last-updated | authors |
|---|---|---|---|---|
| implemented | Support retries for custom task in a pipeline. | 2021-05-31 | 2021-12-15 | |
- Summary
- Motivation
- Requirements
- Proposal
- Design Details
- Test Plan
- Design Evaluation
- Drawbacks
- Alternatives
- Infrastructure Needed (optional)
- Upgrade & Migration Strategy (optional)
- Implementation Pull request(s)
- References (optional)
A pipeline task can be configured with a `retries` count, but this is currently supported only for `TaskRun`s and not for `Run`s (i.e. custom tasks).
This TEP proposes allowing a pipeline task that references a custom task to be configured with a `retries` count.
A `PipelineRun` already manages retries for a regular task by updating its status. For a custom task, however, a Tekton-owned controller can only signal the custom task controller to retry, and the custom task controller may optionally support it.
Allow custom tasks to be configured with task.retries
Currently, a custom task controller has to develop its own retries support, which is not configurable as a pipeline task. Not every custom task needs to support `retries`, but those that do have to build their own solutions.
There is no way to view retries information at the pipeline run level.
Because each custom task controller builds its own solution, there is no uniformity in how retries are implemented. This TEP brings a standard, uniform way of supporting retries across custom task controllers.
As a side benefit, a developer SDK for custom task controllers might also build on this support in the future, for example by including documentation and stub code that make retries easy to support.
- Support propagating `pipelineSpec.task.retries` count information to custom-task controllers.
- Support updating the status of retry history to the `tektoncd pipeline` controller.
- Gracefully handle the case where a custom controller does not support retry and yet the `PipelineRun` happens to be configured with retry. This also implies that an existing controller should not misbehave if it is not upgraded to support retries.
- Directly force-update the `status.conditions` of a custom task.
- A `PipelineTask` can be configured with a retry, but validation fails if we configure `retries` for a custom task inside a `pipelineTask`. This TEP therefore fills in a missing API. Just as we have timeout support for custom tasks, we can have `retry` as well.
- In `Kubeflow` pipelines with the Tekton backend, we generate `tekton` pipelines from a user-provided Python DSL (https://github.com/kubeflow/kfp-tekton). If the `retry` field is present at the Pipeline level, then we do not need to know whether each task supports a retry field. Otherwise, it can be hard to determine which custom tasks support it.
- In the `PipelineLoop` controller, we would like to optimise retry by examining the failed state. For example, if 2 out of 5 loops were not successful, we would like to retry only the failed iterations.
- A `PipelineRun` sees a custom task as running even though it may be failing and retrying. An end user cannot know the status of a `PipelineRun` unless they drill down into the status of each custom task, e.g. when viewing their Pipeline progress in a UI.
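
The `PipelineLoop` use case above, retrying only the iterations that failed, can be sketched as follows. This is a minimal illustration, not the `PipelineLoop` controller's code: the function name `iterationsToRetry` and the representation of iteration results as a slice of booleans are assumptions made for the example.

```go
package main

// iterationsToRetry returns the indices of loop iterations that did not
// succeed, so that a retry can be limited to them.
// Hypothetical helper: succeeded[i] is true when iteration i passed.
func iterationsToRetry(succeeded []bool) []int {
	var failed []int
	for i, ok := range succeeded {
		if !ok {
			failed = append(failed, i)
		}
	}
	return failed
}
```

With five iterations of which the second and fifth failed, only indices 1 and 4 would be scheduled again, which is the optimisation a fresh `Run` per retry would make hard to achieve.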
None.
Requesting API changes:
- Add a field `Retries` to `RunSpec`, an integer count which is communicated to the custom task controller.
- Add a field `RetriesStatus` to `RunStatusFields`, to maintain the retry history for a `Run`, similar to `v1beta1.TaskRunStatusFields.RetriesStatus`. This field is updated by the custom task controller.
A pipeline task may be configured with a timeout, and the timeout includes time required to perform all the retries.
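
Since the timeout covers all retries, a controller deciding whether to start another attempt has to look at both the retry count and the time elapsed since the first attempt began. The sketch below illustrates that check under stated assumptions: `canRetry` and `exampleTimeoutBudget` are hypothetical names, and the handling of an unset timeout is not addressed.

```go
package main

import "time"

// canRetry reports whether another attempt may start.
// failedAttempts counts attempts that have already failed, retries is the
// configured retry count, and elapsed is measured from the start of the
// first attempt, so the timeout spans every attempt.
func canRetry(failedAttempts, retries int, elapsed, timeout time.Duration) bool {
	if failedAttempts > retries {
		return false // the first attempt and all retries have failed
	}
	return elapsed < timeout
}

// exampleTimeoutBudget evaluates the same retry state at two points in time
// against a 10-minute timeout.
func exampleTimeoutBudget() (withinBudget, pastTimeout bool) {
	timeout := 10 * time.Minute
	withinBudget = canRetry(1, 3, 4*time.Minute, timeout)
	pastTimeout = canRetry(1, 3, 11*time.Minute, timeout)
	return withinBudget, pastTimeout
}
```

In `exampleTimeoutBudget`, the first check allows a retry because retries remain and only 4 of the 10 minutes are used; the second refuses it because the overall timeout has passed even though retries remain.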
None.
Add an optional `Retries` field of type `int` to `RunSpec`.
Add an optional `RetriesStatus` field of type `[]RunStatus` to `RunStatusFields`.
A custom task controller can optionally support retry: it can honor the `retries` count and update the `RetriesStatus` on each retry.
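
The two fields described above, and how a custom task controller might use them, can be sketched as follows. The types are simplified stand-ins rather than the actual Tekton definitions: the `Succeeded` flag replaces the real status conditions, and `recordFailure` is a hypothetical helper name.

```go
package main

// RunSpec is a simplified stand-in for the spec of a Run.
type RunSpec struct {
	// Retries is the optional retry count communicated to the controller.
	Retries int `json:"retries,omitempty"`
}

// RunStatusFields is a simplified stand-in holding the retry history.
type RunStatusFields struct {
	// RetriesStatus keeps one entry per failed attempt that was retried.
	RetriesStatus []RunStatus `json:"retriesStatus,omitempty"`
}

// RunStatus is a simplified stand-in; Succeeded is an assumption that
// replaces the real status conditions.
type RunStatus struct {
	Succeeded bool `json:"succeeded"`
	RunStatusFields
}

// Run pairs the spec and status for this sketch.
type Run struct {
	Spec   RunSpec
	Status RunStatus
}

// recordFailure is called by a (hypothetical) custom task controller when an
// attempt fails. It returns true if a retry should follow, after appending
// the failed attempt to RetriesStatus; it returns false once the configured
// retries are used up.
func recordFailure(run *Run) bool {
	if len(run.Status.RetriesStatus) >= run.Spec.Retries {
		run.Status.Succeeded = false
		return false
	}
	failed := run.Status       // copy the status of the failed attempt
	failed.Succeeded = false   // mark the recorded attempt as failed
	failed.RetriesStatus = nil // a history entry does not nest the history
	run.Status.RetriesStatus = append(run.Status.RetriesStatus, failed)
	return true
}
```

A controller that ignores `Retries` never touches `RetriesStatus` and keeps its existing behavior, which is consistent with retry support being optional.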
The TEP introduces new API fields and copies the retries count from the PipelineRun to the Run. Add or upgrade a test to verify that the count is copied correctly.
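
A test of this kind might look like the table-driven check below. It is only a sketch: `PipelineTask` and `RunSpec` are trimmed-down stand-ins for the real types, and `runSpecFor` and `checkRetriesCopied` are hypothetical names, not functions from the Tekton code base.

```go
package main

import "fmt"

// PipelineTask is a trimmed-down stand-in for a pipeline task.
type PipelineTask struct {
	Name    string
	Retries int
}

// RunSpec is a trimmed-down stand-in for the spec of a Run.
type RunSpec struct {
	Retries int
}

// runSpecFor builds the spec of the Run created for a pipeline task,
// copying the configured retries count (hypothetical helper).
func runSpecFor(pt PipelineTask) RunSpec {
	return RunSpec{Retries: pt.Retries}
}

// checkRetriesCopied verifies that the retries count survives the copy
// for each case and returns an error describing the first mismatch.
func checkRetriesCopied() error {
	cases := []PipelineTask{
		{Name: "no-retries", Retries: 0},
		{Name: "three-retries", Retries: 3},
	}
	for _, pt := range cases {
		if got := runSpecFor(pt).Retries; got != pt.Retries {
			return fmt.Errorf("%s: got retries %d, want %d", pt.Name, got, pt.Retries)
		}
	}
	return nil
}
```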
- Create a fresh `Run` for each retry. This approach does not give the custom task controller the opportunity to optimise between the Runs. For example, a Loop controller would want to retry only the failed iterations by keeping track of them. If it gets a new `Run` for each retry, it may not be able to make that optimisation.
- The `tektoncd pipeline` controller handles the retry logic and signals the custom controller each time it has to retry. It maintains the complete history of all the retries performed. The downsides of this approach are:
  - there is a sense of strong coupling between the custom task controller and the `tektoncd pipeline` controller.
  - the `tektoncd pipeline` controller updates the status of a `Run`.
An upgrade strategy for existing custom controllers:
- The custom controller already supports a retry field.
  - It can deprecate the existing retry field and refer to `Run.spec.retries`.
  - Update the status at `RunStatusFields.RetriesStatus` of `RunStatus`.
- If a custom task does not already support retry, its functioning should otherwise not be impacted.