
sdk: retry Watch() call on failure. #311


Conversation


@trusch commented Jun 11, 2018

What

  • indefinitely retry getting a resource client in the Watch() call
  • add a mutex to protect the `informers` global from concurrent access
  • move informer setup into its own goroutine (a sketch of this setup follows below)

Why

  • until now, an operator dies if it is deployed before the CRD it manages has been created
  • `informers` is a global variable in the sdk package; to make it safe for concurrent use, it is now guarded by a mutex

fixes operator-framework#183
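
A minimal, self-contained sketch of the approach described above. The types and names here (`ResourceClient`, `Informer`, `getResourceClient`, the 5-second retry interval) are placeholders for illustration, not the actual patch or the operator-sdk API:

```go
package sdk

import (
	"log"
	"sync"
	"time"
)

// Placeholder types standing in for the real SDK types; only the control
// flow matters here.
type ResourceClient interface{}

type Informer struct {
	kind   string
	client ResourceClient
}

var (
	informersMu sync.Mutex // guards the informers global against concurrent Watch calls
	informers   []*Informer
)

// getResourceClient stands in for the SDK's GetResourceClient; it fails
// until the CRD for the given kind is registered with the API server.
func getResourceClient(apiVersion, kind, namespace string) (ResourceClient, error) {
	// real implementation queries the API server for the resource
	return nil, nil
}

// Watch retries until a resource client can be obtained, then registers an
// informer. Setup runs in its own goroutine so a missing CRD does not kill
// the operator at startup.
func Watch(apiVersion, kind, namespace string) {
	go func() {
		var client ResourceClient
		for {
			c, err := getResourceClient(apiVersion, kind, namespace)
			if err == nil {
				client = c
				break
			}
			log.Printf("failed to get resource client for %s %s: %v; retrying in 5s", apiVersion, kind, err)
			time.Sleep(5 * time.Second)
		}

		informersMu.Lock()
		defer informersMu.Unlock()
		informers = append(informers, &Informer{kind: kind, client: client})
	}()
}
```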
@spahl requested a review from hasbro17 June 15, 2018 18:42
@hasbro17
Contributor

@trusch I don't think we should retry if the Watch fails. The operator should just exit cleanly with an error instead of calling panic, as we discussed on issue #183 (comment).

There are a few problems with retrying like this:

  1. GetResourceClient() has other failure cases besides the CRD not being registered, e.g. the user can pass an incorrect parameter (apiVersion, kind, or namespace) to Watch(). There's no point in retrying that.
  2. By retrying a failed Watch in the background, the informer gets initialized later but never runs, because we only do a single pass over all initialized informers in sdk.Run(). So any informers created later by retrying the Watch won't run (see the sketch after this list).
  3. Since the operator is run as a Deployment, it will naturally keep retrying anyway if a Watch fails the first time and the operator pod exits cleanly.
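
For context, a self-contained sketch of the single-pass problem from point 2, assuming a Run loop roughly like the SDK's at the time; the `Informer` type and signatures are placeholders, not the real API:

```go
package sdk

import (
	"context"
	"sync"
)

// Placeholder informer; only the control flow of Run matters here.
type Informer struct{ kind string }

func (i *Informer) Run(ctx context.Context) { /* list/watch loop */ }

var (
	informersMu sync.Mutex
	informers   []*Informer
)

// Run starts only the informers that exist when it is called, then blocks.
// An informer appended later by a background-retrying Watch is never started.
func Run(ctx context.Context) {
	informersMu.Lock()
	started := make([]*Informer, len(informers))
	copy(started, informers)
	informersMu.Unlock()

	for _, inf := range started {
		go inf.Run(ctx) // single pass over the informers registered so far
	}
	<-ctx.Done()
}
```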

So I would suggest we just have the operator exit cleanly so it's visible why the Watch failed and let the Deployment restart the operator.
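
For illustration, a minimal sketch of the suggested behaviour, using a hypothetical `watch` helper that returns an error (the real SDK API may differ): on failure the operator logs the reason and exits non-zero, and the Deployment restarts the pod.

```go
package main

import "log"

// watch is a stand-in for an SDK Watch call reworked to return its error
// instead of panicking; the apiVersion/kind used below are made-up examples.
func watch(apiVersion, kind, namespace string) error {
	// real implementation would call GetResourceClient and return its error
	return nil
}

func main() {
	// If the Watch cannot be set up (e.g. the CRD is not registered yet),
	// log why and exit non-zero so the Deployment restarts the pod,
	// rather than panicking or retrying silently in the background.
	if err := watch("app.example.com/v1alpha1", "App", "default"); err != nil {
		log.Fatalf("failed to watch App resources: %v", err)
	}
	// ... start the registered informers and block ...
}
```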

@rithujohn191
Contributor

Closing out this issue. We can re-open it if a need arises.

Development

Successfully merging this pull request may close these issues.

Operator panics if it starts before creating the CRD