ansible operator: race condition between delete and status update #818

Closed
mhrivnak opened this issue Dec 5, 2018 · 4 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
language/ansible: Issue is related to an Ansible operator project.

Comments

@mhrivnak
Member

mhrivnak commented Dec 5, 2018

If a CR is deleted while Ansible is running inside the operator's reconcile loop, the status update at the end of the reconcile fails because the object changed underneath it (the deletion timestamp was added, or the object is already gone). This error appears in the log:

{
  "level": "error",
  "ts": 1544041930.6005814,
  "logger": "kubebuilder.controller",
  "caller": "controller/controller.go:209",
  "msg": "Reconciler error",
  "Controller": "memcached-controller",
  "Request": "default/example-memcached",
  "error": "Operation cannot be fulfilled on memcacheds.cache.example.com \"example-memcached\": StorageError: invalid object, Code: 4, Key: /registry/cache.example.com/memcacheds/default/example-memcached, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d5fb02f7-f8cc-11e8-bdc8-20765fffbd97, UID in object meta: ",
  "stacktrace": "github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/mhrivnak/golang/src/github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/operator-framework/operator-sdk/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/mhrivnak/golang/src/github.com/operator-framework/operator-sdk/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:209\ngithub.com/operator-framework/operator-sdk/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/mhrivnak/golang/src/github.com/operator-framework/operator-sdk/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157\ngithub.com/operator-framework/operator-sdk/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/mhrivnak/golang/src/github.com/operator-framework/operator-sdk/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/operator-framework/operator-sdk/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/mhrivnak/golang/src/github.com/operator-framework/operator-sdk/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/operator-framework/operator-sdk/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/mhrivnak/golang/src/github.com/operator-framework/operator-sdk/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"
}
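
To make the race concrete, here is a minimal, self-contained Go sketch. It does not use the operator-sdk or controller-runtime APIs: `store`, `object`, `reconcile`, `ErrNotFound` and `ErrConflict` are all hypothetical names for a toy in-memory stand-in for the API server. It shows one possible way to tolerate a failed status update when the object was deleted mid-run, and is not necessarily how the operator handles it.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the API server's NotFound and Conflict responses.
var (
	ErrNotFound = errors.New("object not found")
	ErrConflict = errors.New("object changed since it was read")
)

// object is a pared-down custom resource (assumption: only these fields matter here).
type object struct {
	ResourceVersion int
	Deleting        bool // stands in for a set deletion timestamp
	Status          string
}

// store is a toy in-memory replacement for the API server.
type store struct {
	objects map[string]object
}

func newStore() *store {
	return &store{objects: map[string]object{
		"default/example-memcached": {ResourceVersion: 1},
	}}
}

// UpdateStatus succeeds only if the object still exists at the version the caller read.
func (s *store) UpdateStatus(key string, seenVersion int, status string) error {
	cur, ok := s.objects[key]
	if !ok {
		return ErrNotFound
	}
	if cur.ResourceVersion != seenVersion {
		return ErrConflict
	}
	cur.Status = status
	cur.ResourceVersion++
	s.objects[key] = cur
	return nil
}

// MarkDeleting models the deletion timestamp being added, which changes the object.
func (s *store) MarkDeleting(key string) {
	if cur, ok := s.objects[key]; ok {
		cur.Deleting = true
		cur.ResourceVersion++
		s.objects[key] = cur
	}
}

// Delete models the object being removed entirely.
func (s *store) Delete(key string) { delete(s.objects, key) }

// reconcile reads the object, runs the (simulated) Ansible work, then writes status.
// With tolerate set, a failed status update is ignored when the object is gone
// or is being deleted; any other failure is still returned.
func reconcile(s *store, key string, runAnsible func(), tolerate bool) error {
	seen, ok := s.objects[key]
	if !ok {
		return nil // nothing to reconcile
	}
	runAnsible() // a delete can land while this is running
	err := s.UpdateStatus(key, seen.ResourceVersion, "Successful")
	if err == nil || !tolerate {
		return err
	}
	if cur, exists := s.objects[key]; !exists || cur.Deleting {
		return nil
	}
	return err
}

func main() {
	const key = "default/example-memcached"

	s := newStore()
	fmt.Println("naive:", reconcile(s, key, func() { s.Delete(key) }, false))

	s = newStore()
	fmt.Println("tolerant, deleted:", reconcile(s, key, func() { s.Delete(key) }, true))

	s = newStore()
	fmt.Println("tolerant, deleting:", reconcile(s, key, func() { s.MarkDeleting(key) }, true))
}
```

The naive call surfaces the error, as in the log above; the tolerant calls return nil because there is nothing left worth updating.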
mhrivnak added the kind/bug and language/ansible labels Dec 5, 2018
mhrivnak added a commit to mhrivnak/operator-sdk that referenced this issue Dec 5, 2018
- Fixes bug where reconcile attempted to update status after running finalizer
- Stops depending on grepping for a log statement for a test to pass
- Disables debug log mode by default
- Ensures no errors appear in log at end of e2e test run
- logs cache miss at INFO instead of ERROR
- adds sleep statement to tolerate operator-framework#818 in e2e tests
mhrivnak added a commit to mhrivnak/operator-sdk that referenced this issue Dec 5, 2018
- Fixes bug where reconcile attempted to update status after running finalizer
- Stops depending on grepping for a log statement for a test to pass
- Disables debug log mode by default
- logs cache miss at INFO instead of ERROR
- adds commented-out check for errors in logs in e2e tests, which will be enabled after operator-framework#818 gets fixed
@mhrivnak
Member Author

mhrivnak commented Dec 5, 2018

@dymurray we think the work you're doing on status will fix this.

@mhrivnak
Member Author

mhrivnak commented Dec 6, 2018

This test should be enabled when this bug is fixed:

## TODO enable when this is fixed: https://github.com/operator-framework/operator-sdk/issues/818
# if kubectl logs deployment/memcached-operator | grep -i error;
# then
#     echo FAIL: the operator log includes errors
#     kubectl logs deployment/memcached-operator
#     exit 1
# fi
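
For comparison, a stricter form of the same check could look only at each entry's log level instead of grepping for the word "error" anywhere in a line. The Go sketch below is illustrative only and is not part of the e2e tests: `errorEntries` is a hypothetical helper, and it assumes the operator writes one JSON object per line with `level` and `msg` fields, as in the sample log above.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// errorEntries returns the "msg" of every JSON log line whose "level" is "error".
// Lines that are not a JSON object are skipped (assumption: one entry per line).
func errorEntries(logs string) []string {
	var msgs []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	// Stack traces make for long lines, so allow up to 1 MiB per line.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		var entry struct {
			Level string `json:"level"`
			Msg   string `json:"msg"`
		}
		if err := json.Unmarshal(sc.Bytes(), &entry); err != nil {
			continue
		}
		if entry.Level == "error" {
			msgs = append(msgs, entry.Msg)
		}
	}
	return msgs
}

func main() {
	// Sample input; in a real test this would be the output of
	// `kubectl logs deployment/memcached-operator`.
	logs := `{"level":"info","msg":"no error here, just a cache miss"}
{"level":"error","msg":"Reconciler error"}`

	if errs := errorEntries(logs); len(errs) > 0 {
		fmt.Println("FAIL: the operator log includes errors:", errs)
	}
}
```

Unlike `grep -i error`, this does not flag an INFO line that merely mentions the word "error".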

fabianvf pushed a commit to fabianvf/operator-sdk that referenced this issue Dec 21, 2018
- Fixes bug where reconcile attempted to update status after running finalizer
- Stops depending on grepping for a log statement for a test to pass
- Disables debug log mode by default
- logs cache miss at INFO instead of ERROR
- adds commented-out check for errors in logs in e2e tests, which will be enabled after operator-framework#818 gets fixed
@dymurray
Contributor

dymurray commented Jan 7, 2019

I believe this issue is resolved here: df40e30#diff-9649888b3df1961dad649345545de4c0

@mhrivnak
Member Author

mhrivnak commented Jan 7, 2019

I agree that resolved it. Thanks!
