Fix for PGO-2380: #4163
Conversation
Only add logrotate volume mounts to the instance pod when backups are enabled. Add kuttl tests to ensure that the collector will run on a postgres instance when backups are disabled.
2. Add a backups spec to the new cluster and ensure that pgbackrest is added to the instance pod, a repo-host pod is created, and the collector runs on both pods.
3. Remove the backups spec from the new cluster.
4. Annotate the cluster to allow backups to be removed (steps 3 and 4 are sketched just after this list).
5. Ensure that the repo-host pod is destroyed, pgbackrest is removed from the instance pod, and the collector continues to run on the instance pod.
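For concreteness, steps 3 and 4 might look roughly like the sketch below. The cluster name is borrowed from the snippet quoted further down, and the annotation key is my assumption about PGO's backup-removal safeguard rather than something shown in this PR.

# Step 3 (sketch): drop the backups section from the cluster spec.
kubectl patch postgrescluster otel-cluster-no-backups -n "${NAMESPACE}" \
  --type merge -p '{"spec":{"backups":null}}'

# Step 4 (sketch): annotate the cluster so the operator is allowed to remove
# pgbackrest. The annotation key here is assumed, not taken from this PR.
kubectl annotate postgrescluster otel-cluster-no-backups -n "${NAMESPACE}" \
  postgres-operator.crunchydata.com/authorizeBackupRemoval="true"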
I am curious -- if you skip steps 3 and 4, will 5 fail?
Not sure I follow... If you don't remove the backups spec and annotate the cluster, pgbackrest will just continue to run (in the instance pod and repo-host)...
Right -- I'm wondering if step 5 tests what we want it to test.
So, first we create a brand new cluster that does not have backups enabled. And we make sure that the collector runs in that scenario. Then we add backups and ensure that everything works properly with that transition. Then we remove backups and ensure that everything works properly with that transition... What scenario do you think we aren't testing? Or what is it about the test that is insufficient?
It's been a while since I did a KUTTL test, but I remember running into issues where I asserted that X existed, but it actually ignored what wasn't X. So here, we assert there's a few containers, but not the collector container -- but will it fail if that collector container does exist? Or will it just skip that?
Ahh, I think I understand now... In the assert step we tell it to check the instance pod for specific containers and it will fail if the list is not exactly correct (if we ask about 3 containers, but there are actually 5 containers that include the 3 we ask about, it will fail).
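For anyone reading along, a quick way to see exactly what such an assert is comparing against is to dump the container names on the instance pod. The labels below are copied from the snippet quoted later in this thread; the command itself is just an illustration, not part of the test.

# Print every container name on the instance pod so the expected list in the
# assert can be compared against what is actually running.
kubectl get pods -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres \
  -o jsonpath='{.items[*].spec.containers[*].name}'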
Ah, it checks if the list (of containers, etc.) is complete still? I think we (and others) were pushing to get it to check only for the presence of the items in the list.
pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres)
[ "$pod" = "" ] && retry "Pod not found" && exit 1
❓ What is the retry mechanism? It looks like the retry function just prints a message and sleeps.
Yeah, "retry" is a bit of a misnomer -- Joseph and I talked about this when he introduced the mechanism. As you point out, the retry function just prints and sleeps; the actual retrying happens because
(a) this exits 1, and
(b) it's a TestAssert.
In KUTTL, failed TestAsserts are automatically retried a number of times.
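In other words, the helper presumably boils down to something like this sketch (its real definition isn't shown in this thread); the exit 1 that follows the call in the assert is what actually hands control back to KUTTL's retry loop.

# Assumed shape of the print-and-sleep helper, not the actual definition. It
# reports why the assert is about to fail, pauses so the next attempt isn't
# immediate, and returns 0 so the `&& exit 1` after the call still fires.
retry() {
  printf 'RETRY: %s\n' "$1" >&2
  sleep 5
}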
Gotcha. Thanks!
Looks good!
…add the instrumentation spec to PGAdmin.
Only add logrotate volume mounts to instance pod when backups are enabled. Add kuttl tests to ensure that collector will run on postgres instance when backups are disabled.
Checklist:
Type of Changes:
What is the current behavior (link to any open issues here)?
At the moment, we cannot run the collector on a postgres instance pod when backups are disabled, because the pod will fail when it attempts to mount a file that does not exist.
What is the new behavior (if this is a feature change)?
We only mount logrotate config files for the instance pod when backups are enabled, which allows the collector to run whether backups are enabled or not.
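As a purely illustrative spot check (not something in this PR), you could list the volume mount names on the instance pod's containers and confirm that nothing logrotate-related is mounted while backups are disabled. The labels are reused from the test snippet above, and the exact name of the logrotate config mount isn't shown in this thread.

# Hypothetical check: with backups disabled, the instance pod's containers
# should start cleanly and none of their volume mounts should be the
# logrotate config.
kubectl get pods -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres \
  -o jsonpath='{.items[*].spec.containers[*].volumeMounts[*].name}'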
Other Information: