
Fix for PGO-2380: #4163


Merged
merged 2 commits into CrunchyData:main on Apr 25, 2025

Conversation

dsessler7
Contributor

Only add logrotate volume mounts to the instance pod when backups are enabled. Add kuttl tests to ensure that the collector will run on the postgres instance when backups are disabled.

Checklist:

  • Have you added an explanation of what your changes do and why you'd like them to be included?
  • Have you updated or added documentation for the change, as applicable?
  • Have you tested your changes on all related environments with successful results, as applicable?
    • Have you added automated tests?

Type of Changes:

  • New feature
  • Bug fix
  • Documentation
  • Testing enhancement
  • Other

What is the current behavior (link to any open issues here)?

At the moment, we cannot run the collector on a postgres instance pod when backups are disabled: the pod attempts to mount a logrotate config file that does not exist and fails to start.
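
A quick way to observe the failure by hand (this is a sketch, not taken from the PR; the namespace variable and event reason are assumptions):

# Sketch only: with the collector enabled and backups disabled, the instance pod
# reports a failed volume mount for the missing logrotate config.
# (reason=FailedMount is an assumption about how the failure surfaces.)
kubectl get events -n "${NAMESPACE}" \
  --field-selector reason=FailedMount,involvedObject.kind=Pod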

What is the new behavior (if this is a feature change)?

  • Breaking change (fix or feature that would cause existing functionality to change)

We only mount logrotate config files for the instance pod when backups are enabled, which allows the collector to run whether backups are enabled or not.
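
A rough way to confirm the new behavior by hand (the label selector follows the existing kuttl scripts; matching on the name "logrotate" is an assumption for illustration):

# List the volume mounts on the instance pod's containers; with backups disabled
# the logrotate config mount should be absent, and present once backups are enabled.
kubectl get pods -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/data=postgres \
  -o jsonpath='{.items[*].spec.containers[*].volumeMounts[*].name}' \
  | tr ' ' '\n' | grep -i logrotate || echo "no logrotate mount"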

Other Information:

The added kuttl tests do the following (a sketch of the kind of check they rely on follows this list):

1. Create a brand new cluster with no backups spec and ensure that the collector runs on the instance pod.
2. Add a backups spec to the new cluster and ensure that pgbackrest is added to the instance pod, a repo-host pod is created, and the collector runs on both pods.
3. Remove the backups spec from the new cluster.
4. Annotate the cluster to allow backups to be removed.
5. Ensure that the repo-host pod is destroyed, pgbackrest is removed from the instance pod, and the collector continues to run on the instance pod.
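
A minimal sketch of that kind of check, reusing the pod lookup quoted later in this conversation; the container name "collector" and the behavior of the retry helper are assumptions for illustration:

# Find the instance pod for the backup-less cluster, then check that the
# collector container is running; on failure, log and exit 1 so KUTTL retries.
pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres)
[ "$pod" = "" ] && retry "Pod not found" && exit 1

# "collector" as the container name is an assumption for illustration.
kubectl get "$pod" -n "${NAMESPACE}" \
  -o jsonpath='{.status.containerStatuses[?(@.name=="collector")].state.running}' \
  | grep -q startedAt || { retry "collector not running"; exit 1; }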
Contributor

I am curious -- if you skip steps 3 and 4, will 5 fail?

Contributor Author

Not sure I follow... If you don't remove the backups spec and annotate the cluster, pgbackrest will just continue to run (in the instance pod and repo-host)...

Contributor

Right -- I'm wondering if step 5 tests what we want it to test.

Contributor Author

So, first we create a brand new cluster that does not have backups enabled. And we make sure that the collector runs in that scenario. Then we add backups and ensure that everything works properly with that transition. Then we remove backups and ensure that everything works properly with that transition... What scenario do you think we aren't testing? Or what is it about the test that is insufficient?

Contributor

It's been a while since I did a KUTTL test, but I remember running into issues where I asserted that X existed, but it actually ignored what wasn't X. So here, we assert there are a few containers, but not the collector container -- but will it fail if that collector container does exist? Or will it just skip that?

Contributor Author

Ahh, I think I understand now... In the assert step we tell it to check the instance pod for specific containers and it will fail if the list is not exactly correct (if we ask about 3 containers, but there are actually 5 containers that include the 3 we ask about, it will fail).

Contributor

Ah, so it still checks that the list (of containers, etc.) is complete? I think we (and others) were pushing to get it to check only for the presence of the items in the list.
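
For illustration of the exact-list behavior being discussed, a hand-rolled shell equivalent (the expected container names are assumptions, not taken from the test files):

# Fail unless the instance pod's container list matches the expected list exactly,
# so an extra (or missing) container also causes the assert to fail and retry.
expected="database replication-cert-copy collector"
actual=$(kubectl get pods -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/data=postgres \
  -o jsonpath='{.items[0].spec.containers[*].name}')
[ "$actual" = "$expected" ] || exit 1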


pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
-l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres)
[ "$pod" = "" ] && retry "Pod not found" && exit 1
Contributor

@tony-landreth Apr 24, 2025

❓ What is the retry mechanism? It looks like the retry function just prints a message and sleeps.

Contributor

Yeah, "retry" is a bit of a misnomer -- Joseph and I talked about this when he introduced the mechanism. As you point out, the retry function just prints and sleeps; the actual retrying happens because
(a) this exits 1, and
(b) it's a TestAssert.
In KUTTL, failed TestAsserts automatically retry a number of times.
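
A minimal sketch of the helper as described above (the real function lives in the repo's test scripts; this only illustrates the print-and-sleep behavior):

# Illustration only: the helper just logs and pauses; the actual retrying comes
# from KUTTL re-running the failed TestAssert script until it passes or times out.
retry() {
  printf 'assert not yet satisfied: %s\n' "$1" >&2
  sleep 5
}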

Contributor

Gotcha. Thanks!

Contributor

@tony-landreth left a comment

Looks good!

@dsessler7 dsessler7 merged commit 465df26 into CrunchyData:main Apr 25, 2025
19 checks passed