
Fix for PGO-2380: #4163


Merged
merged 2 commits into CrunchyData:main on Apr 25, 2025

Conversation

dsessler7
Contributor

Only add logrotate volume mounts to the instance pod when backups are enabled. Add kuttl tests to ensure that the collector will run on the postgres instance when backups are disabled.

Checklist:

  • Have you added an explanation of what your changes do and why you'd like them to be included?
  • Have you updated or added documentation for the change, as applicable?
  • Have you tested your changes on all related environments with successful results, as applicable?
    • Have you added automated tests?

Type of Changes:

  • New feature
  • Bug fix
  • Documentation
  • Testing enhancement
  • Other

What is the current behavior (link to any open issues here)?

At the moment, we cannot run the collector on a postgres instance pod when backups are disabled: the pod attempts to mount a logrotate config file that does not exist and fails to start.
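
A quick way to observe the failure by hand (this is a sketch, not taken from the PR; the namespace variable and event reason are assumptions):

# Sketch only: with the collector enabled and backups disabled, the instance pod
# reports a failed volume mount for the missing logrotate config.
# (reason=FailedMount is an assumption about how the failure surfaces.)
kubectl get events -n "${NAMESPACE}" \
  --field-selector reason=FailedMount,involvedObject.kind=Pod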

What is the new behavior (if this is a feature change)?

  • Breaking change (fix or feature that would cause existing functionality to change)

We only mount logrotate config files for the instance pod when backups are enabled, which allows the collector to run whether backups are enabled or not.
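
A rough way to confirm the new behavior by hand (the label selector follows the existing kuttl scripts; matching on the name "logrotate" is an assumption for illustration):

# List the volume mounts on the instance pod's containers; with backups disabled
# the logrotate config mount should be absent, and present once backups are enabled.
kubectl get pods -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/data=postgres \
  -o jsonpath='{.items[*].spec.containers[*].volumeMounts[*].name}' \
  | tr ' ' '\n' | grep -i logrotate || echo "no logrotate mount"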

Other Information:

The added kuttl tests do the following (a sketch of the kind of check they rely on follows this list):

1. Create a brand new cluster with no backups spec and ensure that the collector runs on the instance pod.
2. Add a backups spec to the new cluster and ensure that pgbackrest is added to the instance pod, a repo-host pod is created, and the collector runs on both pods.
3. Remove the backups spec from the new cluster.
4. Annotate the cluster to allow backups to be removed.
5. Ensure that the repo-host pod is destroyed, pgbackrest is removed from the instance pod, and the collector continues to run on the instance pod.
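
A minimal sketch of that kind of check, reusing the pod lookup quoted later in this conversation; the container name "collector" and the behavior of the retry helper are assumptions for illustration:

# Find the instance pod for the backup-less cluster, then check that the
# collector container is running; on failure, log and exit 1 so KUTTL retries.
pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres)
[ "$pod" = "" ] && retry "Pod not found" && exit 1

# "collector" as the container name is an assumption for illustration.
kubectl get "$pod" -n "${NAMESPACE}" \
  -o jsonpath='{.status.containerStatuses[?(@.name=="collector")].state.running}' \
  | grep -q startedAt || { retry "collector not running"; exit 1; }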
Contributor

I am curious -- if you skip steps 3 and 4, will 5 fail?

Contributor Author

Not sure I follow... If you don't remove the backups spec and annotate the cluster, pgbackrest will just continue to run (in the instance pod and repo-host)...

Contributor

Right -- I'm wondering if step 5 tests what we want it to test.

Contributor Author

So, first we create a brand new cluster that does not have backups enabled. And we make sure that the collector runs in that scenario. Then we add backups and ensure that everything works properly with that transition. Then we remove backups and ensure that everything works properly with that transition... What scenario do you think we aren't testing? Or what is it about the test that is insufficient?

Contributor

It's been a while since I did a KUTTL test, but I remember running into issues where I asserted that X existed, but it actually ignored what wasn't X. So here, we assert there are a few containers, but not the collector container -- but will it fail if that collector container does exist? Or will it just skip that?

Contributor Author

Ahh, I think I understand now... In the assert step we tell it to check the instance pod for specific containers and it will fail if the list is not exactly correct (if we ask about 3 containers, but there are actually 5 containers that include the 3 we ask about, it will fail).

Contributor

Ah, so it still checks that the list (of containers, etc.) is complete? I think we (and others) were pushing to get it to check only for the presence of the items in the list.
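
For illustration of the exact-list behavior being discussed, a hand-rolled shell equivalent (the expected container names are assumptions, not taken from the test files):

# Fail unless the instance pod's container list matches the expected list exactly,
# so an extra (or missing) container also causes the assert to fail and retry.
expected="database replication-cert-copy collector"
actual=$(kubectl get pods -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/data=postgres \
  -o jsonpath='{.items[0].spec.containers[*].name}')
[ "$actual" = "$expected" ] || exit 1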


pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
-l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres)
[ "$pod" = "" ] && retry "Pod not found" && exit 1
Contributor

@tony-landreth Apr 24, 2025

❓ What is the retry mechanism? It looks like the retry function just prints a message and sleeps.

Contributor

Yeah, "retry" is a bit of a misnomer -- Joseph and I talked about this when he introduced the mechanism. As you point out, the retry function just prints and sleeps; the actual retrying happens because
(a) this exits 1, and
(b) it's a TestAssert.
In KUTTL, failed TestAsserts automatically retry a number of times.
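
A minimal sketch of the helper as described above (the real function lives in the repo's test scripts; this only illustrates the print-and-sleep behavior):

# Illustration only: the helper just logs and pauses; the actual retrying comes
# from KUTTL re-running the failed TestAssert script until it passes or times out.
retry() {
  printf 'assert not yet satisfied: %s\n' "$1" >&2
  sleep 5
}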

Contributor

Gotcha. Thanks!

Contributor

@tony-landreth left a comment

Looks good!

@dsessler7 dsessler7 merged commit 465df26 into CrunchyData:main Apr 25, 2025
19 checks passed