End 2 End tests speedup #1180

Jan-M · 2020-10-18T17:29:35Z

The current implementation of our end to end tests is very slow because of many hard-coded waits and default waits.

This PR introduces eventual wrappers around assertEqual and assertTrue to speed up the watch process for operator actions. These functions repeat the request multiple times and either succeed once the result is correct or fail after the maximum number of intervals.
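As a rough illustration of the eventual wrappers described above, here is a minimal sketch. The class name, method names, and default values are assumptions made for this example and are not taken from the PR's code.

```python
import time
import unittest


class EventuallyTestCase(unittest.TestCase):
    """Hypothetical base class; all names and defaults are illustrative."""

    def eventually_equal(self, f, expected, msg, retries=60, interval=2):
        # Call f() until assertEqual passes or the retry budget is used up.
        while True:
            try:
                self.assertEqual(f(), expected, msg)
                return
            except AssertionError:
                retries -= 1
                if retries <= 0:
                    raise  # surface the last assertion failure
                time.sleep(interval)

    def eventually_true(self, f, msg, retries=60, interval=2):
        # Same retry loop, built on assertTrue.
        while True:
            try:
                self.assertTrue(f(), msg)
                return
            except AssertionError:
                retries -= 1
                if retries <= 0:
                    raise
                time.sleep(interval)
```

A test would pass a callable that re-queries the cluster on every attempt, for example `self.eventually_equal(lambda: count_running_pods(), 2, "pods not running")`, where `count_running_pods` is a hypothetical helper. The test then returns as soon as the state is correct instead of always sleeping for a fixed time.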

This PR also increases resource limits for pods, in case an environment applies CPU limits.

This PR also sets a discovery folder for unittest to tests.

More work needs to be done in the future to assess all end 2 end test cases to properly wait/retry.

… timeouts.

* Lazy upgrade now properly covered with eventual and waiting for pod start
* patching config now updates deployment, patching annotation, allowing to trace change step
* run.sh now takes NOCLEANUP to stop kind from being deleted
* if kind config is present, run will not install kind
* Fast e2e local execution now possible once kind is up

e2e/run_tests_image.sh

e2e/tests/test_e2e.py

sdudoladov · 2020-10-20T04:45:55Z

e2e/tests/test_e2e.py

-        # HACK operator must register CRD and/or Sync existing PG clusters after start up
-        # for local execution ~ 10 seconds suffices
-        time.sleep(60)
+        self. wait_for_pod_start("name=postgres-operator")        


And how do we know at line 109 that it is safe to create PG clusters? By re-creating a PG cluster until it succeeds?

You worry that the operator is not running, and that the initial manifest create is then not picked up?

It might happen that the PG CRD is not yet registered. There is a very short period of time when an operator pod has already started but the CRD is not yet registered. That sometimes led to the failure of cluster creation during the initial development of the e2e tests.
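One way to close the gap described in this thread would be to poll for the CRD before creating any cluster. The sketch below is only an illustration: `wait_for_crd` and the `list_crd_names` callable are hypothetical, and in a real test the callable would wrap a Kubernetes API query.

```python
import time


def wait_for_crd(list_crd_names, crd_name, timeout=60.0, interval=1.0):
    """Poll until crd_name is registered or the timeout elapses.

    list_crd_names is a hypothetical callable that returns the names of
    the currently registered CRDs.
    """
    deadline = time.monotonic() + timeout
    while True:
        if crd_name in list_crd_names():
            return True
        if time.monotonic() >= deadline:
            return False  # caller decides whether to fail the test
        time.sleep(interval)
```

Unlike a fixed `time.sleep(60)`, this returns as soon as the CRD is visible and only waits the full timeout in the failure case.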

e2e/tests/test_e2e.py


sdudoladov · 2020-10-27T12:02:15Z

pkg/util/config/config.go

@@ -199,7 +199,7 @@ type Config struct {

 // MustMarshal marshals the config or panics
 func (c Config) MustMarshal() string {
-	b, err := json.MarshalIndent(c, "", "\t")
+	b, err := json.MarshalIndent(c, "", "   ")


Why? IMO \t is more readable.

I think \t gets serialized as a literal \t in the log output, which makes it unreadable. You can try changing it after this PR.
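The effect suspected in the reply can be reproduced outside Go. The sketch below uses Python's `json` module and assumes a log formatter that escapes control characters in each line; that assumption is for illustration only and is not confirmed by the PR.

```python
import json

config = {"workers": 4}  # stand-in for the operator config


def escaped_lines(text):
    # Assumption: the log formatter escapes control characters per line.
    return [
        line.encode("unicode_escape").decode("ascii")
        for line in text.splitlines()
    ]


tab_lines = escaped_lines(json.dumps(config, indent="\t"))
space_lines = escaped_lines(json.dumps(config, indent=3))

print(tab_lines[1])    # tab indentation shows up as a literal backslash-t
print(space_lines[1])  # space indentation survives unchanged
```

Under that assumption, space indentation keeps the multi-line config readable in the log, while tab indentation does not.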

sdudoladov · 2020-10-27T12:10:10Z

pkg/util/nicediff/diff.go

+// limitations under the License.
+
+// Package diff implements a linewise diff algorithm.
+package nicediff


Is it godebug copied verbatim ?

sdudoladov · 2020-10-27T12:11:55Z

pkg/controller/controller.go

+
+	// disabling the sending of events also to the logoutput
+	// the operator currently duplicates a lot of log entries with this setup
+	// eventBroadcaster.StartLogging(logger.Infof)


dead code ?

sdudoladov · 2020-10-27T12:20:46Z

pkg/cluster/util.go

+	n, errn := json.MarshalIndent(new, "", "  ")
+
+	if erro != nil || errn != nil {
+		panic("could not marschal API objects, should not happen")


a typo: marsChal is actually marshal

sdudoladov · 2020-10-27T12:30:47Z

pkg/cluster/sync.go


 		cmp := c.compareStatefulSetWith(desiredSS)
 		if !cmp.match {
 			if cmp.rollingUpdate && !podsRollingUpdateRequired {
 				podsRollingUpdateRequired = true
-				c.setRollingUpdateFlagForStatefulSet(desiredSS, podsRollingUpdateRequired)
+				c.setRollingUpdateFlagForStatefulSet(desiredSS, podsRollingUpdateRequired, "statefulset changes")


"statefulset changes" is not a particularly descriptive message. A rolling upgrade happens exactly because something in the stateful set template changes. What is intended to be described here?

I think this is one of three places where I added just the reason; the other two were cache and existing annotation. Yes, the message could be better.

sdudoladov · 2020-10-27T12:40:54Z

e2e/README.md

+
+After having executed a normal E2E run with `NOCLEANUP=True` Kind still continues to run, allowing you subsequent test runs.
+
+To run an individual test, run the following command in the `e2e` directory


This may or may not work depending on the state in which the last test to run leaves the cluster.
It is assumed that individual unit tests will properly clean up after themselves, but that is not guaranteed.

Yes, you are correct. This does not work when stuff is broken. But we can't fix it all now. Future work is to have the e2e tests not be so mixed up and dependent on each other.

Most are easy to fix; the node label and toleration changes are a bit more annoying.

sdudoladov · 2020-10-27T12:43:26Z

e2e/scripts/get_logs.sh

@@ -0,0 +1,2 @@
+#!/bin/bash
+kubectl logs $(kubectl get pods -l name=postgres-operator --field-selector status.phase=Running -o jsonpath='{.items..metadata.name}')


Can that command be wrapped into a function so that

source e2e/scripts/get_logs.sh

would work?

sdudoladov · 2020-10-27T12:46:22Z

e2e/scripts/cleanup.sh

+kubectl delete statefulsets -l application=spilo,cluster-name=acid-minimal-cluster
+kubectl delete services -l application=spilo,cluster-name=acid-minimal-cluster
+kubectl delete configmap postgres-operator
+kubectl delete deployment postgres-operator


This may be subject to a race condition where the operator is deleted before it completes the cleanup. That happened earlier both in this e2e pipeline and in the run_operator_locally script.

sdudoladov · 2020-10-27T12:51:56Z

e2e/exec_into_env.sh

@@ -0,0 +1,14 @@
+#!/bin/bash
+
+export cluster_name="postgres-operator-e2e-tests"


why does a docker run wrapper export anything to the env ?

sdudoladov · 2020-10-27T12:53:24Z

pkg/cluster/cluster.go

@@ -595,7 +595,7 @@ func (c *Cluster) enforceMinResourceLimits(spec *acidv1.PostgresSpec) error {
 			return fmt.Errorf("could not compare defined memory limit %s with configured minimum value %s: %v", memoryLimit, minMemoryLimit, err)
 		}
 		if isSmaller {
-			c.logger.Warningf("defined memory limit %s is below required minimum %s and will be set to it", memoryLimit, minMemoryLimit)
+			c.logger.Warningf("defined memory limit %s is below required minimum %s and will be increase", memoryLimit, minMemoryLimit)


a typo: increaseD

Jan-M · 2020-10-27T16:27:08Z

👍

sdudoladov · 2020-10-28T08:53:16Z

👍

Jan-M · 2020-10-28T09:01:17Z

👍

* Improving end 2 end tests, especially speed of execution and error, by implementing proper eventual asserts and timeouts.
* Add documentation for running individual tests
* Fixed String encoding in Patroni state check and error case
* Printing config as multi log line entity, makes it readable and grepable on startup
* Cosmetic changes to logs. Removed quotes from diff. Move all object diffs to text diff. Enabled padding for log level.
* Mount script with tools for easy log access and watching objects.
* Set proper update strategy for Postgres operator deployment.
* Move long running test to end. Move pooler test to new functions.
* Remove quote from valid K8s identifiers.

This reverts commit 125f6dd.

Improving end 2 end tests by implementing proper eventual asserts and…

21afc07

… timeouts.

Jan-M added enhancement in-progress zalando labels Oct 18, 2020

Jan-M requested review from avaczi, CyberDem0n, erthalion, FxKu, RafiaSabih and sdudoladov as code owners October 18, 2020 17:29

Jan-M added 6 commits October 18, 2020 19:56

Build docker image and changed back to os image.

cc4bfb0

Loadbalancer test now uses eventualEqual properly.

38e6261

More fixes for e2e tests.

ccde8c6

WIP

c1ad716

Fix distribution call.

4fc8ca3

sdudoladov reviewed Oct 20, 2020


Jan-M added 2 commits October 20, 2020 19:20

* Make lazy upgrade test work reliable

c6c4c4c

* Allow Docker image to take parameters to overwrite unittest execution
* Add documentation for running individual tests
* Fixed String encoding in Patroni state check and error case

Printing config as multi log line entity, makes it readable and grepa…

668ef51

…ble on startup

FxKu added this to the 1.6 milestone Oct 21, 2020

Jan-M added 10 commits October 21, 2020 15:23

Progressing on faster e2e tests.

2066256

Extending timeout, allow one sync.

9b596f1

Fix min resurces end to end test.

f03409d

Fixing annotations key.

39641e8

More e2e changes for scale up and down.

6b91bd3

Comments updated.

b422cf9

Move scale function.

e40abdb

More tests and more nice diff.

1f3730b

Minor changes around running pods and catching error in infrastructur…

2aeaad0

…e roles.

Mostly cosmetic changes to logs. Removed quotes from diff. Move all o…

0143a47

…bject diffs to text diff. Enabled padding for log level.

Jan-M added 10 commits October 26, 2020 21:51

Show deployments too.

a8c777a

Pooler cleanup in spec in extra step.

acc1d5e

Changing e2e for pooler a bit. Check annotations object before lookup.

a53280e

Fix annotation error case.

89741c4

Verify explicit sync of deployment.

88e8995

Tiny changes.

eb8df06

Taints and tolerations test.

e6b71cb

Changes to tolerations test. Make it complete quicker.

8dc6c08

Fix funciton in wrong class.

d2599d9

Typos.

826d7c0

sdudoladov reviewed Oct 27, 2020


Jan-M added 5 commits October 27, 2020 14:14

Proper f() wrapper for taint test.

326c67b

Wait for pods to run.

b606b6f

Skip failing test for now.

60cbd4e

Adress typos.

24a2a62

Merge master.

9a7fc85

Jan-M added 8 commits October 27, 2020 17:57

Fix missing pieces from K8s API.

30ddfc9

Rename cleanup. Add to readme.

474d4d9

Catch possible pods count error.

cabb7bc

Giving operator 1 second to startup.

30cd4ed

tiny change to log message.

e3f32d2

No need to quote valid K8s identifiers.

d24c128

Adding message to verif manifest was in fact deleted.

067c7b5

Wait more before we delete.

85ae41b

Jan-M merged commit 3a86dfc into master Oct 28, 2020

PetterSa added a commit to PetterSa/postgres-operator that referenced this pull request Nov 19, 2020

Revert "End 2 End tests speedup (zalando#1180)"

dfa8be4

This reverts commit 125f6dd.

