Create OLM upgrade e2e scenario using codeflare SDK #286

Merged

Conversation

Contributor

@Srihari1192 Srihari1192 commented Sep 14, 2023

Issue link

#184

What changes have been made

  • Added TestMNISTRayClusterUp and TestMnistJobSubmit to the OLM upgrade test, so that the tests run before and after an operator upgrade
  • Added the methods CreateTestNamespaceWithName and DeleteTestNamespace to the namespace support code for the OLM upgrade tests

Verification steps

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@Srihari1192 Srihari1192 marked this pull request as ready for review September 21, 2023 07:04
@Srihari1192
Contributor Author

@sutaakar The OLM tests are failing due to a lack of resources in the KinD cluster, although the tests pass locally. I think we can enable these tests only when large runners are available.

@sutaakar
Contributor

@Srihari1192 Last part of the log:

[notice] A new release of pip available: 22.3 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
Written to: mnist.yaml
╭──────────────────────╮
│   🚀 Cluster Queue   │
│      Status 🚀       │
│ +-------+----------+ │
│ | Name  | Status   | │
│ +=======+==========+ │
│ | mnist | queueing | │
│ |       |          | │
│ +-------+----------+ │
╰──────────────────────╯
Waiting for requested resources to be set up...
No instances found, nothing to be done.
Traceback (most recent call last):
  File "raycluster_sdk.py", line 28, in <module>
    cluster.wait_ready()
  File "/opt/app-root/lib64/python3.8/site-packages/codeflare_sdk/cluster/cluster.py", line 273, in wait_ready
    dashboard_ready = self.is_dashboard_ready()
  File "/opt/app-root/lib64/python3.8/site-packages/codeflare_sdk/cluster/cluster.py", line 255, in is_dashboard_ready
    response = requests.get(self.cluster_dashboard_uri(), timeout=5)
  File "/opt/app-root/lib64/python3.8/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/requests/sessions.py", line 573, in request
    prep = self.prepare_request(req)
  File "/opt/app-root/lib64/python3.8/site-packages/requests/sessions.py", line 484, in prepare_request
    p.prepare(
  File "/opt/app-root/lib64/python3.8/site-packages/requests/models.py", line 368, in prepare
    self.prepare_url(url, params)
  File "/opt/app-root/lib64/python3.8/site-packages/requests/models.py", line 439, in prepare_url
    raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL 'None': No scheme supplied. Perhaps you meant http://None?

Can you confirm whether the SDK supports using Ingress? If so, can you check that the Ingress is properly created when using KinD with Ingress installed in the test setup?

@Srihari1192
Contributor Author

Sure

@Srihari1192
Contributor Author

@sutaakar The SDK does not support Ingress yet. The implementation is in progress in project-codeflare/codeflare-sdk#251.


test := With(t)
test.T().Parallel()
if os.Getenv("RUN_OLM_TESTS") != "true" {
Contributor

@sutaakar sutaakar Sep 25, 2023

It may be better to use build tags - https://stackoverflow.com/questions/54165975/go-test-only-run-tests-that-contain-a-build-tag
That way you can select the OLM upgrade tests when invoking the tests, e.g. go test -tags olm_upgrade_test ...

Contributor

The disadvantage of this approach is that the file won't be compiled if the tag is not enabled.
I'm wondering whether it may be better to just remove the condition and specify which tests to run in the Makefile, with a separate command there to run the upgrade tests.
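
For reference, the condition being discussed is an environment-variable gate on RUN_OLM_TESTS. A minimal sketch of that pattern follows. The helper name olmTestsEnabled is hypothetical and does not come from the PR, and the lookup function is injected only so the check can be exercised without changing the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// olmTestsEnabled reports whether the opt-in variable is set to exactly "true".
// The helper name is hypothetical; only the RUN_OLM_TESTS variable and the
// comparison against "true" come from the diff under review.
func olmTestsEnabled(getenv func(string) string) bool {
	return getenv("RUN_OLM_TESTS") == "true"
}

func main() {
	if !olmTestsEnabled(os.Getenv) {
		fmt.Println("skipping OLM upgrade tests: RUN_OLM_TESTS is not \"true\"")
		return
	}
	fmt.Println("running OLM upgrade tests")
}
```

In a real test function the skip would typically be expressed with t.Skip from the standard testing package rather than an early return. Unlike a build tag, this keeps the file compiled in every run, which is the trade-off raised above.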

Contributor Author

Yes. Also, if we add tags, we need to adjust all the e2e tests with a build tag so that these tests are skipped as part of the e2e run.

Contributor Author

@sutaakar We could probably use test grouping for the e2e test run, e.g. go test -timeout 30m -v ./test/e2e -run "^TestMNIST.*$", since all our e2e tests start with TestMNIST, and rename the OLM upgrade tests to TestOLMUpgradeRayClusterUp and TestOLMUpgradeMnistJobSubmit. That way we can remove the condition and call these tests specifically in our workflows.
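
To illustrate how the proposed grouping would separate the two sets of tests, here is a small sketch that applies a -run style regular expression to a list of test names. The name TestMNISTExample and the pattern ^TestOLMUpgrade.*$ are assumptions for illustration; the two TestOLMUpgrade names are the renames proposed above.

```go
package main

import (
	"fmt"
	"regexp"
)

// selectTests returns the names matched by pattern, approximating how a
// -run expression filters top-level test names.
func selectTests(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var selected []string
	for _, name := range names {
		if re.MatchString(name) {
			selected = append(selected, name)
		}
	}
	return selected
}

func main() {
	names := []string{
		"TestMNISTExample", // hypothetical e2e test name
		"TestOLMUpgradeRayClusterUp",
		"TestOLMUpgradeMnistJobSubmit",
	}
	// Pattern from the comment above: selects only the e2e tests.
	fmt.Println(selectTests("^TestMNIST.*$", names))
	// Assumed counterpart pattern for the upgrade workflow.
	fmt.Println(selectTests("^TestOLMUpgrade.*$", names))
}
```

Note that the original name TestMNISTRayClusterUp would itself match ^TestMNIST.*$, which is why the rename is part of the proposal.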

Contributor

Or maybe this test can be moved to a dedicated folder, e.g. test/upgrade.

Contributor Author

Sure, I will go with this approach.

Contributor Author

Moved the tests to the test/upgrade folder. I kept the test-dependent files in test/e2e, as the ReadFile method expects the files to be in the same package.

Contributor

I'm wondering whether it would make sense to copy the method (it doesn't have to be exported) to the upgrade package.
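
One possible shape for such an unexported copy is sketched below. It assumes, and this is not confirmed anywhere in the thread, that the helper reads test files from a filesystem bundled with its own package. The names readFile and sampleFiles, and the file example.txt, are hypothetical; an in-memory filesystem stands in for the real test files so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// readFile is an unexported helper that reads a named test file from the
// given filesystem and wraps any error with the file name.
func readFile(files fs.FS, name string) ([]byte, error) {
	data, err := fs.ReadFile(files, name)
	if err != nil {
		return nil, fmt.Errorf("reading test file %q: %w", name, err)
	}
	return data, nil
}

// sampleFiles returns an in-memory stand-in for the files a package
// would ship next to its tests (hypothetical content).
func sampleFiles() fs.FS {
	return fstest.MapFS{
		"example.txt": &fstest.MapFile{Data: []byte("hello\n")},
	}
}

func main() {
	data, err := readFile(sampleFiles(), "example.txt")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s", data)
}
```

Keeping the helper unexported limits it to the upgrade package, so the copy would not widen the shared test API.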

Contributor Author

Okay, I left it the same as the existing code.

@Srihari1192 Srihari1192 force-pushed the olm-upgrade-e2e-184 branch 2 times, most recently from e7d10ed to 13e88a9 Compare December 4, 2023 11:15
@Srihari1192 Srihari1192 marked this pull request as ready for review December 5, 2023 12:52
@openshift-ci openshift-ci bot requested a review from sutaakar December 5, 2023 12:52
Contributor

@sutaakar sutaakar left a comment

/lgtm

@sutaakar
Contributor

sutaakar commented Dec 7, 2023

@astefanutti do you have any feedback for this PR, or should we merge it?

Comment on lines +48 to +54
defer func() {
	if t.Failed() {
		DeleteTestNamespace(test, namespace)
	} else {
		StoreNamespaceLogs(test, namespace)
	}
}()
Contributor

@Srihari1192 @sutaakar Out of curiosity, why not use the "standard" way, where test support does that automatically?

Contributor Author

@astefanutti In this upgrade context, we use the same namespace in the test that runs after the operator upgrade. Since NewTestNamespace deletes the namespace by default once the test completes, we added these supporting methods in codeflare-common.

Contributor

@Srihari1192 Thanks, that's clear now.

Two things I would suggest we lean on in the future:

  • Rely on the options argument of the NewTestNamespace method, for example to provide the name or to prevent deletion, instead of creating ad-hoc methods. Options make it possible to combine behaviors.
  • The test logic seems fragmented between the GH Actions workflow and the Go tests. It may be better to implement the upgrade as part of the Go test, so it's not necessary to deal with namespace deletion or to run tests by name.
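
The first suggestion can be sketched with the functional options pattern. Everything below is hypothetical and only illustrates the idea: namespaceConfig, Option, WithName, KeepAfterTest and newNamespaceConfig are invented names, not the actual codeflare-common API, and no real namespace is created.

```go
package main

import "fmt"

// namespaceConfig collects the settings a namespace helper could honor.
// All names in this sketch are hypothetical.
type namespaceConfig struct {
	name            string
	deleteAfterTest bool
}

// Option mutates the configuration; options can be freely combined.
type Option func(*namespaceConfig)

// WithName requests a fixed namespace name instead of a generated one.
func WithName(name string) Option {
	return func(c *namespaceConfig) { c.name = name }
}

// KeepAfterTest disables the default deletion once the test completes.
func KeepAfterTest() Option {
	return func(c *namespaceConfig) { c.deleteAfterTest = false }
}

// newNamespaceConfig applies the options on top of the defaults.
func newNamespaceConfig(opts ...Option) namespaceConfig {
	cfg := namespaceConfig{name: "generated-name", deleteAfterTest: true}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", newNamespaceConfig())
	fmt.Printf("%+v\n", newNamespaceConfig(WithName("olm-upgrade"), KeepAfterTest()))
}
```

With this shape, a pre-upgrade test could ask for a named namespace that survives the test, and the post-upgrade test could reuse the same name, without separate create and delete helpers.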

Contributor

It may be better to implement the upgrade as part of the Go test

This will couple the test with a specific upgrade strategy (using OLM, overriding the existing deployment with a new one-liner, upgrading using ODH). Personally, I would prefer to keep the test implementation separate from deployment/upgrade, so the test stays reusable for any strategy.

Contributor

It makes sense. Sounds good 👍🏼.

@astefanutti
Contributor

/lgtm

@astefanutti
Contributor

/approve

openshift-ci bot commented Dec 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti

@openshift-ci openshift-ci bot added the approved label Dec 8, 2023
@openshift-merge-bot openshift-merge-bot bot merged commit 0afa252 into project-codeflare:main Dec 8, 2023