Proposal: User Defined Tests for the Operator Scorecard #1049
Conversation
```go
// Modifications specifies a spec field to change in the CR with the expected results
type Modification struct {
	// a map of the spec fields to modify
	Spec map[string]interface{} `mapstructure:"spec"`
}
```
There are probably tests that involve changes outside of the spec
(e.g. changing annotations). Should we account for that type of thing?
Also, defining a modification this way has some drawbacks, but I think it works for most use cases.
A few examples of when it doesn't work well are for modifying slices and removing keys. When a modified value is a slice, the entire modified slice has to be included in the modification. Also, it isn't possible to remove a key, only to set it to `nil`. Removing a key might be necessary when there's a semantic difference between not present and `nil`.
For non-simple use cases, we may want an alternative/additional way to specify modifications. Perhaps the YAML equivalent of JSON patch. An implementation for YAML exists (https://github.com/krishicks/yaml-patch) if that's something we want to look into.
Somewhat tangentially, another benefit of JSON patch is that there's a `test` operation that may be useful for implementing the checks.
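For illustration, here is a minimal sketch of how a `test` op could back such a check, assuming the evanphx/json-patch Go library (an assumption on my part, not something the proposal specifies):

```go
package main

import (
	"fmt"

	jsonpatch "github.com/evanphx/json-patch"
)

func main() {
	// The observed resource, already marshalled to JSON by whatever client fetched it.
	doc := []byte(`{"spec":{"containers":[{"image":"other"},{"image":"my-image"}]}}`)

	// A "test" op makes Apply() fail when the value does not match,
	// which is exactly the pass/fail signal an expectation check needs.
	patch, err := jsonpatch.DecodePatch([]byte(
		`[{"op":"test","path":"/spec/containers/1/image","value":"my-image"}]`))
	if err != nil {
		panic(err)
	}
	if _, err := patch.Apply(doc); err != nil {
		fmt.Println("expectation failed:", err)
		return
	}
	fmt.Println("expectation passed")
}
```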
Using a JSON patch for the modification sounds like a good idea. It's a standard and is quite simple to understand.
Do you think it would make sense to add a JSON patch var to the modification struct and support both using `Spec` and JSON patches, or do you think we should only support one method for simplicity?
Also, do you think it would be a good idea to add JSON patch `test` operation support to the `expected_resource` and `expected_status` types? That could make it easier for some test cases, especially ones involving arrays, where the current proposal would need to do something like this to get an item in the second element of an array:
```yaml
spec:
  containers:
  -
  - image: my-image
```

vs

```yaml
json_patch: |
  {"op":"test","path":"/spec/containers/1/image","value":"my-image"}
```
+1 for supporting JSON patch fields to specify both modifications and the expected resources/status.
The more I think about the current map-style modifications and expectations, the more I think we should support only JSON patch for both modifications and expected resources/status.
If we leave the map-style declarations, we would need to define what method we use to merge and compare lists, maps, strings, numbers, etc, and I think users would end up spending a decent chunk of time troubleshooting why certain things are merging or being compared in certain ways.
If we support only JSON patch, I think the semantics are much more obvious and easier to reason about, both from our perspective as maintainers and from the perspective of someone writing a test. And if we're using YAML, I'd suggest allowing the JSON patch to be defined in YAML.
Here's what's in my head. Feel free to throw darts.

```yaml
userDefinedTests:
- name: Scaling MyApp
  description: Tests that MyApp properly scales from 1 to 3
  timeout: 1m
  setup:
    # In addition to `crPath`, maybe we could also allow `cr`
    # and have the full CR embedded in the test? We could
    # always add that later.
    crPath: ./path/to/my/cr.yaml
    # Would we want an expected map here
    # to wait for the initial CR to be set up?
    expected:
      cr:
        tests:
        - op: test
          path: /status/conditions[0]/type
          value: Ready
        - op: test
          path: /status/conditions[0]/status
          value: true
      resources:
      - group: apps
        version: v1
        kind: Deployment
        name: my-example-deployment
        tests:
        - op: test
          path: /spec/replicas
          value: 1
  crModifications:
  - op: replace
    path: spec/count
    value: 3
  expected:
    cr:
      tests:
      - op: test
        path: /status/conditions[0]/type
        value: Ready
      - op: test
        path: /status/conditions[0]/status
        value: true
    resources:
    - group: apps
      version: v1
      kind: Deployment
      name: my-example-deployment
      tests:
      - op: test
        path: /spec/replicas
        value: 3
```
I don't think the map-style declaration would cause many issues when it comes to how we compare things. There are only 3 non-map/non-array values that you can have in go-yaml/json: number (which can be int/int64/float64; we can just convert any of these types to float64 before comparison), string, and bool. When it comes to maps and arrays, we just walk down them, not really any different than a JSONPatch would. The main benefit of JSONPatch is that we can more easily define array indices and we can also delete values.
The main benefit of keeping a map-style declaration is that we can have scorecard functions like the array length checker (another potential example would be a regex matcher). We can't implement that with JSONPatch. We also get to keep the same structure as the actual Kubernetes resources, which may be a bit clearer for users.
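For what it's worth, a rough sketch of that comparison (a hypothetical helper, assuming values have already been decoded into `map[string]interface{}`, `[]interface{}`, and scalars):

```go
package scorecard

// matches reports whether every field in expected is present in actual with
// an equal value. Numbers are normalized to float64 before comparison, since
// YAML/JSON decoding can yield int, int64, or float64 for the same document.
func matches(expected, actual interface{}) bool {
	switch exp := expected.(type) {
	case map[string]interface{}:
		act, ok := actual.(map[string]interface{})
		if !ok {
			return false
		}
		for k, v := range exp {
			if !matches(v, act[k]) {
				return false
			}
		}
		return true
	case []interface{}:
		act, ok := actual.([]interface{})
		if !ok || len(act) < len(exp) {
			return false
		}
		for i, v := range exp {
			if !matches(v, act[i]) {
				return false
			}
		}
		return true
	case int, int32, int64, float32, float64:
		e, eok := toFloat(exp)
		a, aok := toFloat(actual)
		return eok && aok && e == a
	default:
		// strings, bools, nil
		return expected == actual
	}
}

func toFloat(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int32:
		return float64(n), true
	case int64:
		return float64(n), true
	case float32:
		return float64(n), true
	case float64:
		return n, true
	}
	return 0, false
}
```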
After some discussion, we have decided that the user defined scorecard tests will be run as plugins instead of having a built-in testing system based on the YAML definitions. The new User-Defined Tests for Scorecard will instead allow users to run their own scripts or binaries that take a standard input (or environment variables; this is an implementation detail that I will investigate more later) and produce a standard JSON output that can be parsed by the scorecard. This will allow users to run tests written with the Go test framework, Ansible Molecule, or any other framework that they want, and it will simplify the actual implementation in the scorecard while allowing more flexibility for users.

The YAML based testing described in this proposal will still be implemented, but it will be available as a plugin that the scorecard can use. It will be a simple testing method and a good starting point for operator writers before they start to design more complex end-to-end tests. Scorecard integration support will also be added to the existing Go test framework. I will be updating the proposal later today (possibly tomorrow) to reflect this new design decision.
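For a rough picture of what such a plugin executable could look like, here is a sketch (everything here — names, env vars, JSON fields — is illustrative, not part of the proposal yet):

```go
package main

import (
	"encoding/json"
	"os"
)

// testResult mirrors the kind of per-test output discussed below; the exact
// JSON schema is still an open question in this proposal.
type testResult struct {
	Name         string   `json:"name"`
	State        string   `json:"state"` // e.g. pass, partial_pass, fail, error
	EarnedPoints int      `json:"earned_points"`
	MaxPoints    int      `json:"max_points"`
	Errors       []string `json:"errors,omitempty"`
}

func main() {
	// Hypothetical: configuration arrives via environment variables set by
	// the scorecard, e.g. which kubeconfig to run against.
	_ = os.Getenv("KUBECONFIG")

	// ... run whatever checks this plugin implements ...
	results := []testResult{
		{Name: "my_custom_test", State: "pass", EarnedPoints: 1, MaxPoints: 1},
	}

	// The scorecard reads this JSON from stdout.
	_ = json.NewEncoder(os.Stdout).Encode(results)
}
```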
I like the idea of the plugin system, but I'm curious why we can't just have the plugin system alone and let the user omit the script. Just an idea; sorry if this question was already answered, I wasn't in the meeting. Thanks!
Couple of questions, but overall sgtm 👍
> In order to increase the flexibility of the user defined tests and allow users to implement more complex E2E style tests for scorecard, the user-defined tests will be implemented via a plugin system. Users will specify the path of the script they wish to run as well as environment variables to set for the command. The command would then print out the result as JSON to stdout. Here is an example:
> specify the path of the script they wish to run as well

Would this be done via a CLI flag? Or what would that look like, if I want to run this via the plugin system?
This would be specified through the scorecard's config file. We would just be adding another section to the config file (as specified below) that we can read with viper.
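For example, reading that section could look roughly like this sketch (the field names are illustrative, mirroring the config example quoted below):

```go
package scorecard

import "github.com/spf13/viper"

// userDefinedTest mirrors one entry of the user_defined_tests section shown
// below; the field names are illustrative, not final.
type userDefinedTest struct {
	Path string            `mapstructure:"path"`
	Env  map[string]string `mapstructure:"env"`
}

// loadUserDefinedTests reads just the new config section with viper, leaving
// the rest of the scorecard config untouched.
func loadUserDefinedTests(configFile string) ([]userDefinedTest, error) {
	viper.SetConfigFile(configFile)
	if err := viper.ReadInConfig(); err != nil {
		return nil, err
	}
	var tests []userDefinedTest
	if err := viper.UnmarshalKey("user_defined_tests", &tests); err != nil {
		return nil, err
	}
	return tests, nil
}
```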
> ### Basic YAML Defined Test
>
> A new basic testing system would be added where a user can simply define various aspects of a test. For example, this definition runs a similar test to the memcached-operator scale test from the SDK's e2e test:
The yaml file below would need to be in a predefined directory or passed to the tests?
The config path would be set via an environment variable (see the user_defined_tests config example below)
```yaml
user_defined_tests:
- path: "scorecard/simple-scorecard.sh"
```
What would be an example of the script?
Assuming we have a command called `simple-scorecard`, we could do:

```sh
#!/bin/bash
simple-scorecard --config $CONFIG_FILE
```

For the second example I have here, which would be a modified version of our test framework's tests to support the JSON output, we could do something like this:

```sh
#!/bin/bash
operator-sdk test local $TEST_DIR --enable-scorecard $ENABLE_SCORECARD --namespaced-manifest $NAMESPACED_MANIFEST --go-test-flags $GO_TEST_FLAGS
```

The scripts are intended to be just simple wrappers, but they could do more complex things based on the environment variables if we want them to.
Thanks!
> Below is an example of what the JSON output of a test would look like:
I would like this to cover a couple more things:
- Hard failures. Did the test fail in a way that nothing else should be considered?
- I assume that one plugin could have more than 1 test. I would like to see a summary struct and a list of test results for the individual runs. I think this will be more flexible.

I would also like this to be the output of the scorecard test. I prefer computer-readable structured data to the current output. Maybe have a `-h` option that prints out a human-readable format, but having the default be computer readable makes more sense to me.
For 1: What would be a good way to handle that? Add another field to the JSON output? Or maybe check the return code (if the command returns 0, use the JSON output; if it returns non-zero, assume a hard fail)?
For 2: In this example JSON I'm showing 2 tests from a single plugin: "actions_reflected_in_status" and "my_custom_tests".
I think someone was asking if we could have JSON output for the scorecard itself. I thought I made an issue for that, but it looks like I didn't. I'll create a new issue to track that. It'll probably be pretty much the same output as the example plugin output I have here (the plugin output example is just a JSONified array of TestResult objects)
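To sketch the return-code idea concretely (placeholder types; the real JSON shape is still being decided):

```go
package scorecard

import (
	"encoding/json"
	"os/exec"
)

// pluginTestResult is a minimal stand-in for the per-test JSON shown above.
type pluginTestResult struct {
	Name  string `json:"name"`
	State string `json:"state"`
}

// runPlugin executes one user-defined test script. A non-zero exit code, or
// output that isn't the JSON we expect, is treated as a hard failure.
func runPlugin(path string) ([]pluginTestResult, error) {
	out, err := exec.Command(path).Output()
	if err != nil {
		// Non-zero exit: assume hard fail and disregard any partial output.
		return nil, err
	}
	var results []pluginTestResult
	if err := json.Unmarshal(out, &results); err != nil {
		return nil, err
	}
	return results, nil
}
```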
I would consider a different structure. I think we want something that gives someone reading this output an easy path to see if everything is good, but that also lets them dive deep into the results if something goes wrong.
So to recap for the updated proposal: to indicate a hard failure (e.g. an error associated with `t.Fatal()`), the test result would have `state: error` and some error message appended to `errors`. And if it's just regular errors, like with `t.Error()`, the output of a test result would be `state: failed` or `state: partial_pass` and error messages appended to `errors`.
@shawn-hurley Do you think that's clear enough to distinguish between a fatal/hard failure vs regular errors?
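In code, the mapping could look roughly like this (a sketch; `fatal` stands in for however the plugin reports a `t.Fatal()`-style failure, and the state names follow the proposal text):

```go
package scorecard

// stateFor maps a test result onto the proposed states.
func stateFor(earnedPoints, maxPoints int, fatal bool) string {
	switch {
	case fatal:
		return "error" // hard failure: disregard the score entirely
	case earnedPoints == maxPoints:
		return "pass"
	case earnedPoints == 0:
		return "fail"
	default:
		return "partial_pass" // 0 < earned < max
	}
}
```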
Yeah, that makes sense for the fatal/hard failure.
I was also thinking that you would want to give the denominator, like the number of tests run or possible points. I would also suggest that you note that `total_score` is a percentage; that is not super clear.
@brianwcook can you have your team take a look at this format to make sure that it works for your team?
@shawn-hurley Updated proposal with your suggestions.
I think we need to also account for the possibility that the plugin executable exits with a non-zero return code and/or with output that is not what we expect. And describe how operator-sdk handles that situation.
> In order to increase the flexibility of the user defined tests and allow users to implement more complex E2E style tests for scorecard, the user-defined tests will be implemented via a plugin system. Users will specify the path of the script they wish to run as well as environment variables to set for the command. The command would then print out the result as JSON to stdout. Here is an example:
Is this suggesting that you pass a YAML file into the command and it runs things that way?
I would like to see maybe a directory that contains the runnables (binaries/scripts) that it just runs. This will 1. make creating the image version of this easier, and 2. allow us to create system-wide tests and project-specific tests like the yaml file, but using convention rather than another yaml file.
Thoughts?
The way I was thinking this would work is just adding a new section to the scorecard's config file that we can read with viper. That section, an example of which I put below, would specify where the scripts are and what environment variables should be set for configuration purposes. Can you give an example of what you would like to change?
+1 on @shawn-hurley's comment.
I would propose that we have the scorecard command just run every executable in `<projectRoot>/test/scorecard/bin` (or something similar).
And since we support shell scripts, I don't see the need for a separate declaration of environment variables, since those could be set up within the executable itself.
On a related note, are there environment variables that would be helpful to set for ALL scorecard tests?
Lastly, do we have opinions on how users should handle the plugin executables? Check them into git? Use a Makefile that can download or build them? How will we distribute our plugins (e.g. `simple-scorecard`)?
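A rough sketch of that convention-based approach (the directory name is just the one suggested above, not decided):

```go
package scorecard

import (
	"io/ioutil"
	"os/exec"
	"path/filepath"
)

// runScorecardPlugins runs every executable file at the top level of dir
// (e.g. <projectRoot>/test/scorecard/bin) and collects their stdout.
func runScorecardPlugins(dir string) (map[string][]byte, error) {
	entries, err := ioutil.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	outputs := map[string][]byte{}
	for _, entry := range entries {
		// Skip subdirectories and files without an executable bit.
		if entry.IsDir() || entry.Mode()&0111 == 0 {
			continue
		}
		out, err := exec.Command(filepath.Join(dir, entry.Name())).Output()
		if err != nil {
			return nil, err
		}
		outputs[entry.Name()] = out
	}
	return outputs, nil
}
```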
Instead of adding a new section to the config file, I could just add one new flag/config option pointing to the directory where the scorecard scripts are and treat each top-level file ending in `.sh` as a test. And I guess you're right about not needing the environment variables. One env var that may make sense to set for all scripts would be `$KUBECONFIG`, to make sure all scripts run against the correct cluster.
When it comes to how users should handle the executables, it depends on the user. Probably most ideal would be the script downloading the executable, similar to how we download `dep` during CI. A user could also include it in their repo if they feel that would be better (we do this for the `marker` executable, since it is quite tiny but would take a long time to compile during CI). As long as the script can run on a generic Linux system using bash, it should be fine.
Why can't you just run all the top-level files that are executable?
I would also suggest that we let folks build these plugins and then decide how we want to manage them. I think that will help us determine whether the plugins folks are writing are generic or super specific, whether they are tied to a particular project or apply to all operators. We can start to see patterns that might help us make better decisions here. I don't think we should wait super long, but we should get feedback before making this decision IMO.
@shawn-hurley Yeah, I'll update this to run top level files that are executable instead of just scripts.
Overall SGTM
Only nit being that we should support using JSON patch in our basic test plugin to modify the spec and check resource/status fields.
Overall sounds pretty good to me. A few more questions, comments, and suggestions.
> to parse and integrate with the other tests. The above JSON design is based on the `TestResult` type with the addition of a `state` type, which can be `pass` (earned == max), `partial_pass` (0 < earned < max), `fail` (earned == 0), or `error` (fatal error; disregard output score). We also print the number of tests in each state as well as the total score to make it easier for users to quickly see what states the tests are in.
Are there any good examples of other projects with a similar plugin system, specifically ones that expect the plugin to return data in a specific format? I ask because I wonder if we should version the interface so that we can make changes to it in the future without breaking old plugins.
Maybe a totally off-the-wall idea, but would it make sense to re-use the Kubernetes API for this? Similar to how the admission webhook request/response API works? (btw, not suggesting we change to an HTTP interface or anything, just marshal things as k8s-compliant API objects)
Maybe something like:

```go
type ScorecardTest struct {
	metav1.TypeMeta `json:",inline"`

	// Spec describes the attributes for the test
	Spec *ScorecardTestSpec `json:"spec"`

	// Results describes the results of running the test.
	// +optional
	Results *ScorecardTestResults `json:"results,omitempty"`
}
```

And maybe even crazier, perhaps we could write an operator that could execute these tests in-cluster. Food for thought 🙂
Hmm. I like the idea because it allows for versioning, and this will most likely go through a couple of changes over time as we use this more and figure out what else we may need from the output. And reusing some of the other features of the API could be quite useful here.
Running the scorecard tests in-cluster shouldn't be too difficult either, assuming we have a lot of permissions and can actually create all the resources we need. So maybe that could be a runtime mode in the future (kind of like how we have `test local` and `test cluster`).
UPDATE: This PR will be split into 2 PRs, as it has essentially become 2 separate proposals (a plugin system for user defined tests and an actual plugin for the new system).
This section is now a separate PR
> This JSON output would make it simple for others to create scorecard plugins while keeping it simple for the scorecard to parse and integrate with the other tests. Each plugin would be considered a separate suite, and the full result of the scorecard would be a list of `ScorecardResult`s.
Should the plugin output be the full `ScorecardTest` object so that the `TypeMeta` fields are included?
Yes, the plugins should output a full ScorecardTest JSON. The scorecard can then make an array of the results from each plugin for the ScorecardResults section of the main scorecard output.
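Roughly like this sketch (simplified types; the real object would embed `metav1.TypeMeta` as discussed above):

```go
package scorecard

import "encoding/json"

// scorecardTest is a simplified stand-in for the full ScorecardTest object a
// plugin would print; only the fields needed here are shown.
type scorecardTest struct {
	Kind       string          `json:"kind"`
	APIVersion string          `json:"apiVersion"`
	Results    json.RawMessage `json:"results,omitempty"`
}

// collectPluginResults parses each plugin's stdout as a ScorecardTest and
// gathers the results for the scorecard's combined output. The kind and
// apiVersion fields give the scorecard a place to version the plugin
// interface and reject outputs it does not understand.
func collectPluginResults(outputs [][]byte) ([]json.RawMessage, error) {
	var all []json.RawMessage
	for _, out := range outputs {
		var t scorecardTest
		if err := json.Unmarshal(out, &t); err != nil {
			return nil, err
		}
		all = append(all, t.Results)
	}
	return all, nil
}
```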
LGTM after addressing one last question.
Description of the change: Make proposal for user defined scorecard tests
Motivation for the change: Allow for more useful functional tests in the scorecard