Skip to content

Add metrics for CEL for admission control KEP #112994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

DangerOnTheRanger
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds metrics, as a part of KEP-3488. An additional PR will be needed to integrate the main KEP implementation with metrics; this PR only introduces the metrics themselves.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/3488-cel-admission-control

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 11, 2022
Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to see the buckets.

@leilajal
Copy link
Contributor

/triage accepted
thanks @DangerOnTheRanger!

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 13, 2022
@DangerOnTheRanger DangerOnTheRanger changed the title [WIP] Add metrics for CEL for admission control KEP Add metrics for CEL for admission control KEP Oct 15, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2022
@jpbetz
Copy link
Contributor

jpbetz commented Oct 20, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 20, 2022
Buckets: []float64{0.001, 0.01, 0.1, 1.0},
StabilityLevel: metrics.ALPHA,
},
[]string{"policy", "policy_binding", "validation_expression", "enforcement_action", "params", "state"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the cardinality on "params"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be equal to the number of parameter resouces - my understanding is that number should be no higher than the cardinality of policy_binding (@jpbetz - pinging you here in case I'm off on this one).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, less than or equal to binding cardinality

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of these labels seem like they would have concerningly high cardinality. "policy", "policy_binding", "params" are all names of instances of objects? so the cardinality is only limited to the number of instances of those types as someone creates in a cluster? what is validation_expression, the string expression, name of the validation rule that failed, index of the rule that failed, or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, those are all names (validation_expression is the name of the expression being checked). Do you think some of the labels should maybe be removed? I feel like the binding (and maybe the params?) label(s) could be removed without too much impact towards debugging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that metric labels tied to object names were undesirable since they have ~unbounded cardinality.

If we were going to have a name as a label, the binding name identifies the policy/params tuple (at least until singleton policies that are effective without a binding get implemented)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the label cardinality is truly capped to 100s per cluster, then even if the total cardinality is unbounded, it should not pose a terrible problem. We have far worse issues with the resource label for the apiserver request metrics. It basically just means we're not going to be able to do meaningful aggregation across multiple clusters over these dimensions. If the cardinality for a single cluster is actually unbounded, then I would highly recommend against these labels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider this equivalent to the resource label in API server request metrics... every new CRD spawns a new resource value, and every new policy or policy_binding instance (depending on which of these labels was selected) would spawn a new label value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably fine then, moderately discouraged but if it would help debugging then I'd be okay with it.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 20, 2022
@DangerOnTheRanger DangerOnTheRanger force-pushed the validation-admission-metrics branch from 42c28db to 0a65adf Compare October 21, 2022 23:09
@logicalhan
Copy link
Member

/lgtm

(From sig instrumentation)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2022
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2022
@DangerOnTheRanger
Copy link
Contributor Author

/retest

@DangerOnTheRanger DangerOnTheRanger force-pushed the validation-admission-metrics branch from 4237698 to ac324cb Compare October 26, 2022 20:55
Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2022
@cici37
Copy link
Contributor

cici37 commented Oct 27, 2022

/assign @lavalamp

Would you mind to take a look when you have time? Thank you :)

Help: "Validation admission policy check total, labeled by policy and param resource, and further identified by binding, validation expression, enforcement action taken, and state.",
StabilityLevel: metrics.ALPHA,
},
[]string{"policy", "policy_binding", "validation_expression", "enforcement_action", "params", "state"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is incremented on every admission, even ones that are allowed. When we get to policy evaluations that can fail but still permit the request (e.g. audit or warn or fail open), how will that be reflected in this metric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's incremented on every admission - for audit and warn support, I think we should be able to extend the enforcement_action label with those values.

@liggitt
Copy link
Member

liggitt commented Oct 27, 2022

sorry to keep asking questions, just trying to understand how these fit with the shape we expect validation admission evaluations to take

@liggitt
Copy link
Member

liggitt commented Oct 28, 2022

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DangerOnTheRanger, jpbetz, liggitt, logicalhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 28, 2022
@cici37 cici37 mentioned this pull request Oct 28, 2022
9 tasks
@DangerOnTheRanger
Copy link
Contributor Author

/retest

1 similar comment
@DangerOnTheRanger
Copy link
Contributor Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants