Skip to content

npep-285: Combine ANP and BANP #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

npinaeva
Copy link
Member

@npinaeva npinaeva commented Mar 3, 2025

Current version doesn't suggest a specific API, but outlines different options. After this EP gets some feedback, it will be updated with the winning options.
tracking issue #285

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 3, 2025
@k8s-ci-robot k8s-ci-robot requested review from aojea and Dyanngg March 3, 2025 13:59
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 3, 2025
Copy link

netlify bot commented Mar 3, 2025

Deploy Preview for kubernetes-sigs-network-policy-api ready!

Name Link
🔨 Latest commit 5a681de
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-network-policy-api/deploys/683ef834122d650008ed0d42
😎 Deploy Preview https://deploy-preview-289--kubernetes-sigs-network-policy-api.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link

@bowei bowei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly discussion comments, general direction is +1 from me (less APIs that are almost 100% duplicated the better)

1. BANP is a singleton, which is also the reason why it doesn't have `spec.priority` field.
Make BANP non-singleton and apply `spec.priority` field to it.

2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should BANP Pass mean go to NP? But isn't BANP only active if there is no NP? It just sounds like Pass == Allow in behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass generally means "go the next layer", so for ANP that goes to netpol, and for BANP that goes to the default-cluster-policy, which is allow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Pass" is currently documented as "skip any remaining ANP rules, and then pass execution to any NetworkPolicies that select the pod", and we generally describe its semantics as being to delegate from the admin to the end users.

But if we change "Pass" to mean "skip to the next tier", and then we add, say, a "HardTenancy" tier between the top tier and the NetworkPolicy tier, then Passing from the top tier would no longer "skip any remaining ANP rules and pass execution to NetworkPolicy". It would just skip from generic ANP rules to tenancy-specific ANP rules, which may or may not be what the admin wanted. (I'm sure we could come up with examples where it would do the right thing, and examples of where it would do the wrong thing.)

Also, I was thinking about the problem of protecting system namespaces from rogue ANPs that accidentally block access to critical components, and thought that perhaps would could add a "Provider" tier above the "Admin" tier, for distro/infra-provider rules (perhaps with a VAP to block new additions to that tier after cluster install time). But that wouldn't work, because the rule you want there is "for subject pods in system namespaces, Pass to the NetworkPolicy tier". But you wouldn't actually be able to Pass directly from the "Provider" tier to the "NetworkPolicy" tier, and "Pass"ing to the "Admin" tier is exactly what you don't want. (Of course, you could instead just have a preinstalled "tier: Admin" policy with priority 0 and a VAP to block other priority 0 policies, and then "Pass" would work. But this seems like the sort of problem that could apply to other potential future tiers.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slightly different understanding of the Pass action. I always imagined that having no ANP rules that match a connection should work exactly the same as Passing that connection to NP (aka just skip the current tier). In the current model ANP Pass can be interpreted as "delegate decision to the namespace owner" aka "I don't care what happens to this connection, you decide", which leads me to believe that this action can only be reasonably used as an exception (deny all, but pass all pod-to-pod)

If we add more tiers, I think allowing to skip the whole tier (e.g. with Pass always going to the NP level) makes this system more complicated, because then to make sure your tier rules are applied you need to check that a given connection is not skipped on the previous tier. I am also not sure I agree with the "rogue ANP" as a valid use case, because as a cluster admin I want to have control over my cluster connections, and letting provider override my rules makes this system less secure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the provider tier hypothetical, I think you'd just have to accept the important traffic in the provider tier rather than delegating? You lose a little fine grained control perhaps but "pass = go to next tier" is Calico's model and I don't think I've seen anyone find that limiting.

The original reasoning from the
[FQDN NPEP](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/npeps/npep-133-fqdn-egress-selector.md?plain=1#L33-L38)
is
> Since Kubernetes NetworkPolicy does not have a FQDN selector, adding this
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it sounds like the non-extensibility of NP API was not intended, so likely

  1. NPv2 needs to happen
  2. NPv2 will have FQDN (why not?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is certainly an option! in this context I am mostly trying to make sure ANP+BANP merge should not be a problem

DNS Name selector is only [supported](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/apis/v1alpha1/adminnetworkpolicy_types.go#L282)
with the `Allow` rule, which means the desired override by the NetworkPolicy could only be `Deny`.
Due to the default-deny nature of NetworkPolicy, all required traffic in a namespace should be explicitly allowed, which means
the override of the BANP `Allow` DNS Name selector will happen automatically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow this; couldn't you have a BANP that had multiple rules:

  • deny XYZ
  • allow to foo.example.com
  • deny to ABC
  • allow to bar.example.com

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, you can. This part tries to address the original concern that a NetworkPolicy can't override allow to foo.example.com part of BANP, which it can, because an override will have to be "deny". So even if we ever implement allow DNSName peer in network policy, it has nothing to do with the ability to override BANP.


2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy,
or it could be seen as return from the current policy layer.
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, if Pass is allowed, we could present it as there being a "final default-allow tier" that Pass jumps to. Then we're free to add more tiers after BANP in future and Pass will do the natural thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. We should allow Pass in BANPs. Maybe we should consider renaming it? (Skip? Return? Or maybe Pass is still the best option...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass is my preferred option but mainly because it's consistent with Calico. We also considered calling that Skip and NextTier way-back-when. I think Skip is very similar to Pass in that you need to be told exactly what it does the first time? NextTier is explicit, which is nice.


As a cluster admin, I prefer to have 1 CRD for both ANP and BANP, so that it is easier to manage, understand, and monitor.
As an API implementation developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the code.
As an API developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the API and maintenance load.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels artificial. I think it's more useful to just explain that lots of people have found the current API confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I do want to note that in the past 3 years we have advertised ANP/BANP so much that ppl now remember the tiering system so there will also be some confusion once we introduce this change for a little while - at least for people who know the current API - but new users will be fine

### Combining ANP and BANP

To combine ANP and BANP, we need to have a way to specify policy layer. Currently, we have the following options:
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Overrride` and `Baseline`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Overrride` and `Baseline`.
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Override` and `Baseline`.

To combine ANP and BANP, we need to have a way to specify policy layer. Currently, we have the following options:
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Overrride` and `Baseline`.
- Change `spec.priority` field to have a pre-defined value for NetworkPolicy, use numbers below for ANP, and numbers above for BANP.
For example, if we set NetworkPolicy priority to 0, then negative priorities would mean ANP, and positive priorities would mean BANP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding this is the reason we ended up with separate BANP in the first place. People felt like it made the semantics of Pass too confusing and too hard to explain. "Pass skips over all remaining ANPs with priority greater than 0, then checks NPs, then comes back and processes the rest of the ANPs with priority less than 0."

I think the "tier" system makes this much clearer: Pass ends processing of the current tier and moves to the next tier.

#### Find a better name for `spec.priority`

While we are here, `priority` field name was brought to our attention multiple times as confusing. It should be mostly related to the fact
that "higher priority" means "lower number", which is counter-intuitive. Suggested alternatives are "order", "precedence", "sequence".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"precedence" is no better if we are keeping the "lower number means more important" semantics. "Order" and "sequence" work.

### Resolving differences between ANP and BANP

1. BANP is a singleton, which is also the reason why it doesn't have `spec.priority` field.
Make BANP non-singleton and apply `spec.priority` field to it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should point out that there is no "deep" reason for this; it's just because, at the time, we didn't have user stories that would require more than one BANP, so we decided to initially only allow a single BANP, figuring that we could always relax that and allow multiple BANPs later if we found good use cases for it, whereas if we started off allowing multiple BANPs and then it turned out we had been right before and there really were no good use cases for it, it would be harder to retroactively restrict it, because even if there weren't good use cases for it, people would have come up with bad use cases and wouldn't want us to change it.


2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy,
or it could be seen as return from the current policy layer.
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. We should allow Pass in BANPs. Maybe we should consider renaming it? (Skip? Return? Or maybe Pass is still the best option...)

DNS Name selector is only [supported](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/apis/v1alpha1/adminnetworkpolicy_types.go#L282)
with the `Allow` rule, which means the desired override by the NetworkPolicy could only be `Deny`.
Due to the default-deny nature of NetworkPolicy, all required traffic in a namespace should be explicitly allowed, which means
the override of the BANP `Allow` DNS Name selector will happen automatically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line above what you quoted says "Currently only AdminNetworkPolicy is the intended scope for this proposal." (emphasis added). As I remember it, this was like the BANP singleton issue: we were not rejecting the idea of FQDNs in BANPs, we were just not allowing it now.

We could still have the rule that FQDNs are not allowed in tier: Baseline ANPs if we wanted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we can, but I think it makes sense to allow it and have more uniform ANP vs BANP experience

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for uniformity

Story 2: Single CRD

As a cluster admin, I prefer to have 1 CRD for both ANP and BANP, so that it is easier to manage, understand, and monitor.
As an API implementation developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the code really complex? The fields in BANP are a subset of what we have for ANP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintaining 2 copies of something is always harder and more error-prone. How much more complex - depends on the implementation, but there is no way to implement it with 0 overhead

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels less like a user story and more as something as API authors that we are trying to avoid increasing the number of APIs that need to be learned and maintained (complexity argument).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an API maintainer user story separately, because 2 CRDs add overhead both for API maintainers and for API implementation devs. While we can ignore our "own" API maintainers problems, making implementors' lives easier sounds like a valid user story to me

2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy,
or it could be seen as return from the current policy layer.
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP.
Unless we find strong reasons to forbid `Pass` for BANP, it makes sense to allow it to make the API more uniform.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add the use case for wanting pass in BANP so that users know when that needs to be used?

Copy link
Contributor

@danwinship danwinship Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not particularly a use case for using Pass in tier: Baseline, given that it ends up meaning the same thing as Accept. It's more that it has a well-defined meaning ("skip the rest of the tier") and thus there's no good argument for restricting it.

EDIT: or maybe there's an argument for restricting Pass to mean only "pass from ANP to NetworkPolicy, regardless of what other tiers exist now or in the future"? #289 (comment)

Copy link
Contributor

@tssurya tssurya Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I typed this comment during our discussion as to not forget to write it down on npep.. like what you said.. its being added not because of a particular usecase but because its harmless

@npinaeva npinaeva force-pushed the npep-285 branch 2 times, most recently from 5578334 to ae059aa Compare April 23, 2025 11:26
2. Changing name is a challenge

We have advertised ANP/BANP a lot during previous kubecons, and we also have some implementations with real customers
that are using it. So changing the name of the CRD may bring more confusion about it being a totally new API, and potentiall
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: potentiall

1. BANP is a singleton, which is also the reason why it doesn't have `spec.priority` field.
Make BANP non-singleton and apply `spec.priority` field to it.

2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the provider tier hypothetical, I think you'd just have to accept the important traffic in the provider tier rather than delegating? You lose a little fine grained control perhaps but "pass = go to next tier" is Calico's model and I don't think I've seen anyone find that limiting.


2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy,
or it could be seen as return from the current policy layer.
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass is my preferred option but mainly because it's consistent with Calico. We also considered calling that Skip and NextTier way-back-when. I think Skip is very similar to Pass in that you need to be told exactly what it does the first time? NextTier is explicit, which is nice.

DNS Name selector is only [supported](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/apis/v1alpha1/adminnetworkpolicy_types.go#L282)
with the `Allow` rule, which means the desired override by the NetworkPolicy could only be `Deny`.
Due to the default-deny nature of NetworkPolicy, all required traffic in a namespace should be explicitly allowed, which means
the override of the BANP `Allow` DNS Name selector will happen automatically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for uniformity

Copy link

@bowei bowei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is that we get the big change in (remove BNP etc) and there will be multiple passes to fix up the comments etc. From the NPEP perspective, I don't see a huge amount that we can gain by trying to the refining here in this proposal npep-285

Story 2: Single CRD

As a cluster admin, I prefer to have 1 CRD for both ANP and BANP, so that it is easier to manage, understand, and monitor.
As an API implementation developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the code.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels less like a user story and more as something as API authors that we are trying to avoid increasing the number of APIs that need to be learned and maintained (complexity argument).


### Combining ANP and BANP

To combine ANP and BANP, we need to have a way to specify policy layer. Currently, we have the following options:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...specify the order of application based on some attribute other than the resource type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean something like integer tier values?

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bowei, npinaeva
Once this PR has been reviewed and has the lgtm label, please assign astoycos for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bowei
Copy link

bowei commented May 6, 2025

/lgtm

As per my comment elsewhere -- I'm ok with moving ahead with the large changes and doing refining in the actual API changes themselves vs trying to do it in this proposal.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 6, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 3, 2025
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

Copy link

linux-foundation-easycla bot commented Jun 3, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: npinaeva / name: Nadia Pinaeva (5a681de)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 3, 2025
Signed-off-by: Nadia Pinaeva <[email protected]>
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants