-
Notifications
You must be signed in to change notification settings - Fork 35
npep-285: Combine ANP and BANP #289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for kubernetes-sigs-network-policy-api ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly discussion comments, general direction is +1 from me (less APIs that are almost 100% duplicated the better)
1. BANP is a singleton, which is also the reason why it doesn't have `spec.priority` field. | ||
Make BANP non-singleton and apply `spec.priority` field to it. | ||
|
||
2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should BANP Pass mean go to NP? But isn't BANP only active if there is no NP? It just sounds like Pass == Allow in behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass generally means "go the next layer", so for ANP that goes to netpol, and for BANP that goes to the default-cluster-policy, which is allow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Pass" is currently documented as "skip any remaining ANP rules, and then pass execution to any NetworkPolicies that select the pod
", and we generally describe its semantics as being to delegate from the admin to the end users.
But if we change "Pass" to mean "skip to the next tier", and then we add, say, a "HardTenancy" tier between the top tier and the NetworkPolicy tier, then Passing from the top tier would no longer "skip any remaining ANP rules and pass execution to NetworkPolicy". It would just skip from generic ANP rules to tenancy-specific ANP rules, which may or may not be what the admin wanted. (I'm sure we could come up with examples where it would do the right thing, and examples of where it would do the wrong thing.)
Also, I was thinking about the problem of protecting system namespaces from rogue ANPs that accidentally block access to critical components, and thought that perhaps would could add a "Provider" tier above the "Admin" tier, for distro/infra-provider rules (perhaps with a VAP to block new additions to that tier after cluster install time). But that wouldn't work, because the rule you want there is "for subject pods in system namespaces, Pass to the NetworkPolicy tier". But you wouldn't actually be able to Pass directly from the "Provider" tier to the "NetworkPolicy" tier, and "Pass"ing to the "Admin" tier is exactly what you don't want. (Of course, you could instead just have a preinstalled "tier: Admin" policy with priority 0 and a VAP to block other priority 0 policies, and then "Pass" would work. But this seems like the sort of problem that could apply to other potential future tiers.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slightly different understanding of the Pass action. I always imagined that having no ANP rules that match a connection should work exactly the same as Pass
ing that connection to NP (aka just skip the current tier). In the current model ANP Pass
can be interpreted as "delegate decision to the namespace owner" aka "I don't care what happens to this connection, you decide", which leads me to believe that this action can only be reasonably used as an exception (deny all, but pass all pod-to-pod)
If we add more tiers, I think allowing to skip the whole tier (e.g. with Pass
always going to the NP level) makes this system more complicated, because then to make sure your tier rules are applied you need to check that a given connection is not skipped on the previous tier. I am also not sure I agree with the "rogue ANP" as a valid use case, because as a cluster admin I want to have control over my cluster connections, and letting provider override my rules makes this system less secure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the provider tier hypothetical, I think you'd just have to accept the important traffic in the provider tier rather than delegating? You lose a little fine grained control perhaps but "pass = go to next tier" is Calico's model and I don't think I've seen anyone find that limiting.
The original reasoning from the | ||
[FQDN NPEP](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/npeps/npep-133-fqdn-egress-selector.md?plain=1#L33-L38) | ||
is | ||
> Since Kubernetes NetworkPolicy does not have a FQDN selector, adding this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it sounds like the non-extensibility of NP API was not intended, so likely
- NPv2 needs to happen
- NPv2 will have FQDN (why not?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is certainly an option! in this context I am mostly trying to make sure ANP+BANP merge should not be a problem
DNS Name selector is only [supported](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/apis/v1alpha1/adminnetworkpolicy_types.go#L282) | ||
with the `Allow` rule, which means the desired override by the NetworkPolicy could only be `Deny`. | ||
Due to the default-deny nature of NetworkPolicy, all required traffic in a namespace should be explicitly allowed, which means | ||
the override of the BANP `Allow` DNS Name selector will happen automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow this; couldn't you have a BANP that had multiple rules:
- deny XYZ
- allow to foo.example.com
- deny to ABC
- allow to bar.example.com
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, you can. This part tries to address the original concern that a NetworkPolicy can't override allow to foo.example.com
part of BANP, which it can, because an override will have to be "deny". So even if we ever implement allow DNSName
peer in network policy, it has nothing to do with the ability to override BANP.
|
||
2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy, | ||
or it could be seen as return from the current policy layer. | ||
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, if Pass is allowed, we could present it as there being a "final default-allow tier" that Pass jumps to. Then we're free to add more tiers after BANP in future and Pass will do the natural thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. We should allow Pass
in BANPs. Maybe we should consider renaming it? (Skip
? Return
? Or maybe Pass
is still the best option...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass
is my preferred option but mainly because it's consistent with Calico. We also considered calling that Skip
and NextTier
way-back-when. I think Skip
is very similar to Pass
in that you need to be told exactly what it does the first time? NextTier
is explicit, which is nice.
npeps/npep-285-combine-crds.md
Outdated
|
||
As a cluster admin, I prefer to have 1 CRD for both ANP and BANP, so that it is easier to manage, understand, and monitor. | ||
As an API implementation developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the code. | ||
As an API developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the API and maintenance load. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels artificial. I think it's more useful to just explain that lots of people have found the current API confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I do want to note that in the past 3 years we have advertised ANP/BANP so much that ppl now remember the tiering system so there will also be some confusion once we introduce this change for a little while - at least for people who know the current API - but new users will be fine
npeps/npep-285-combine-crds.md
Outdated
### Combining ANP and BANP | ||
|
||
To combine ANP and BANP, we need to have a way to specify policy layer. Currently, we have the following options: | ||
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Overrride` and `Baseline`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Overrride` and `Baseline`. | |
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Override` and `Baseline`. |
To combine ANP and BANP, we need to have a way to specify policy layer. Currently, we have the following options: | ||
- Add a new field `spec.tier` to distinguish ANP vs BANP. The values could be `Overrride` and `Baseline`. | ||
- Change `spec.priority` field to have a pre-defined value for NetworkPolicy, use numbers below for ANP, and numbers above for BANP. | ||
For example, if we set NetworkPolicy priority to 0, then negative priorities would mean ANP, and positive priorities would mean BANP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoiding this is the reason we ended up with separate BANP in the first place. People felt like it made the semantics of Pass
too confusing and too hard to explain. "Pass
skips over all remaining ANPs with priority
greater than 0, then checks NPs, then comes back and processes the rest of the ANPs with priority
less than 0."
I think the "tier" system makes this much clearer: Pass
ends processing of the current tier and moves to the next tier.
npeps/npep-285-combine-crds.md
Outdated
#### Find a better name for `spec.priority` | ||
|
||
While we are here, `priority` field name was brought to our attention multiple times as confusing. It should be mostly related to the fact | ||
that "higher priority" means "lower number", which is counter-intuitive. Suggested alternatives are "order", "precedence", "sequence". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"precedence" is no better if we are keeping the "lower number means more important" semantics. "Order" and "sequence" work.
npeps/npep-285-combine-crds.md
Outdated
### Resolving differences between ANP and BANP | ||
|
||
1. BANP is a singleton, which is also the reason why it doesn't have `spec.priority` field. | ||
Make BANP non-singleton and apply `spec.priority` field to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should point out that there is no "deep" reason for this; it's just because, at the time, we didn't have user stories that would require more than one BANP, so we decided to initially only allow a single BANP, figuring that we could always relax that and allow multiple BANPs later if we found good use cases for it, whereas if we started off allowing multiple BANPs and then it turned out we had been right before and there really were no good use cases for it, it would be harder to retroactively restrict it, because even if there weren't good use cases for it, people would have come up with bad use cases and wouldn't want us to change it.
|
||
2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy, | ||
or it could be seen as return from the current policy layer. | ||
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. We should allow Pass
in BANPs. Maybe we should consider renaming it? (Skip
? Return
? Or maybe Pass
is still the best option...)
DNS Name selector is only [supported](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/apis/v1alpha1/adminnetworkpolicy_types.go#L282) | ||
with the `Allow` rule, which means the desired override by the NetworkPolicy could only be `Deny`. | ||
Due to the default-deny nature of NetworkPolicy, all required traffic in a namespace should be explicitly allowed, which means | ||
the override of the BANP `Allow` DNS Name selector will happen automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line above what you quoted says "Currently only AdminNetworkPolicy is the intended scope for this proposal." (emphasis added). As I remember it, this was like the BANP singleton issue: we were not rejecting the idea of FQDNs in BANPs, we were just not allowing it now.
We could still have the rule that FQDNs are not allowed in tier: Baseline
ANPs if we wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we can, but I think it makes sense to allow it and have more uniform ANP vs BANP experience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for uniformity
Story 2: Single CRD | ||
|
||
As a cluster admin, I prefer to have 1 CRD for both ANP and BANP, so that it is easier to manage, understand, and monitor. | ||
As an API implementation developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the code really complex? The fields in BANP are a subset of what we have for ANP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maintaining 2 copies of something is always harder and more error-prone. How much more complex - depends on the implementation, but there is no way to implement it with 0 overhead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels less like a user story and more as something as API authors that we are trying to avoid increasing the number of APIs that need to be learned and maintained (complexity argument).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had an API maintainer user story separately, because 2 CRDs add overhead both for API maintainers and for API implementation devs. While we can ignore our "own" API maintainers problems, making implementors' lives easier sounds like a valid user story to me
2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy, | ||
or it could be seen as return from the current policy layer. | ||
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP. | ||
Unless we find strong reasons to forbid `Pass` for BANP, it makes sense to allow it to make the API more uniform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add the use case for wanting pass in BANP so that users know when that needs to be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not particularly a use case for using Pass in tier: Baseline
, given that it ends up meaning the same thing as Accept. It's more that it has a well-defined meaning ("skip the rest of the tier") and thus there's no good argument for restricting it.
EDIT: or maybe there's an argument for restricting Pass to mean only "pass from ANP to NetworkPolicy, regardless of what other tiers exist now or in the future"? #289 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I typed this comment during our discussion as to not forget to write it down on npep.. like what you said.. its being added not because of a particular usecase but because its harmless
5578334
to
ae059aa
Compare
npeps/npep-285-combine-crds.md
Outdated
2. Changing name is a challenge | ||
|
||
We have advertised ANP/BANP a lot during previous kubecons, and we also have some implementations with real customers | ||
that are using it. So changing the name of the CRD may bring more confusion about it being a totally new API, and potentiall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: potentiall
1. BANP is a singleton, which is also the reason why it doesn't have `spec.priority` field. | ||
Make BANP non-singleton and apply `spec.priority` field to it. | ||
|
||
2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the provider tier hypothetical, I think you'd just have to accept the important traffic in the provider tier rather than delegating? You lose a little fine grained control perhaps but "pass = go to next tier" is Calico's model and I don't think I've seen anyone find that limiting.
|
||
2. ANP supports `Pass` action, while BANP doesn't. `Pass` action could be considered a delegation to the NetworkPolicy, | ||
or it could be seen as return from the current policy layer. | ||
We have 2 options: allow using `Pass` for BANP, meaning that the rest od BANPs will be skipped or forbid this action for BANP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass
is my preferred option but mainly because it's consistent with Calico. We also considered calling that Skip
and NextTier
way-back-when. I think Skip
is very similar to Pass
in that you need to be told exactly what it does the first time? NextTier
is explicit, which is nice.
DNS Name selector is only [supported](https://github.com/kubernetes-sigs/network-policy-api/blob/feae59d056cf87131338a8449d31b85dc1d9790f/apis/v1alpha1/adminnetworkpolicy_types.go#L282) | ||
with the `Allow` rule, which means the desired override by the NetworkPolicy could only be `Deny`. | ||
Due to the default-deny nature of NetworkPolicy, all required traffic in a namespace should be explicitly allowed, which means | ||
the override of the BANP `Allow` DNS Name selector will happen automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for uniformity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is that we get the big change in (remove BNP etc) and there will be multiple passes to fix up the comments etc. From the NPEP perspective, I don't see a huge amount that we can gain by trying to the refining here in this proposal npep-285
Story 2: Single CRD | ||
|
||
As a cluster admin, I prefer to have 1 CRD for both ANP and BANP, so that it is easier to manage, understand, and monitor. | ||
As an API implementation developer, I prefer to have 1 CRD for both ANP and BANP, so that I can reduce the complexity of the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels less like a user story and more as something as API authors that we are trying to avoid increasing the number of APIs that need to be learned and maintained (complexity argument).
|
||
### Combining ANP and BANP | ||
|
||
To combine ANP and BANP, we need to have a way to specify policy layer. Currently, we have the following options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...specify the order of application based on some attribute other than the resource type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean something like integer tier
values?
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bowei, npinaeva The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm As per my comment elsewhere -- I'm ok with moving ahead with the large changes and doing refining in the actual API changes themselves vs trying to do it in this proposal. |
New changes are detected. LGTM label has been removed. |
|
Signed-off-by: Nadia Pinaeva <[email protected]>
Current version doesn't suggest a specific API, but outlines different options. After this EP gets some feedback, it will be updated with the winning options.
tracking issue #285