sdn: handle offset>0 fragments when validating service traffic #13162
Conversation
pkg/sdn/plugin/controller.go (Outdated)
otx.AddFlow(generateAddServiceRule(netID, service.Spec.ClusterIP, port.Protocol, int(port.Port)))
// Add another flow matching port 0 to allow subsequent fragments through
otx.AddFlow(generateAddServiceRule(netID, service.Spec.ClusterIP, port.Protocol, int(0)))
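Plugging in a hypothetical service at 172.30.0.10 on TCP port 8080 with VNID 5, and using the rule format visible later in this review (generateAddServiceRule = base rule + priority/actions suffix), the two calls above would emit flows roughly like:

table=60, tcp, nw_dst=172.30.0.10, tp_dst=8080, priority=100, actions=load:5->NXM_NX_REG1[], load:2->NXM_NX_REG2[], goto_table:80
table=60, tcp, nw_dst=172.30.0.10, tp_dst=0, priority=100, actions=load:5->NXM_NX_REG1[], load:2->NXM_NX_REG2[], goto_table:80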
There's no need to have per-service-IP flows for this from a security or correctness point of view. (And in your case where the user intentionally creates two services with the same IP, this would end up deleting the port 0 rule for the IP after any of the services with that IP was deleted.)
also, maybe use "ip_frag=later" in the rule, for clarity?
Yeah, I've done that locally, but it's not as simple as just adding ip_frag=later because:
Error executing ovs-ofctl: OFPT_ERROR (OF1.3) (xid=0x2): OFPBMC_BAD_PREREQ
Still need to check that out.
It seems non-awful to me. I don't think we're making any bad assumptions about the IP stack; we're already assuming that the IP stack will do the right thing with legitimate fragmented packets in the non-service-IP case (and that it will drop non-legitimate fragments), so we can assume it will do the right thing in the service IP case too. It's technically legal to have packets with port 0 specified explicitly, so a flow matching that might theoretically match some real traffic, but kubernetes would never assign port 0 to a service anyway, so any packet with an IP in serviceNetworkCIDR and explicit port 0 is guaranteed to get dropped after leaving OVS regardless of what we do with it. (But adding …
Force-pushed from 57f42ec to 16199a8
@openshift/networking @danwinship @knobunc updated to (a) use ip_frag=later and (b) better handle service rule deletion. But it turns out that we remove all service rules when the service changes anyway, so we don't really need to handle partial deletion. Also, it turns out that Kube doesn't actually allow multiple services to use the same ClusterIP, so that makes things somewhat less complex.
What happened to the ovs-ofctl error you got with ip_frag before? Unless there's some benefit to per-service rules that I'm not thinking of, I still feel like it would be better to do a single static fragment-accepting flow covering all of serviceNetworkCIDR. I don't think there are security issues with allowing through fragments to fake service IPs, because the packet would just get dropped by iptables anyway. And having a single rule is simpler and means fewer ovs-ofctl calls and fewer flows for OVS to have to deal with.
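For concreteness, a sketch of the single static flow being proposed here, assuming a hypothetical serviceNetworkCIDR of 172.30.0.0/16 (this variant is not what eventually merged; see below):

table=60, ip, nw_dst=172.30.0.0/16, ip_frag=later, priority=100, actions=goto_table:80

As the later discussion notes, a flow like this never loads the VNID register before table 80, which is why the per-service form won out.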
I like @danwinship's idea of just allowing frags. I can't see a security risk and the clarity gained in the rules is a clear benefit.
OK, I was trying to be extra careful, but I'll do the one rule to rule them all.
@danwinship even though I couldn't find why in the code, you cannot give "tp_dst=0, ip_frag=later", because even though fragments will show up with dst/src ports equal to 0, ip_frag is apparently incompatible with anything transport-layer related. But I think we still need to do per-service rules for fragments, because we need to set the VNID on them before we dump them into table 80. We can do one rule per service, though; I guess we don't care if it's UDP or TCP.
@danwinship @knobunc PTAL, thanks!
Ah, ok, yeah, VNID handling means we do need the per-service fragment rules. OK.
pkg/sdn/plugin/controller.go (Outdated)
action := getServiceRuleAction(netID)

// Add blanket rule allowing subsequent IP fragments
if netID != osapi.GlobalVNID {
VNID 0 is only special with the multitenant plugin (i.e., not the networkpolicy plugin), so drop this check. (The rule it adds will end up being unused in multitenant, but that's consistent with other VNID 0 rules.)
done
pkg/sdn/plugin/controller.go (Outdated)
@@ -468,24 +481,30 @@ func (plugin *OsdnNode) AddServiceRules(service *kapi.Service, netID uint32) {
func (plugin *OsdnNode) DeleteServiceRules(service *kapi.Service) {
	glog.V(5).Infof("DeleteServiceRules for %v", service)

	// Remove blanket rule allowing subsequent IP fragments; since we don't
	// have the service VNID here we have to ignore errors since this will
	// fail when removing services in the admin VNID
Remove the comment (and the extra EndTransaction) since it will no longer apply, but also FYI it wouldn't cause an error anyway; del-flows doesn't consider it an error if there are no matching flows (which has bitten us in the past).

Actually, we decided that all the flows associated with a service will have the same IP and no other services will use that IP, right? So we don't even need a separate fragment rule, or a loop on ServicePorts; just DelFlows("table=60, ip, nw_dst=IP") and that will get everything. (With maybe some further refactoring of the get*ServiceRule functions. Note that it's ok to redundantly specify both ip and tcp in a rule, so you could use that table, ip, nw_dst rule as the base of the add rule too.)
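A minimal sketch of that simplification, assuming the transaction helpers visible elsewhere in this diff (NewTransaction, DelFlows, and EndTransaction; the exact setup and signatures are assumptions):

func (plugin *OsdnNode) DeleteServiceRules(service *kapi.Service) {
	glog.V(5).Infof("DeleteServiceRules for %v", service)

	otx := plugin.ovs.NewTransaction()
	// One wildcard delete covers both the per-port rules and the fragment
	// rule, since every flow for this service shares the same nw_dst.
	otx.DelFlows(fmt.Sprintf("table=60, ip, nw_dst=%s", service.Spec.ClusterIP))
	if err := otx.EndTransaction(); err != nil {
		glog.Errorf("Error deleting service rules for %v: %v", service, err)
	}
}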
done
pkg/sdn/plugin/controller.go (Outdated)
// Add blanket rule allowing subsequent IP fragments
if netID != osapi.GlobalVNID {
	baseFragRule := generateBaseServiceFragRule(service.Spec.ClusterIP)
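generateBaseServiceFragRule itself isn't shown in this hunk; given the ip_frag discussion above (no transport-port match allowed alongside ip_frag), a plausible sketch of it would be (an assumption, not necessarily the merged code):

func generateBaseServiceFragRule(IP string) string {
	// Match only later fragments (offset>0) for this service IP; ip_frag
	// can't be combined with tp_dst/tcp_dst/udp_dst matches.
	return fmt.Sprintf("table=60, ip, nw_dst=%s, ip_frag=later", IP)
}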
should we maybe only do this for UDP services?
@danwinship It's a problem for TCP too, no? I haven't checked, but any fragmented packet will hit this issue.
pkg/sdn/plugin/controller.go (Outdated)
baseRule := generateBaseServiceRule(IP, protocol, port)
return fmt.Sprintf("%s, priority=100, actions=load:%d->NXM_NX_REG1[], load:2->NXM_NX_REG2[], goto_table:80", baseRule, netID)
func generateBaseServiceRule(IP string, protocol kapi.Protocol, port int) string {
	return fmt.Sprintf("table=60, %s, nw_dst=%s, tp_dst=%d", strings.ToLower(string(protocol)), IP, port)
Also, while we're changing things, can you fix this to use udp_dst/tcp_dst, since tp_dst is apparently deprecated?
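A sketch of the requested change, deriving the port-match field name from the protocol (hedged; the merged code may differ in details):

func generateBaseServiceRule(IP string, protocol kapi.Protocol, port int) string {
	// protocol is "TCP" or "UDP"; lowercased, it serves both as the
	// protocol match and as the prefix of the port field (tcp_dst/udp_dst).
	proto := strings.ToLower(string(protocol))
	return fmt.Sprintf("table=60, %s, nw_dst=%s, %s_dst=%d", proto, IP, proto, port)
}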
done
@danwinship PTAL, thanks!
pkg/sdn/plugin/controller.go (Outdated)
baseRule, err := generateBaseAddServiceRule(service.Spec.ClusterIP, port.Protocol, int(port.Port))
if err == nil {
	otx.AddFlow(baseRule + action)
	err = otx.EndTransaction()
Uh, we should only be calling EndTransaction() once, at the very end. (The existing code was wrong.)
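A sketch of the restructuring being asked for, queueing every flow first and committing once at the end (function and helper names follow this diff; the transaction setup and error handling are assumptions):

func (plugin *OsdnNode) AddServiceRules(service *kapi.Service, netID uint32) {
	glog.V(5).Infof("AddServiceRules for %v", service)

	otx := plugin.ovs.NewTransaction()
	action := getServiceRuleAction(netID)

	// Blanket rule allowing subsequent IP fragments for this service IP
	otx.AddFlow(generateBaseServiceFragRule(service.Spec.ClusterIP) + action)

	for _, port := range service.Spec.Ports {
		baseRule, err := generateBaseAddServiceRule(service.Spec.ClusterIP, port.Protocol, int(port.Port))
		if err != nil {
			glog.Errorf("Error creating service rule for %v: %v", service, err)
			continue
		}
		otx.AddFlow(baseRule + action)
	}

	// Commit all queued flows in a single transaction, at the very end.
	if err := otx.EndTransaction(); err != nil {
		glog.Errorf("Error adding service rules for %v: %v", service, err)
	}
}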
I wondered about that but deferred to the existing code.
@danwinship how about now...
[testextended][extended:networking]
Evaluated for origin testextended up to 4b89419
Evaluated for origin merge up to 4b89419
[Test]ing while waiting on the merge queue
Evaluated for origin test up to 4b89419
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/914/) (Base Commit: 0963c7e)
continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/1172/) (Base Commit: 0963c7e) (Extended Tests: networking)
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/4/) (Base Commit: d759401) (Image: devenv-rhel7_6056)
This reverts commit fc723f0, reversing changes made to d759401.
Signed-off-by: Steve Kuznetsov <[email protected]>
After this merged, it looks like GCE deployments stopped responding on their public port (from inside the master you can hit it, but from outside you can't). We're testing the revert now. I have instances that can be debugged for someone to jump into.
By default (set-frag normal) all fragments of a fragmented packet
have a port number of 0. This is quite unhelpful; to get the right
port number for all fragments requires un-fragmenting the packet
with OVS's conntrack capability, but this interacts badly with
iptables' conntrack capability.
Instead, use the 'nx-match' mode which keeps the port numbers in
the first fragment of a fragmented packet, and add rules to allow
subsequent fragments (with port=0) to pass through. Assume that
the destination IP stack will reject offset>0 fragments that
arrive without a corresponding offset=0 first fragment.
We can't just drop the port checking because services can be
manually created with a static service IP address, and perhaps
users rely on creating two distinct services with the same service
IP address, but differentiated via port numbers.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1419692
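For reference, OVS's fragment handling mode is set per bridge with ovs-ofctl set-frag; a hedged sketch of how a node setup step might switch to nx-match (the bridge name br0 and shelling out via os/exec are assumptions, not necessarily the plugin's actual mechanism):

package main

import (
	"log"
	"os/exec"
)

func main() {
	// In nx-match mode the first fragment keeps its real L4 ports and
	// later fragments match with ports reading as 0, as described above.
	out, err := exec.Command("ovs-ofctl", "set-frag", "br0", "nx-match").CombinedOutput()
	if err != nil {
		log.Fatalf("ovs-ofctl set-frag failed: %v: %s", err, out)
	}
}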
@openshift/networking @danwinship how awful is this workaround? Can we make the assumption that the destination IP stack will do the right thing? In any case, I've verified with packet counters that offset>0 fragments are hitting the port=0 rules, and my UDP echo tools work.
I suppose I could try to add a large UDP packet testcase to the extended networking test suite too.
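A minimal sketch of what such a testcase's client side could look like: send one UDP datagram larger than a 1500-byte MTU so the kernel emits an offset=0 fragment plus offset>0 fragments (the echo server address and payload size are placeholders):

package main

import (
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("udp", "172.30.0.10:9999") // hypothetical echo service
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// 4000 bytes > 1500-byte MTU, so this datagram is fragmented on the wire.
	payload := make([]byte, 4000)
	if _, err := conn.Write(payload); err != nil {
		log.Fatal(err)
	}

	conn.SetReadDeadline(time.Now().Add(5 * time.Second))
	buf := make([]byte, 4096)
	n, err := conn.Read(buf)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("echo returned %d bytes", n)
}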