
sdn: handle offset>0 fragments when validating service traffic #13162

Merged: 1 commit merged into openshift:master on Mar 8, 2017

Conversation

@dcbw (Contributor) commented Mar 1, 2017

By default (set-frag normal) all fragments of a fragmented packet
have a port number of 0. This is quite unhelpful; getting the right
port number for all fragments requires un-fragmenting the packet
with OVS's conntrack capability, but that interacts badly with
iptables' conntrack capability.

Instead, use the 'nx-match' mode which keeps the port numbers in
the first fragment of a fragmented packet, and add rules to allow
subsequent fragments (with port=0) to pass through. Assume that
the destination IP stack will reject offset>0 fragments that
arrive without a corresponding offset=0 first fragment.

We can't just drop the port checking because services can be
manually created with a static service IP address, and perhaps
users rely on creating two distinct services with the same service
IP address, but differentiated via port numbers.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1419692

@openshift/networking @danwinship how awful is this workaround? Can we make the assumption that the destination IP stack will do the right thing? In any case, I've verified with packet counters that offset>0 fragments are hitting the port=0 rules, and my UDP echo tools work.

I suppose I could try to add a large UDP packet testcase to the extended networking test suite too.
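For context, a rough sketch of what the per-service flows quoted just below expand to, based on the generateBaseServiceRule/getServiceRuleAction helpers that appear later in this review; the function name, VNID, and service IP are made-up examples, not the PR's literal output:

```go
package main

import (
	"fmt"
	"strings"
)

// Sketch only: mirrors the rule-generation helpers quoted later in this
// review (base match plus the VNID-loading action). Names and values here
// are illustrative.
func serviceFlow(netID uint32, clusterIP, protocol string, port int) string {
	match := fmt.Sprintf("table=60, %s, nw_dst=%s, tp_dst=%d",
		strings.ToLower(protocol), clusterIP, port)
	action := fmt.Sprintf(", priority=100, actions=load:%d->NXM_NX_REG1[], load:2->NXM_NX_REG2[], goto_table:80", netID)
	return match + action
}

func main() {
	// First fragments (and unfragmented packets) match the real service port...
	fmt.Println(serviceFlow(42, "172.30.0.10", "UDP", 5000))
	// ...and, with set-frag nx-match, later fragments carry port 0, so the
	// second AddFlow in the diff below adds a port-0 flow for the same IP.
	fmt.Println(serviceFlow(42, "172.30.0.10", "UDP", 0))
}
```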

otx.AddFlow(generateAddServiceRule(netID, service.Spec.ClusterIP, port.Protocol, int(port.Port)))
// Add another flow matching port 0 to allow subsequent fragments through
otx.AddFlow(generateAddServiceRule(netID, service.Spec.ClusterIP, port.Protocol, int(0)))
Contributor

There's no need to have per-service-IP flows for this from a security or correctness point of view. (And in your case where the user intentionally creates two services with the same IP, this would end up deleting the port 0 rule for the IP after any of the services with that IP was deleted.)

Contributor

also, maybe use "ip_frag=later" in the rule, for clarity?

Contributor Author

Yeah, I've done that locally, but it's not as simple as just adding ip_frag=later because:

 Error executing ovs-ofctl: OFPT_ERROR (OF1.3) (xid=0x2): OFPBMC_BAD_PREREQ

I still need to check that out.

@danwinship (Contributor)

how awful is this workaround? Can we make the assumption that the destination IP stack will do the right thing?

It seems non-awful to me. I don't think we're making any bad assumptions about the IP stack; we're already assuming that the IP stack will do the right thing with legitimate fragmented packets in the non-service-IP case (and that it will drop non-legitimate fragments), so we can assume it will do the right thing in the service IP case too.

It's technically legal to have packets with port 0 specified explicitly, so a flow matching that might theoretically match some real traffic, but kubernetes would never assign port 0 to a service anyway, so any packet with an IP in serviceNetworkCIDR and explicit port 0 is guaranteed to get dropped after leaving OVS regardless of what we do with it. (But adding ip_frag=later to the flow ensures we wouldn't end up matching such a packet anyway.)

@dcbw force-pushed the sdn-fragments branch 2 times, most recently from 57f42ec to 16199a8, on March 3, 2017 at 02:53
@dcbw (Contributor Author) commented Mar 3, 2017

@openshift/networking @danwinship @knobunc updated to (a) use ip_frag=later and (b) better handle service rule deletion. But it turns out that we remove all service rules when the service changes anyway, so we don't really need to handle partial deletion.

Also, it turns out that Kube doesn't actually allow multiple services to use the same ClusterIP, so that makes things somewhat less complex.

@danwinship (Contributor)

updated to (a) use ip_frag=later and (b) better handle service rule deletion

What happened to the ovs-ofctl error you got with ip_frag before?

Unless there's some benefit to per-service rules that I'm not thinking of, I still feel like it would be better to do a single static fragment-accepting flow covering all of serviceNetworkCIDR. I don't think there are security issues with allowing through fragments to fake service IPs, because the packet would just get dropped by iptables anyway. And having a single rule is simpler and means fewer ovs-ofctl calls and fewer flows for OVS to have to deal with.
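For reference, a minimal sketch of what such a single static flow's match could look like; the CIDR is a hypothetical example, and the action side is left open, since a blanket rule has no obvious per-service VNID to load, which is exactly the concern raised a few comments below:

```go
package main

import "fmt"

// Sketch only: a single blanket match over the whole service network instead
// of one fragment flow per service. What action to attach (which VNID to
// load) is the open question discussed later in this thread.
func blanketFragFlowMatch(serviceNetworkCIDR string) string {
	return fmt.Sprintf("table=60, ip, nw_dst=%s, ip_frag=later", serviceNetworkCIDR)
}

func main() {
	fmt.Println(blanketFragFlowMatch("172.30.0.0/16"))
}
```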

@knobunc (Contributor) commented Mar 3, 2017

I like @danwinship's idea of just allowing frags. I can't see a security risk and the clarity gained in the rules is a clear benefit.

@dcbw (Contributor Author) commented Mar 3, 2017

OK, I was trying to be extra careful, but I'll do the one rule to rule them all.

@dcbw (Contributor Author) commented Mar 3, 2017

What happened to the ovs-ofctl error you got with ip_frag before?

@danwinship even though I couldn't find why in the code, you cannot specify "tp_dst=0, ip_frag=later": fragments do show up with src/dst ports equal to 0, but ip_frag is apparently incompatible with any transport-layer match.

But I think we still need per-service rules for fragments, because we need to set the VNID on them before we dump them into table 80. We can do one rule per service though; I guess we don't care whether it's UDP or TCP.
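A minimal sketch of what such a per-service fragment rule could look like, following the generateBaseServiceFragRule/getServiceRuleAction names that appear in the diffs quoted below; the exact match fields are an assumption based on this discussion:

```go
package main

import "fmt"

// Sketch only: per-service fragment flow plus the VNID action.
func generateBaseServiceFragRule(clusterIP string) string {
	// L3-only match: ip_frag=later cannot be combined with tcp_dst/udp_dst,
	// so a single fragment rule per service covers both TCP and UDP.
	return fmt.Sprintf("table=60, ip, nw_dst=%s, ip_frag=later", clusterIP)
}

func getServiceRuleAction(netID uint32) string {
	// Tag the packet with the service's VNID, then continue to table 80.
	return fmt.Sprintf(", priority=100, actions=load:%d->NXM_NX_REG1[], load:2->NXM_NX_REG2[], goto_table:80", netID)
}

func main() {
	// Example VNID and service IP; illustrative only.
	fmt.Println(generateBaseServiceFragRule("172.30.0.10") + getServiceRuleAction(42))
}
```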

@dcbw (Contributor Author) commented Mar 3, 2017

@danwinship @knobunc PTAL, thanks!

@danwinship (Contributor) left a comment

Ah, OK, yeah, VNID handling means we do need the per-service fragment rules.

action := getServiceRuleAction(netID)

// Add blanket rule allowing subsequent IP fragments
if netID != osapi.GlobalVNID {
Contributor

VNID 0 is only special with the multitenant plugin (ie, not the networkpolicy plugin), so drop this check. (The rule it adds will end up being unused in multitenant, but that's consistent with other VNID 0 rules.)

Contributor Author

done

@@ -468,24 +481,30 @@ func (plugin *OsdnNode) AddServiceRules(service *kapi.Service, netID uint32) {
func (plugin *OsdnNode) DeleteServiceRules(service *kapi.Service) {
glog.V(5).Infof("DeleteServiceRules for %v", service)

// Remove blanket rule allowing subsequent IP fragments; since we don't
// have the service VNID here we have to ignore errors since this will
// fail when removing services in the admin VNID
Contributor

Remove the comment (and the extra EndTransaction) since it will no longer apply, but also FYI it wouldn't cause an error anyway; del-flows doesn't consider it an error if there are no matching flows (which has bitten us in the past).

Actually, we decided that all the flows associated with a service will have the same IP and no other services will use that IP, right? So we don't even need a separate fragment rule, or a loop on ServicePorts; just DelFlows("table=60, ip, nw_dst=IP") and that will get everything. (With maybe some further refactoring of the get*ServiceRule functions. Note that it's ok to redundantly specify both ip and tcp in a rule so you could use that table,ip,nw_dst rule as the base of the add rule too.)
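A minimal sketch of that simplification, assuming the DelFlows/EndTransaction transaction calls referenced in this review; the helper name below is hypothetical:

```go
package main

import "fmt"

// Sketch only: the single del-flows criterion suggested above. Because all of
// a service's table-60 flows (per-port and fragment) share its ClusterIP, one
// nw_dst match removes them all, with no ServicePorts loop and no separate
// fragment-rule handling in DeleteServiceRules.
func serviceDelFlowsMatch(clusterIP string) string {
	return fmt.Sprintf("table=60, ip, nw_dst=%s", clusterIP)
}

func main() {
	// e.g. otx.DelFlows(serviceDelFlowsMatch(service.Spec.ClusterIP)),
	// followed by a single otx.EndTransaction() at the very end.
	fmt.Println(serviceDelFlowsMatch("172.30.0.10"))
}
```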

Contributor Author

done


// Add blanket rule allowing subsequent IP fragments
if netID != osapi.GlobalVNID {
baseFragRule := generateBaseServiceFragRule(service.Spec.ClusterIP)
Contributor

should we maybe only do this for UDP services?

Contributor Author

@danwinship It's a problem for TCP too, no? I haven't checked, but any fragmented packet will hit this issue.

baseRule := generateBaseServiceRule(IP, protocol, port)
return fmt.Sprintf("%s, priority=100, actions=load:%d->NXM_NX_REG1[], load:2->NXM_NX_REG2[], goto_table:80", baseRule, netID)
func generateBaseServiceRule(IP string, protocol kapi.Protocol, port int) string {
return fmt.Sprintf("table=60, %s, nw_dst=%s, tp_dst=%d", strings.ToLower(string(protocol)), IP, port)
Contributor

also, while we're changing things, can you fix this to use udp_dst/tcp_dst since tp_dst is apparently deprecated?
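A minimal sketch of that change, keeping the generateBaseServiceRule shape from the diff above and only swapping the deprecated tp_dst for the protocol-specific field; the protocol is a plain string here to keep the sketch self-contained, whereas the merged code appears to use a generateBaseAddServiceRule variant that takes kapi.Protocol and also returns an error:

```go
package main

import (
	"fmt"
	"strings"
)

// Sketch only: tp_dst replaced by tcp_dst/udp_dst, per the comment above.
func generateBaseServiceRule(IP string, protocol string, port int) string {
	proto := strings.ToLower(protocol) // "tcp" or "udp"
	return fmt.Sprintf("table=60, %s, nw_dst=%s, %s_dst=%d", proto, IP, proto, port)
}

func main() {
	fmt.Println(generateBaseServiceRule("172.30.0.10", "UDP", 5000))
	// -> table=60, udp, nw_dst=172.30.0.10, udp_dst=5000
}
```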

Contributor Author

done

@dcbw (Contributor Author) commented Mar 6, 2017

@danwinship PTAL, thanks!

baseRule, err := generateBaseAddServiceRule(service.Spec.ClusterIP, port.Protocol, int(port.Port))
if err == nil {
otx.AddFlow(baseRule + action)
err = otx.EndTransaction()
Contributor

Uh, we should only be calling EndTransaction() once, at the very end. (The existing code was wrong.)

Contributor Author

I wondered about that but deferred to the existing code.

@dcbw (Contributor Author) commented Mar 7, 2017

@danwinship how about now...

@danwinship (Contributor)

[testextended][extended:networking]
[merge]

@openshift-bot

Evaluated for origin testextended up to 4b89419

@openshift-bot

Evaluated for origin merge up to 4b89419

@openshift-bot

[Test]ing while waiting on the merge queue

@openshift-bot

Evaluated for origin test up to 4b89419

@openshift-bot

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/914/) (Base Commit: 0963c7e)

@openshift-bot

continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/1172/) (Base Commit: 0963c7e) (Extended Tests: networking)

@openshift-bot commented Mar 8, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/4/) (Base Commit: d759401) (Image: devenv-rhel7_6056)

@openshift-bot merged commit fc723f0 into openshift:master on Mar 8, 2017
stevekuznetsov added a commit to stevekuznetsov/origin that referenced this pull request Mar 8, 2017
This reverts commit fc723f0, reversing
changes made to d759401.
stevekuznetsov added a commit to stevekuznetsov/origin that referenced this pull request Mar 8, 2017
This reverts commit fc723f0, reversing
changes made to d759401.

Signed-off-by: Steve Kuznetsov <[email protected]>
@smarterclayton (Contributor)

After this merged, it looks like GCE deployments stopped responding on their public port (from inside the master you can hit it, but from outside you can't). We're testing the revert now. I have instances that can be debugged if someone wants to jump in.
