Skip to content

Making IPIP/tunnel and override-nexthop independent #1025

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions docs/bgp.md
Original file line number Diff line number Diff line change
Expand Up @@ -158,3 +158,15 @@ kubectl annotate node ip-172-20-46-87.us-west-2.compute.internal "kube-router.io

By default kube-router populates GoBGP RIB with node IP as next hop for the advertised pod CIDR's and service VIP. While this works for most cases, overriding the next hop for the advertised rotues is necessary when node has multiple interfaces over which external peers are reached. Next hop need to be as per the interface local IP over which external peer can be reached. `--override-nexthop` let you override the next hop for the advertised route. Setting `--override-nexthop` to true leverages BGP next-hop-self functionality implemented in GoBGP. Next hop will automatically selected appropriately when advertising routes irrespective of the next hop in the RIB.

## Overriding the next hop and enable IPIP/tuennel

In a scenario there are multiple groups of nodes in different subnets and user wants to peer mulitple upstream routers for each of their node (.e.g., a cluster has seperate public and private networking, and the nodes in this cluster are located into two different racks, which has their own routers. So there would be two upstream routers for each node, and there are two different subnets in this case), The override-nexthop and tunnel cross subnet features need to be used together to achive the goal.

to support the above case, user need to set `--enable-overlay` and `--override-nexthop` to true together. This configuration would have the following effect.
Copy link
Collaborator

@aauren aauren Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In a scenario there are multiple groups of nodes in different subnets and user wants to peer mulitple upstream routers for each of their node (.e.g., a cluster has seperate public and private networking, and the nodes in this cluster are located into two different racks, which has their own routers. So there would be two upstream routers for each node, and there are two different subnets in this case), The override-nexthop and tunnel cross subnet features need to be used together to achive the goal.
to support the above case, user need to set `--enable-overlay` and `--override-nexthop` to true together. This configuration would have the following effect.
A common scenario exists where each node in the cluster is connected to two upstream routers that are in two different subnets. For example, one router is connected to a public network subnet and the other router is connected to a private network subnet. Additionally, nodes may be split across different subnets (e.g. different racks) each of which has their own routers.
In this scenario, `--override-nexthop` can be used to correctly peer with each upstream router, ensuring that the BGP next-hop attribute is correctly set to the node's IP address that faces the upstream router. The `--enable-overlay` option can be set to allow overlay/underlay tunneling across the different subnets to achieve an interconnected pod network.
This configuration would have the following effects:

I took a second pass at the wording to hopefully make it a little more concise. However, before changing anything I would recommend getting feedback from @murali-reddy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aauren Thank you very much for the feedback. I have integrated your comments in new commit.

@murali-reddy Would you please take a look and let me know if there is anything else I need to add?

* Peering Outside the Cluster (https://github.com/cloudnativelabs/kube-router/blob/master/docs/bgp.md#peering-outside-the-cluster) via one of the many means that kube-router makes that option available
* Overriding Next Hop
* Enabling overlays in either full mode or with nodes in different subnets

The warning here is that when using `--override-nexthop` in the above scenario, it may cause kube-router to advertise an IP address other than the node IP which is what kube-router connects the tunnel to when the `--enable-overlay` option is given. If this happens it may cause some network flows to become un-routable.

Specifically, people need to take care when combining `--override-nexthop` and `--enable-overlay` and make sure that they understand their network, the flows they desire, how the kube-router logic works, and the possible side-effects that are created from their configuration. Please refer to this PR for the risk and impact discussion https://github.com/cloudnativelabs/kube-router/pull/1025.
3 changes: 1 addition & 2 deletions pkg/controllers/routing/network_routes_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -560,9 +560,8 @@ out:
}

// create IPIP tunnels only when node is not in same subnet or overlay-type is set to 'full'
// prevent creation when --override-nextHop=true as well
// if the user has disabled overlays, don't create tunnels
if (!sameSubnet || nrc.overlayType == "full") && !nrc.overrideNextHop && nrc.enableOverlays {
if (!sameSubnet || nrc.overlayType == "full") && nrc.enableOverlays {
// create ip-in-ip tunnel and inject route as overlay is enabled
var link netlink.Link
var err error
Expand Down