-
Notifications
You must be signed in to change notification settings - Fork 96
Pod IP selection rules for dual-stack environments #794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I hit an issue with this where the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit concerned about the custom string parsing logic, that will be error prone (at least from my past experience) and we should try to make use of the according libraries in go/Python. I'm also not sure if PodIPPattern
is a good idea to allow any arbitrary regex? I would rather prefer to just have an IPv4
or IPv6
option.
.github/workflows/pull_request.yml
Outdated
@@ -63,7 +63,6 @@ jobs: | |||
- name: Run Python lint | |||
run: | | |||
pycodestyle --max-line-length=120 ./**/*.py | |||
black --check --diff ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to still have both, they target for different things and we just have to configure them properly. black
ensures a common formatting (e.g. like go fmt
a nd pycodestyle
also checks some other code issues.
|
||
if len(components) < 2 { | ||
ipEnd := strings.Index(address, "]:") + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of building this custom logic I would prefer to use net.SplitHostPort
that should make everything more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would that work with the custom flags in our addresses like the :tls
suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a solution for that in mind (more or less the same FDB currently does). I would work on that in #352 that should simplify thee logic here. For the PR itself that's fine and we can improve things step by step.
// This supports the values `pod` and `service`. | ||
PublicIPSource *PublicIPSource `json:"publicIPSource,omitempty"` | ||
|
||
// PodIPIndex tells the pod to use a specific entry from the podIPs list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the PodIPIndex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an artifact of an earlier form of the solution.
type RoutingConfig struct { | ||
// Headless determines whether we want to run a headless service for the | ||
// cluster. | ||
HeadlessService *bool `json:"headlessService,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking we never use the headless service for routing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I felt that "Routing" was closer than "Service" for this group of settings, since in the long term the plan is to use the headless service for discovery. Do you have any other suggestions for what we could call this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt like Routing
captured the intent of these fields better than Service
, but I'm open to other suggestions about how to make the meaning clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Discovery
? I mean that's also an overloaded term. Otherwise I don't have any better idea :( Like always:
"There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors." :)
log.info(f"Listening on {address}:{port}") | ||
server = ThreadingHTTPServer((address, int(port)), cls) | ||
|
||
if ":" in address: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ipaddress.ip_address(address).version == 6
instead of checking for :
the latter is error prone.
ip = matching_ips[0] | ||
if ":" in ip: | ||
ip = "[%s]" % ip | ||
return ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The returned IP address for IPv6 will be invalid since [::]
is not an valid IPv6 address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're enclosing the addresses in brackets because everywhere we are using it in the monitor conf, it gets paired with a port, so it needs the brackets. We could modify the monitor conf to contain the brackets, but I think that would lead to more challenges with the rollouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the current state but we might to revisit this. Sadly Python doesn't have https://golang.org/pkg/net/#JoinHostPort (at least to my knowledge) and I remember doing the same fun stuff: https://github.com/FoundationDB/fdb-kubernetes-operator/pull/341/files#diff-361ca47879a27b8387987496c32b845dbea647c753333511aebc6d93a19be6edR505-R521 I think I just moved it to a later state to prevent parsing the address for the web server again.
ips = string.split(",") | ||
matching_ips = [ip for ip in ips if pattern.search(ip)] | ||
if len(matching_ips) == 0: | ||
raise Exception("Failed to find IP matching %s in %s" % (pattern, ips)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise Exception("Failed to find IP matching %s in %s" % (pattern, ips)) | |
raise Exception(f"Failed to find IP matching {pattern} in {ips}") |
if len(matching_ips) == 0: | ||
raise Exception("Failed to find IP matching %s in %s" % (pattern, ips)) | ||
ip = matching_ips[0] | ||
if ":" in ip: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ipaddress.ip_address(ip).version == 6
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few comments other than that LGTM for the first steps towards IPv6.
type RoutingConfig struct { | ||
// Headless determines whether we want to run a headless service for the | ||
// cluster. | ||
HeadlessService *bool `json:"headlessService,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Discovery
? I mean that's also an overloaded term. Otherwise I don't have any better idea :( Like always:
"There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors." :)
|
||
if len(components) < 2 { | ||
ipEnd := strings.Index(address, "]:") + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a solution for that in mind (more or less the same FDB currently does). I would work on that in #352 that should simplify thee logic here. For the PR itself that's fine and we can improve things step by step.
controllers/cluster_controller.go
Outdated
} | ||
|
||
return []string{instance.Pod.ObjectMeta.Annotations[fdbtypes.PublicIPAnnotation]} | ||
} | ||
|
||
func getPublicIpsForPod(pod *corev1.Pod) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still unchanged? Or is GitHub just trolling me?
"--public-ip-family", | ||
help=( | ||
"Tells the sidecar to treat the public IP as a comma-separated " | ||
"list, and use the first entry in the specified IP family" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an example what the sidecar actually expects?
raise Exception(f"Failed to find IPv{version} entry in {ips}") | ||
ip = matching_ips[0] | ||
if version == 6: | ||
ip = "[%s]" % ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ip = "[%s]" % ip | |
ip = f"[{ip}]" |
ip = matching_ips[0] | ||
if ":" in ip: | ||
ip = "[%s]" % ip | ||
return ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the current state but we might to revisit this. Sadly Python doesn't have https://golang.org/pkg/net/#JoinHostPort (at least to my knowledge) and I remember doing the same fun stuff: https://github.com/FoundationDB/fdb-kubernetes-operator/pull/341/files#diff-361ca47879a27b8387987496c32b845dbea647c753333511aebc6d93a19be6edR505-R521 I think I just moved it to a later state to prevent parsing the address for the web server again.
@@ -193,3 +193,29 @@ func checkDeprecation(cmd *cobra.Command, kubeClient client.Client, inputCluster | |||
cmd.Printf("%d cluster(s) without deprecations\n", clusterCounter) | |||
return nil | |||
} | |||
|
|||
func getMinimalYAML(object interface{}) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could also be a FoundationDBClusterSpec
? I assume you used an interface{}
to use that method later for different structs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was my thinking.
…elp with testing.
Make the liveness probe for the sidecar enabled by default in the future, while retaining an option to disable it explicitly. Remove empty fields from the YAML output in the plugin deprecation command.
Remove temporary code for reading the IP pattern from an annotation.
Description
This issue provides an option for using specific addresses from the
podIPs
list in dual-stack environments. This should unblock testing with IPv6 in those environments, but there is more work to be done to fill out the IPv6 solution.This relates to #85, but we should talk more about how we want to track the feature work for IPv6.
This also includes an option to disable the readiness probe for the sidecar, as discussed in #668. I added this here because the current probes can only go to the main pod IP, as discussed in kubernetes/kubernetes#101324.
Type of change
Please select one of the options below.
Discussion
Are there any design details that you would like to discuss further?
I chose to use a pattern string instead of encoding knowledge about the address family. This gives more flexibility but will make it less obvious how to configure IPv6 through this path.
When changing the IP pattern, this does a rolling bounce. That bounce will change coordinator IPs, meaning that the operator has to repeatedly select new coordinators. I considered alternatives like migrating to new pods or configuring multiple listeners, but I think it's not worth the complexity for our first pass at this.
Testing
Please describe the tests that you ran to verify your changes. Unit tests?
Manual testing?
I created a cluster in a testing environment and switched between patterns to select the IPv4 address and the IPv6 address.
Documentation
Did you update relevant documentation within this repository?
No. I think we want to do more testing before documenting IPv6, but we should track it as part of the bigger project.
Follow-up
Are there any follow-up issues that we should pursue in the future?
We have more aspects of the IPv6 story that we should fill out: