Network latency SLI #2636
Conversation
| Status | SLI | SLO |
| --- | --- | --- |
| __WIP__ | In-cluster network latency from a single node, measured as the latency of a per-second ping to kubernetes.default.svc.cluster.local/, reported as the 99th percentile over the last 5 minutes. | In a default Kubernetes installation with RTT between nodes <= Y, 99th percentile of (99th percentile over all nodes) per cluster-day <= X |
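The windowed percentile in the SLI column can be sketched as follows. This is illustrative only; the class and method names (`LatencySLI`, `record`, `p99`) are ours, not from the proposal, and it uses the nearest-rank percentile method as one reasonable choice.

```python
import math
from collections import deque

class LatencySLI:
    """Per-node tracker: one RTT sample per second, 99th percentile
    over the last 5 minutes, as described in the SLI column."""

    WINDOW = 5 * 60  # one ping per second for 5 minutes

    def __init__(self):
        self.samples = deque(maxlen=self.WINDOW)  # old samples fall out

    def record(self, rtt_ms):
        self.samples.append(rtt_ms)

    def p99(self):
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        # nearest-rank method: the p99 is the ceil(0.99 * n)-th smallest value
        return ordered[math.ceil(0.99 * len(ordered)) - 1]
```

The SLO would then aggregate these per-node p99 values again (99th percentile across nodes, per cluster-day) and compare against X.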
Should we use the master for this SLI? It is "special" in some sense, as in some implementations, the master lives on a different networking plane.
Also, ping from where. And you probably mean TCP RTT, not ICMP.
Yes - the "different networking plane" argument is a very good one. We should probably use some service that lives on nodes.
Do we have any service that we can use for that (in vanilla setup)?
About the ping - it's described above (maybe it should be here too). It's from nodes [described in the last section]
And yes - I think it's TCP RTT.
(Also, we should probably use a non-TLS-based endpoint to eliminate that latency)
Does it make sense to say "RTT to a null service" where null service is defined to be some hello-world serving container?
> Also, we should probably use a non-TLS-based endpoint to eliminate that latency

Yes - that absolutely makes sense.

> Does it make sense to say "RTT to a null service" where null service is defined to be some hello-world serving container?

Yes - that would be the ideal goal. Though, what I wanted to avoid is creating a dedicated service for that (or at least, creating dedicated pods to serve that service), because we would like to measure this in user clusters too, and we don't want to consume additional user resources.
Maybe the workaround would be to expose this "null service" from kube-proxy pods; then we would just need to create a service (which is not that big a deal).
The drawback is that there would then be a backend on every single node, so this won't work well in large clusters.
[Ideally, I would say let's expose it from a subset of kube-proxies, but then we would need to add labels to some kube-proxies, which turns out to be too complicated, I think...]
Maybe you have something better in mind?
As mentioned above, we don't seem to have anything satisfying our needs right now.
Making a service on top of kube-proxies would be problematic in large clusters, and choosing only a subset complicates cluster setup.
So we decided to require that the null service be set up by the cluster administrator.
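The "null service" discussed here is just a hello-world endpoint serving plain HTTP (no TLS, per the comment above). A minimal sketch of what a cluster administrator might deploy, using only the standard library (handler and helper names are ours):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class NullHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny plain-HTTP 200 -- no TLS, no real work,
    so the measured latency is dominated by the network path."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # stay quiet under a once-per-second probe

def serve(port=0):
    """Bind the null service; port 0 picks a free port.
    The caller runs serve_forever() (e.g. in the container entrypoint)."""
    return HTTPServer(("", port), NullHandler)
```

In practice this would be packaged as a container behind a regular Service so probers can resolve and hit it cluster-wide.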
### Other notes
- We obviously can't give any guarantee in a general case, because cluster
administrators may configure cluster as the want (e.g. number of DNS replicas).
the => they
done
Force-pushed from c8fefce to 522e84e
@bowei - PTAL
Force-pushed from 522e84e to 02ee22e
cc: @lzang
easy to measure.
- The prober reporting that is fairly trivial and itself needs only negligible
amount of resources. However, to avoid any visible overhead in the cluster
(in terms of additionally needed components) **we will make it part of kube-proxy**
This is going to test (as you stated in the SLI column) traffic from nodes. I don't think that is the most representative or useful metric. It seems more correct to test from a pod, through a service, to a pod, across nodes in the same primary topology.
I don't know how to generically express "same primary topology". In GCP that is zone, but topology is arbitrary -- maybe that needs to be parameterized?
We might ALSO want to SLI pod-to-pod without a service (or with a headless service) with the same conditions.
> This is going to test (as you stated in the SLI column) traffic from nodes. I don't think that is the most representative or useful metric. It seems more correct to test from a pod, through a service, to a pod, across nodes in the same primary topology.

Kube-proxy is also a pod. So why isn't this what you want?
Re same primary topology - given that we currently don't really support that fully, I'm a bit sceptical about doing that. Once we have topology-aware service routing (#2846) we can update this SLI.
@thockin - WDYT?
Discussed that offline with @thockin.
The main points were:
- we can't use kube-proxy, because it's in the host network - so I switched to using a dedicated prober
- about topologies - to make explicit that we acknowledge very different latencies when nodes are in different topologies, I added an explicit point about that.
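A dedicated prober as described above boils down to timing a plain-HTTP GET (so it measures TCP round trips, not ICMP, per the earlier comment) once per second. A minimal sketch using only the standard library; the function name and the millisecond unit are our choices:

```python
import time
import urllib.request

def probe_once(url, timeout=1.0):
    """One probe: wall-clock time of a plain-HTTP GET to the null
    service, in milliseconds. Because the request rides on a fresh TCP
    connection, the measurement includes the TCP connection RTT."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return (time.monotonic() - start) * 1000.0
```

The prober pod would call this once per second against the null service's cluster DNS name and feed samples into the sliding-window p99 described in the SLI table.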
Force-pushed from 02ee22e to 70ffdf1
/lgtm
Are we going to measure this SLI in users' clusters in prod? From the discussion above, it seems that we would need users to set up this test, since it consumes their resources. Do we know if users are willing to do that? Also, would the result vary significantly with the user cluster's load level and deployment scenario? How do we account for that when we get the data? For example, the result could differ if the user is running an Istio sidecar or has network policy enabled.
It depends on the provider whether they are willing to do that, whether they have permissions from customers, etc. Yes - it will vary, but that's the purpose of different SLIs - to know where we are :-)
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: thockin, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/assign @bowei @thockin @dcbw @caseydavenport
@kubernetes/sig-network-pr-reviews @kubernetes/sig-scalability-pr-reviews