
Commit 088ca96

committed
address comments
1 parent 0fb51ea commit 088ca96

File tree

1 file changed (+29 -5 lines)
  • keps/sig-instrumentation/647-apiserver-tracing


keps/sig-instrumentation/647-apiserver-tracing/README.md

Lines changed: 29 additions & 5 deletions
@@ -71,7 +71,24 @@ Along with metrics and logs, traces are a useful form of telemetry to aid with d
 
 We will wrap the API Server's http server and http clients with [otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib/tree/master/instrumentation/net/http/otelhttp) to get spans for incoming and outgoing http requests. This generates spans for all sampled incoming requests and propagates context with all client requests. For incoming requests, this would go below [WithRequestInfo](https://github.com/kubernetes/kubernetes/blob/9eb097c4b07ea59c674a69e19c1519f0d10f2fa8/staging/src/k8s.io/apiserver/pkg/server/config.go#L676) in the filter stack, as it must be after authentication and authorization, before the panic filter, and is closest in function to the WithRequestInfo filter.
 
-Note that some clients of the API Server, such as webhooks, may make reentrant calls to the API Server. To gain the full benefit of tracing, such clients should propagate context with requests back to the API Server.
+Note that some clients of the API Server, such as webhooks, may make reentrant calls to the API Server. To gain the full benefit of tracing, such clients should propagate context with requests back to the API Server. One way to do this is to wrap the webhook's http server using otelhttp, and to use the request's context when making requests to the API Server.
+
+**Webhook Example**
+
+Wrapping the http server, which ensures context is propagated from http headers to the request's context:
+```golang
+mux := http.NewServeMux()
+handler := otelhttp.NewHandler(mux, "HandleAdmissionRequest")
+```
+Use the context from the request in reentrant requests:
+```golang
+ctx := req.Context()
+client.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
+```
+
+Note: Even though the admission controller uses the otelhttp handler wrapper, that does _not_ mean it will emit spans. OpenTelemetry has a concept of an SDK, which manages the exporting of telemetry. If no SDK is registered, the NoOp SDK is used, which only propagates context and does not export spans. In the webhook case, in which no SDK is registered, the reentrant API call would appear to be a direct child of the original API call. If the webhook registers an SDK and exports spans, there would be an additional span from the webhook between the original and reentrant API Server calls.
+
+Note: OpenTelemetry has a concept of ["Baggage"](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/baggage/api.md#baggage-api), which is akin to annotations for propagated context. If there is any additional metadata we would like to attach and propagate along with a request, we can do that using Baggage.
 
 ### Exporting Spans
 

@@ -81,7 +98,7 @@ The API Server will use the [OpenTelemetry exporter format](https://github.com/o
 
 ### Running the OpenTelemetry Collector
 
-The [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) can be run as a sidecar, a daemonset, a deployment , or a combination in which the daemonset buffers telemetry and forwards to the deployment for aggregation (e.g. tail-base sampling) and routing to a telemetry backend. To support these various setups, the API Server should be able to send traffic either to a local (on the master) collector, or to a cluster service (in the cluster).
+The [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) can be run as a sidecar, a daemonset, a deployment, or a combination in which the daemonset buffers telemetry and forwards it to the deployment for aggregation (e.g. tail-based sampling) and routing to a telemetry backend. To support these various setups, the API Server should be able to send traffic either to a local (on the control plane network) collector or to a cluster service (on the cluster network).
 
 ### APIServer Configuration and EgressSelectors
 

@@ -96,12 +113,12 @@ type OpenTelemetryClientConfiguration struct {
 
 	// +optional
 	// URL of the collector that's running on the master.
-	// if URL is specified, APIServer uses the egressType Master when sending tracing data to the collector.
+	// If URL is specified, APIServer uses the egressType Master when sending data to the collector.
 	URL *string `json:"url,omitempty" protobuf:"bytes,3,opt,name=url"`
 
 	// +optional
 	// Service that's the frontend of the collector deployment running in the cluster.
-	// If Service is specified, APIServer uses the egressType Cluster when sending tracing data to the collector.
+	// If Service is specified, APIServer uses the egressType Cluster when sending data to the collector.
 	Service *ServiceReference `json:"service,omitempty" protobuf:"bytes,1,opt,name=service"`
 }
 
@@ -122,6 +139,8 @@ type ServiceReference struct {
 }
 ```
 
+If `--opentelemetry-config-file` is not specified, the API Server will not send any telemetry.
+
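For illustration, a file passed via `--opentelemetry-config-file` might look like the following; the file format and the service sub-fields are assumptions inferred from the struct's JSON tags, not a confirmed schema:

```yaml
# Hypothetical contents of the file passed via --opentelemetry-config-file.
# Exactly one of url or service would be set; field names follow the JSON
# tags above, and the service sub-fields and port are assumed.
service:
  namespace: monitoring
  name: otel-collector
  port: 55680
```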
 ### Controlling use of the OpenTelemetry library
 
 As the community found in the [Metrics Stability Framework KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/20190404-kubernetes-control-plane-metrics-stability.md#kubernetes-control-plane-metrics-stability), having control over how the client libraries are used in kubernetes can enable maintainers to enforce policy and make broad improvements to the quality of telemetry. To enable future improvements to tracing, we will restrict the direct use of the OpenTelemetry library within the kubernetes code base, and provide wrapped versions of functions we wish to expose in a utility library.
@@ -143,6 +162,11 @@ Beta
 - [ ] OpenTelemetry reaches GA
 - [ ] Publish examples of how to use the OT Collector with kubernetes
 - [ ] Allow time for feedback
+- [ ] Revisit the format used to export spans.
+
+GA
+
+- [ ] Tracing e2e tests are promoted to conformance tests
 
 ## Production Readiness Survey
 
@@ -199,7 +223,7 @@ Beta
 - What are the known failure modes? **The API Server is misconfigured, and cannot talk to the collector. The collector is misconfigured, and can't send traces to the backend.**
 - How can those be detected via metrics or logs? Logs from the component or agent based on the failure mode.
 - What are the mitigations for each of those failure modes? **None. You must correctly configure the collector for tracing to work.**
-- What are the most useful log messages and what logging levels do they require? **All errors are useful, and are logged as errors (no logging levels required). Failure to initialize exporters (in both controller and collector), failures exporting metrics are the most useful.**
+- What are the most useful log messages and what logging levels do they require? **All errors are useful, and are logged as errors (no logging levels required). Failure to initialize exporters (in both controller and collector) and failures exporting spans are the most useful. Errors are logged for each failed attempt to establish a connection to the collector.**
 - What steps should be taken if SLOs are not being met to determine the
   problem? **Look at API Server and collector logs.**
 
0 commit comments
