Skip to content

Commit 03b2bd1

Browse files
Add logging guidelines
1 parent d761e1b commit 03b2bd1

File tree

3 files changed

+182
-1
lines changed

3 files changed

+182
-1
lines changed

REVIEWING.md

+5-1
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ feedback and suggestions.
2020
* [Owner References](#owner-references)
2121
* [The Cluster API contract](#the-cluster-api-contract)
2222
* [Logging](#logging)
23+
* [Testing](#testing)
2324

2425
## Resources
2526

@@ -119,7 +120,10 @@ This makes any change that can impact the Cluster API contract critical and usua
119120

120121
### Logging
121122

122-
- For CAPI controllers see [Kubernetes logging conventions](https://git.k8s.io/community/contributors/devel/sig-instrumentation/logging.md).
123+
While developing controllers in Cluster API a key requirement is to add logging to observe the system and
124+
to help troubleshooting issues.
125+
126+
- For CAPI controllers see TODO.
123127
- For clusterctl see [clusterctl logging conventions](https://github.com/kubernetes-sigs/cluster-api/blob/main/cmd/clusterctl/log/doc.go).
124128

125129
### Testing

docs/book/src/SUMMARY.md

+1
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,7 @@
5555
- [Developer Guide](./developer/guide.md)
5656
- [Repository Layout](./developer/repository-layout.md)
5757
- [Rapid iterative development with Tilt](./developer/tilt.md)
58+
- [Logging](./developer/logging.md)
5859
- [Testing](./developer/testing.md)
5960
- [Developing E2E tests](./developer/e2e.md)
6061
- [Controllers](./developer/architecture/controllers.md)

docs/book/src/developer/logging.md

+176
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
# Logging
2+
The Cluster API project is committed to improving the SRE/developer experience when troubleshooting issues, and logging
3+
plays an important part in this goal.
4+
5+
In Cluster API we strive to follow three principles while implementing logging:
6+
7+
- **Logs are for SRE & developers, not for end users!**
8+
Whenever an end user is required to read logs to understand what is happening in the system, most probably there is an
9+
opportunity for improvement of other observability in our API, like e.g. conditions and events.
10+
- **Navigating logs should be easy**:
11+
We should make sure that SREs/Developers can easily drill down logs while investigating issues, e.g. by allowing to
12+
search all the log entries for a specific Machine object, eventually across different controllers/reconciler logs.
13+
- **Cluster API developers MUST use logs!**
14+
As Cluster API contributors you are not only the ones that implement logs, but also the first users of them. Use it!
15+
Provide feedback!
16+
17+
## Upstream Alignment
18+
19+
Kubernetes defines a set of [logging conventions](https://git.k8s.io/community/contributors/devel/sig-instrumentation/logging.md),
20+
as well as tools and libraries for logging.
21+
22+
## Continuous improvement
23+
24+
The foundational items of Cluster API logging are:
25+
26+
- Support for structured logging in all the Cluster API controllers (see [log format](#log-format)).
27+
- Using contextual logging (see [contextual logging](#contextual-logging)).
28+
- Adding a minimal set of key/value pairs in the logger at the beginning of each reconcile loop, so all the subsequent
29+
log entries will inherit them (see [key value pairs](#keyvalue-pairs)).
30+
31+
Starting from the above foundations, then the long tail of small improvements will consist of following activities:
32+
33+
- Improve consistency of additional key/value pairs added by single log entries (see [key value pairs](#keyvalue-pairs)).
34+
- Improve log messages (see [log messages](#log-messages)).
35+
- Improve consistency of log levels (see [log levels](#log-levels)).
36+
37+
## Log Format
38+
39+
Controllers MUST provide support for [structured logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging)
40+
and for the [JSON output format](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#json-output-format);
41+
quoting the Kubernetes documentation, these are the key elements of this approach:
42+
43+
- Separate a log message from its arguments.
44+
- Treat log arguments as key-value pairs.
45+
- Be easily parsable and queryable.
46+
47+
Cluster API uses all the tooling provided by the Kubernetes community to implement structured logging: [Klog](https://github.com/kubernetes/klog), a
48+
[logr](https://github.com/go-logr/logr) wrapper that works with controller runtime, and other utils for exposing flags
49+
in the controller’s main.go.
50+
51+
Ideally, in a future release of Cluster API we will make JSON output format the default format for all the Cluster API
52+
controllers (currently the default is still text format).
53+
54+
## Contextual logging
55+
56+
[Contextual logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging)
57+
is the practice of using a log stored in the context across the entire chain of calls of a reconcile
58+
action. One of the main advantages of this approach is that key value pairs which are added to the logger at the
59+
beginning of the chain are then inherited by all the subsequent log entries created down the chain.
60+
61+
Contextual logging is also embedded in controller runtime; In Cluster API we use contextual logging via controller runtime's
62+
`LoggerFrom(ctx)` and `LoggerInto(ctx, log)` primitives and this ensures that:
63+
64+
- The logger passed to each reconcile call has a unique `reconcileID`, so all the logs being written during a single
65+
reconcile call can be easily identified (note: controller runtime also adds other useful key value pairs by default).
66+
- The logger has a key value pair identifying the objects being reconciled,e.g. a Machine Deployment, so all the logs
67+
impacting this object can be easily identified.
68+
69+
Cluster API developer MUST ensure that:
70+
71+
- The logger has a set of key value pairs identifying the hierarchy of objects the object being reconciled belongs to,
72+
e.g. the Cluster a Machine Deployment belongs to, so it will be possible to drill down logs for related Cluster API
73+
objects while investigating issues.
74+
75+
## Key/Value Pairs
76+
77+
One of the key elements of structured logging is key-value pairs.
78+
79+
Having consistent key value pairs is a requirement for ensuring readability and for providing support for searching and
80+
correlating lines across logs.
81+
82+
A set of good practices for defining key value pairs is defined in the [Kubernetes Guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments), and
83+
one of the above practices is really important for Cluster API developers
84+
85+
- Developers MUST use `klog.KObj` or `klog.KRef` functions when logging key value pairs for Kubernetes objects, thus
86+
ensuring a key value pair representing a Kubernetes object is formatted consistently in all the logs.
87+
88+
Please note that, in order to ensure logs can be easily searched it is important to ensure consistency for the following
89+
key value pairs (in order of importance):
90+
91+
- Key value pairs identifying the object being reconciled, e.g. a Machine Deployment.
92+
- Key value pairs identifying the hierarchy of objects being reconciled, e.g. the Cluster a Machine Deployment belongs
93+
to.
94+
- Key value pairs identifying side effects on other objects, e.g. while reconciling a MachineDeployment, the controller
95+
creates a MachinesSet.
96+
- Other Key value pairs.
97+
98+
## Log Messages
99+
100+
- A Message MUST always start with a capital letter.
101+
- Period at the end of a message MUST be omitted.
102+
- Always prefer logging before the action, so in case of errors there will be an immediate, visual correlation between
103+
the action log and the corresponding error log; While logging before the action, log verbs should use the -ing form.
104+
- Ideally log messages should surface a different level of detail according to the target log level (see [log levels](#log-levels)
105+
for more details).
106+
107+
## Log Levels
108+
109+
Kubernetes provides a set of [recommendations](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#what-method-to-use)
110+
for log levels; as a small integration on the above guidelines we would like to add:
111+
112+
- Logs at the lower levels of verbosity (<=3) are meant to document “what happened” by describing how an object status
113+
is being changed by controller/reconcilers across subsequent reconciliations; as a rule of thumb, it is reasonable
114+
to assume that a person reading those logs has a deep knowledge of how the system works, but it should not be required
115+
for those persons to have knowledge of the codebase.
116+
- Logs at higher levels of verbosity (>=4) are meant to document “how it happened”, providing insight on thorny parts of
117+
the code; a person reading those logs usually has deep knowledge of the codebase.
118+
- Don’t use verbosity higher than 5.
119+
120+
Ideally, in a future release of Cluster API we will switch to use 2 as a default verbosity (currently it is 0) for all the Cluster API
121+
controllers as recommended by the Kubernetes guidelines.
122+
123+
## Trade-offs
124+
125+
When developing logs there are operational trade-offs to take into account, e.g. verbosity vs space allocation, user
126+
readability vs machine readability, maintainability of the logs across the code base.
127+
128+
A reasonable approach for logging is to keep things simple and implement more log verbosity selectively and only on
129+
thorny parts of code. Over time, based on feedback from SRE/developers, more logs can be added to shed light where necessary.
130+
131+
## Developing and testing logs
132+
133+
We are continuously improving our [Tilt](tilt.md) setup allowing Cluster API developers to use logs and improve them as part of
134+
their development process.
135+
136+
Developers can deploy an entire log observability suite by simply setting the
137+
`deploy_observability` value in your [tilt-setting.yaml](tilt.md#create-a-tilt-settings-file).
138+
139+
```yaml
140+
deploy_observability:
141+
- promtail
142+
- loki
143+
- grafana
144+
```
145+
146+
In order to test structured logging it is required to override some controller flags, and similar approach can be used
147+
for increasing log verbosity, e.g.
148+
149+
```yaml
150+
extra_args:
151+
core:
152+
- "--logging-format=json"
153+
- "--v=5"
154+
docker:
155+
- "--v=5"
156+
- "--logging-format=json"
157+
kubeadm-bootstrap:
158+
- "--v=5"
159+
- "--logging-format=json"
160+
kubeadm-control-plane:
161+
- "--v=5"
162+
- "--logging-format=json"
163+
```
164+
165+
Over time, we would like to further improve this with the help of the community; feel free to open issues or propose
166+
ideas!
167+
168+
## What about providers
169+
Cluster API providers are developed by independent teams, and each team is free to define their own processes and
170+
conventions.
171+
172+
However, given that SRE/developers looking at logs are often required to look both at logs from core CAPI and providers,
173+
we encourage providers to adopt and contribute to the guidelines defined in this document.
174+
175+
It is also worth noting that the foundational elements of the approach described in this document are easy to achieve
176+
by leveraging default Kubernetes tooling for logging.

0 commit comments

Comments
 (0)