|
| 1 | +# Logging |
| 2 | +The Cluster API project is committed to improving the SRE/developer experience when troubleshooting issues, and logging |
| 3 | +plays an important part in this goal. |
| 4 | + |
| 5 | +In Cluster API we strive to follow three principles while implementing logging: |
| 6 | + |
| 7 | +- **Logs are for SRE & developers, not for end users!** |
| 8 | + Whenever an end user is required to read logs to understand what is happening in the system, most probably there is an |
| 9 | + opportunity for improvement of other observability in our API, like e.g. conditions and events. |
| 10 | +- **Navigating logs should be easy**: |
| 11 | + We should make sure that SREs/Developers can easily drill down logs while investigating issues, e.g. by allowing to |
| 12 | + search all the log entries for a specific Machine object, eventually across different controllers/reconciler logs. |
| 13 | +- **Cluster API developers MUST use logs!** |
| 14 | + As Cluster API contributors you are not only the ones that implement logs, but also the first users of them. Use it! |
| 15 | + Provide feedback! |
| 16 | + |
| 17 | +## Upstream Alignment |
| 18 | + |
| 19 | +Kubernetes defines a set of [logging conventions](https://git.k8s.io/community/contributors/devel/sig-instrumentation/logging.md), |
| 20 | +as well as tools and libraries for logging. |
| 21 | + |
| 22 | +## Continuous improvement |
| 23 | + |
| 24 | +The foundational items of Cluster API logging are: |
| 25 | + |
| 26 | +- Support for structured logging in all the Cluster API controllers (see [log format](#log-format)). |
| 27 | +- Using contextual logging (see [contextual logging](#contextual-logging)). |
| 28 | +- Adding a minimal set of key/value pairs in the logger at the beginning of each reconcile loop, so all the subsequent |
| 29 | + log entries will inherit them (see [key value pairs](#keyvalue-pairs)). |
| 30 | + |
| 31 | +Starting from the above foundations, then the long tail of small improvements will consist of following activities: |
| 32 | + |
| 33 | +- Improve consistency of additional key/value pairs added by single log entries (see [key value pairs](#keyvalue-pairs)). |
| 34 | +- Improve log messages (see [log messages](#log-messages)). |
| 35 | +- Improve consistency of log levels (see [log levels](#log-levels)). |
| 36 | + |
| 37 | +## Log Format |
| 38 | + |
| 39 | +Controllers MUST provide support for [structured logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging) |
| 40 | +and for the [JSON output format](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#json-output-format); |
| 41 | +quoting the Kubernetes documentation, these are the key elements of this approach: |
| 42 | + |
| 43 | +- Separate a log message from its arguments. |
| 44 | +- Treat log arguments as key-value pairs. |
| 45 | +- Be easily parsable and queryable. |
| 46 | + |
| 47 | +Cluster API uses all the tooling provided by the Kubernetes community to implement structured logging: [Klog](https://github.com/kubernetes/klog), a |
| 48 | +[logr](https://github.com/go-logr/logr) wrapper that works with controller runtime, and other utils for exposing flags |
| 49 | +in the controller’s main.go. |
| 50 | + |
| 51 | +Ideally, in a future release of Cluster API we will make JSON output format the default format for all the Cluster API |
| 52 | +controllers (currently the default is still text format). |
| 53 | + |
| 54 | +## Contextual logging |
| 55 | + |
| 56 | +[Contextual logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging) |
| 57 | +is the practice of using a log stored in the context across the entire chain of calls of a reconcile |
| 58 | +action. One of the main advantages of this approach is that key value pairs which are added to the logger at the |
| 59 | +beginning of the chain are then inherited by all the subsequent log entries created down the chain. |
| 60 | + |
| 61 | +Contextual logging is also embedded in controller runtime; In Cluster API we use contextual logging via controller runtime's |
| 62 | +`LoggerFrom(ctx)` and `LoggerInto(ctx, log)` primitives and this ensures that: |
| 63 | + |
| 64 | +- The logger passed to each reconcile call has a unique `reconcileID`, so all the logs being written during a single |
| 65 | + reconcile call can be easily identified (note: controller runtime also adds other useful key value pairs by default). |
| 66 | +- The logger has a key value pair identifying the objects being reconciled,e.g. a Machine Deployment, so all the logs |
| 67 | + impacting this object can be easily identified. |
| 68 | + |
| 69 | +Cluster API developer MUST ensure that: |
| 70 | + |
| 71 | +- The logger has a set of key value pairs identifying the hierarchy of objects the object being reconciled belongs to, |
| 72 | + e.g. the Cluster a Machine Deployment belongs to, so it will be possible to drill down logs for related Cluster API |
| 73 | + objects while investigating issues. |
| 74 | + |
| 75 | +## Key/Value Pairs |
| 76 | + |
| 77 | +One of the key elements of structured logging is key-value pairs. |
| 78 | + |
| 79 | +Having consistent key value pairs is a requirement for ensuring readability and for providing support for searching and |
| 80 | +correlating lines across logs. |
| 81 | + |
| 82 | +A set of good practices for defining key value pairs is defined in the [Kubernetes Guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments), and |
| 83 | +one of the above practices is really important for Cluster API developers |
| 84 | + |
| 85 | +- Developers MUST use `klog.KObj` or `klog.KRef` functions when logging key value pairs for Kubernetes objects, thus |
| 86 | + ensuring a key value pair representing a Kubernetes object is formatted consistently in all the logs. |
| 87 | + |
| 88 | +Please note that, in order to ensure logs can be easily searched it is important to ensure consistency for the following |
| 89 | +key value pairs (in order of importance): |
| 90 | + |
| 91 | +- Key value pairs identifying the object being reconciled, e.g. a Machine Deployment. |
| 92 | +- Key value pairs identifying the hierarchy of objects being reconciled, e.g. the Cluster a Machine Deployment belongs |
| 93 | + to. |
| 94 | +- Key value pairs identifying side effects on other objects, e.g. while reconciling a MachineDeployment, the controller |
| 95 | + creates a MachinesSet. |
| 96 | +- Other Key value pairs. |
| 97 | + |
| 98 | +## Log Messages |
| 99 | + |
| 100 | +- A Message MUST always start with a capital letter. |
| 101 | +- Period at the end of a message MUST be omitted. |
| 102 | +- Always prefer logging before the action, so in case of errors there will be an immediate, visual correlation between |
| 103 | + the action log and the corresponding error log; While logging before the action, log verbs should use the -ing form. |
| 104 | +- Ideally log messages should surface a different level of detail according to the target log level (see [log levels](#log-levels) |
| 105 | + for more details). |
| 106 | + |
| 107 | +## Log Levels |
| 108 | + |
| 109 | +Kubernetes provides a set of [recommendations](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#what-method-to-use) |
| 110 | +for log levels; as a small integration on the above guidelines we would like to add: |
| 111 | + |
| 112 | +- Logs at the lower levels of verbosity (<=3) are meant to document “what happened” by describing how an object status |
| 113 | + is being changed by controller/reconcilers across subsequent reconciliations; as a rule of thumb, it is reasonable |
| 114 | + to assume that a person reading those logs has a deep knowledge of how the system works, but it should not be required |
| 115 | + for those persons to have knowledge of the codebase. |
| 116 | +- Logs at higher levels of verbosity (>=4) are meant to document “how it happened”, providing insight on thorny parts of |
| 117 | + the code; a person reading those logs usually has deep knowledge of the codebase. |
| 118 | +- Don’t use verbosity higher than 5. |
| 119 | + |
| 120 | +Ideally, in a future release of Cluster API we will switch to use 2 as a default verbosity (currently it is 0) for all the Cluster API |
| 121 | +controllers as recommended by the Kubernetes guidelines. |
| 122 | + |
| 123 | +## Trade-offs |
| 124 | + |
| 125 | +When developing logs there are operational trade-offs to take into account, e.g. verbosity vs space allocation, user |
| 126 | +readability vs machine readability, maintainability of the logs across the code base. |
| 127 | + |
| 128 | +A reasonable approach for logging is to keep things simple and implement more log verbosity selectively and only on |
| 129 | +thorny parts of code. Over time, based on feedback from SRE/developers, more logs can be added to shed light where necessary. |
| 130 | + |
| 131 | +## Developing and testing logs |
| 132 | + |
| 133 | +We are continuously improving our [Tilt](tilt.md) setup allowing Cluster API developers to use logs and improve them as part of |
| 134 | +their development process. |
| 135 | + |
| 136 | +Developers can deploy an entire log observability suite by simply setting the |
| 137 | +`deploy_observability` value in your [tilt-setting.yaml](tilt.md#create-a-tilt-settings-file). |
| 138 | + |
| 139 | +```yaml |
| 140 | +deploy_observability: |
| 141 | +- promtail |
| 142 | +- loki |
| 143 | +- grafana |
| 144 | +``` |
| 145 | +
|
| 146 | +In order to test structured logging it is required to override some controller flags, and similar approach can be used |
| 147 | +for increasing log verbosity, e.g. |
| 148 | +
|
| 149 | +```yaml |
| 150 | +extra_args: |
| 151 | + core: |
| 152 | + - "--logging-format=json" |
| 153 | + - "--v=5" |
| 154 | + docker: |
| 155 | + - "--v=5" |
| 156 | + - "--logging-format=json" |
| 157 | + kubeadm-bootstrap: |
| 158 | + - "--v=5" |
| 159 | + - "--logging-format=json" |
| 160 | + kubeadm-control-plane: |
| 161 | + - "--v=5" |
| 162 | + - "--logging-format=json" |
| 163 | +``` |
| 164 | +
|
| 165 | +Over time, we would like to further improve this with the help of the community; feel free to open issues or propose |
| 166 | +ideas! |
| 167 | +
|
| 168 | +## What about providers |
| 169 | +Cluster API providers are developed by independent teams, and each team is free to define their own processes and |
| 170 | +conventions. |
| 171 | +
|
| 172 | +However, given that SRE/developers looking at logs are often required to look both at logs from core CAPI and providers, |
| 173 | +we encourage providers to adopt and contribute to the guidelines defined in this document. |
| 174 | +
|
| 175 | +It is also worth noting that the foundational elements of the approach described in this document are easy to achieve |
| 176 | +by leveraging default Kubernetes tooling for logging. |
0 commit comments