Skip to content

Use contextual logging #337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 15, 2025
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 49 additions & 31 deletions docs/dev.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,33 @@
<!-- Dev guide -->


## Logging

We use `logr.Logger` interface for logging everywhere.
The logger instance is loaded from `context.Context` or passed around as an argument directly.
This is aligned with contextual logging as explained in [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md).

In other words, we explicitly don't use `klog` global logging calls.
Using `klog` log value helpers like `klog.KObj` is just fine.

### Change log verbosity
We use the `k8s.io/klog/v2` package to manage logging.

We generally follow the [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md), which states "the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4)".

To configure logging verbosity, specify the `v` flag such as `--v=2`.
To configure logging verbosity, specify the `v` flag such as `--v=2`.

### Add logs

The [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md) has the following definitions:

* `klog.V(0).InfoS` = `klog.InfoS` - Generally useful for this to **always** be visible to a cluster operator
* `klog.V(1).InfoS` - A reasonable default log level if you don't want verbosity.
* `klog.V(2).InfoS` - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
* `klog.V(3).InfoS` - Extended information about changes
* `klog.V(4).InfoS` - Debug level verbosity
* `klog.V(5).InfoS` - Trace level verbosity
- `logger.V(0).Info` = `logger.Info` - Generally useful for this to **always** be visible to a cluster operator
- `logger.V(1).Info` - A reasonable default log level if you don't want verbosity.
- `logger.V(2).Info` - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
- `logger.V(3).Info` - Extended information about changes
- `logger.V(4).Info` - Debug level verbosity
- `logger.V(5).Info` - Trace level verbosity

We choose to simplify to the following 3 common levels.

```
const(
DEFAULT=2
Expand All @@ -33,34 +39,46 @@ const(

The guidelines are written in the context of a k8s controller. Our [ext-proc](../pkg/ext-proc/) does more things such as handling requests and scraping metrics, therefore we adapt the guidelines as follows:

1. The server startup process and configuration.
* `klog.InfoS` Logging at the `V(0)` verbosity level is generally welcome here as this is only logged once at startup, and provides useful info for debugging.
1. The server startup process and configuration.

- `logger.Info` Logging at the `V(0)` verbosity level is generally welcome here as this is only logged once at startup, and provides useful info for debugging.

2. Reconciler loops. The reconciler loops watch for CR changes such as the `InferenceModel` CR. And given changes in these CRs significantly affect the behavior of the extension, we recommend using v=1 verbosity level as default, and sparsely use higher verbosity levels.
* `klog.V(DEFAULT).InfoS`
* Default log level in the reconcilers.
* Information about config (listening on X, watching Y)
* Errors that repeat frequently that relate to conditions that can be corrected (e.g., inference model not initialized yet)
* System state changing (adding/removing objects in the data store)
* `V(VERBOSE)` and above: Use your best judgement.

- `logger.V(DEFAULT)`
- Default log level in the reconcilers.
- Information about config (listening on X, watching Y)
- Errors that repeat frequently that relate to conditions that can be corrected (e.g., inference model not initialized yet)
- System state changing (adding/removing objects in the data store)
- `logger.V(VERBOSE)` and above: Use your best judgement.

3. Inference request handling. These requests are expected to be much higher volume than the control flow in the reconcilers and therefore we should be mindful of log spamming. We recommend using v=2 to log important info about a request, such as the HTTP response code, and higher verbosity levels for less important info.

* `klog.V(DEFAULT).InfoS`
* Logging the status code of an HTTP request
* Important decision making such as picking the target model, target pod
* `klog.V(VERBOSE).InfoS`
* Detailed request scheduling algorithm operations, such as running the filtering logic
* `V(DEBUG)` and above: Use your best judgement.
- `logger.V(DEFAULT)`
- Logging the status code of an HTTP request
- Important decision making such as picking the target model, target pod
- `logger.V(VERBOSE)`
- Detailed request scheduling algorithm operations, such as running the filtering logic
- `logger.V(DEBUG)` and above: Use your best judgement.

4. Metric scraping loops. These loops run at a very high frequency, and logs can be very spammy if not handled properly.
* `klog.V(TRACE).InfoS`
* Transient errors/warnings, such as failure to get response from a pod.
* Important state changes, such as updating a metric.

5. Misc
- `logger.V(TRACE)`
- Transient errors/warnings, such as failure to get response from a pod.
- Important state changes, such as updating a metric.

5. Misc
1. Periodic (every 5s) debug loop which prints the current pods and metrics.
* `klog.WarningS` If the metrics are not fresh enough, which indicates an error occurred during the metric scraping loop.
* `klog.V(DEBUG).InfoS`
* This is very important to debug the request scheduling algorithm, and yet not spammy compared to the metric scraping loop logs.
- `logger.V(DEFAULT).Error` If the metrics are not fresh enough, which indicates an error occurred during the metric scraping loop.
- `logger.V(DEBUG)`
- This is very important to debug the request scheduling algorithm, and yet not spammy compared to the metric scraping loop logs.

### Passing Logger Around

You can pass around a `context.Context` that contains a logger or a `logr.Logger` instance directly.
You need to make the call which one to use. Passing a `context.Context` is more standard,
on the other hand you then need to call `log.FromContext` everywhere.

As `logger.V` calls are cummulative, i.e. `logger.V(2).V(3)` results in `logger.V(5)`,
a logger should be passed around with no verbosity level set so that `logger.V(DEFAULT)`
actually uses `DEFAULT` verbosity level.
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ require (
github.com/bojand/ghz v0.120.0
github.com/elastic/crd-ref-docs v0.1.0
github.com/envoyproxy/go-control-plane/envoy v1.32.4
github.com/go-logr/logr v1.4.2
github.com/google/go-cmp v0.6.0
github.com/jhump/protoreflect v1.17.0
github.com/onsi/ginkgo/v2 v2.22.2
Expand Down Expand Up @@ -61,7 +62,6 @@ require (
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-logr/zapr v1.3.0 // indirect
github.com/go-openapi/jsonpointer v0.21.0 // indirect
Expand Down
10 changes: 5 additions & 5 deletions pkg/ext-proc/backend/datastore.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,11 @@ import (
"strconv"
"sync"

"github.com/go-logr/logr"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/klog/v2"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/ext-proc/util/logging"
)
Expand Down Expand Up @@ -87,7 +88,7 @@ func (ds *K8sDatastore) HasSynced() bool {
return ds.inferencePool != nil
}

func RandomWeightedDraw(model *v1alpha1.InferenceModel, seed int64) string {
func RandomWeightedDraw(logger logr.Logger, model *v1alpha1.InferenceModel, seed int64) string {
var weights int32

source := rand.NewSource(rand.Int63())
Expand All @@ -98,7 +99,7 @@ func RandomWeightedDraw(model *v1alpha1.InferenceModel, seed int64) string {
for _, model := range model.Spec.TargetModels {
weights += *model.Weight
}
klog.V(logutil.VERBOSE).InfoS("Weights for model computed", "model", model.Name, "weights", weights)
logger.V(logutil.TRACE).Info("Weights for model computed", "model", model.Name, "weights", weights)
randomVal := r.Int31n(weights)
for _, model := range model.Spec.TargetModels {
if randomVal < *model.Weight {
Expand Down Expand Up @@ -128,7 +129,7 @@ func (ds *K8sDatastore) flushPodsAndRefetch(ctx context.Context, ctrlClient clie
LabelSelector: selectorFromInferencePoolSelector(newServerPool.Spec.Selector),
Namespace: newServerPool.Namespace,
}); err != nil {
klog.Error(err, "error listing clients")
log.FromContext(ctx).V(logutil.DEFAULT).Error(err, "Failed to list clients")
}
ds.pods.Clear()

Expand All @@ -139,7 +140,6 @@ func (ds *K8sDatastore) flushPodsAndRefetch(ctx context.Context, ctrlClient clie
}
ds.pods.Store(pod, true)
}

}

func selectorFromInferencePoolSelector(selector map[v1alpha1.LabelKey]v1alpha1.LabelValue) labels.Selector {
Expand Down
4 changes: 3 additions & 1 deletion pkg/ext-proc/backend/datastore_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import (

v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/ext-proc/util/logging"
)

func TestHasSynced(t *testing.T) {
Expand Down Expand Up @@ -46,6 +47,7 @@ func TestHasSynced(t *testing.T) {
}

func TestRandomWeightedDraw(t *testing.T) {
logger := logutil.NewTestLogger()
tests := []struct {
name string
model *v1alpha1.InferenceModel
Expand Down Expand Up @@ -118,7 +120,7 @@ func TestRandomWeightedDraw(t *testing.T) {
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
for range 10000 {
model := RandomWeightedDraw(test.model, seedVal)
model := RandomWeightedDraw(logger, test.model, seedVal)
if model != test.want {
t.Errorf("Model returned!: %v", model)
break
Expand Down
4 changes: 2 additions & 2 deletions pkg/ext-proc/backend/fake.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ package backend
import (
"context"

klog "k8s.io/klog/v2"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/ext-proc/util/logging"
)
Expand All @@ -17,7 +17,7 @@ func (f *FakePodMetricsClient) FetchMetrics(ctx context.Context, pod Pod, existi
if err, ok := f.Err[pod]; ok {
return nil, err
}
klog.V(logutil.VERBOSE).InfoS("Fetching metrics for pod", "pod", pod, "existing", existing, "new", f.Res[pod])
log.FromContext(ctx).V(logutil.VERBOSE).Info("Fetching metrics for pod", "pod", pod, "existing", existing, "new", f.Res[pod])
return f.Res[pod], nil
}

Expand Down
26 changes: 14 additions & 12 deletions pkg/ext-proc/backend/inferencemodel_reconciler.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,14 @@ package backend
import (
"context"

"github.com/go-logr/logr"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/tools/record"
"k8s.io/klog/v2"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/ext-proc/util/logging"
)
Expand All @@ -27,38 +28,39 @@ func (c *InferenceModelReconciler) Reconcile(ctx context.Context, req ctrl.Reque
return ctrl.Result{}, nil
}

klogV := klog.V(logutil.DEFAULT)
klogV.InfoS("Reconciling InferenceModel", "name", req.NamespacedName)
logger := log.FromContext(ctx)
loggerDefault := logger.V(logutil.DEFAULT)
loggerDefault.Info("Reconciling InferenceModel", "name", req.NamespacedName)

infModel := &v1alpha1.InferenceModel{}
if err := c.Get(ctx, req.NamespacedName, infModel); err != nil {
if errors.IsNotFound(err) {
klogV.InfoS("InferenceModel not found. Removing from datastore since object must be deleted", "name", req.NamespacedName)
loggerDefault.Info("InferenceModel not found. Removing from datastore since object must be deleted", "name", req.NamespacedName)
c.Datastore.InferenceModels.Delete(infModel.Spec.ModelName)
return ctrl.Result{}, nil
}
klogV.ErrorS(err, "Unable to get InferenceModel", "name", req.NamespacedName)
loggerDefault.Error(err, "Unable to get InferenceModel", "name", req.NamespacedName)
return ctrl.Result{}, err
} else if !infModel.DeletionTimestamp.IsZero() {
klogV.InfoS("InferenceModel is marked for deletion. Removing from datastore", "name", req.NamespacedName)
loggerDefault.Info("InferenceModel is marked for deletion. Removing from datastore", "name", req.NamespacedName)
c.Datastore.InferenceModels.Delete(infModel.Spec.ModelName)
return ctrl.Result{}, nil
}

c.updateDatastore(infModel)
c.updateDatastore(logger, infModel)
return ctrl.Result{}, nil
}

func (c *InferenceModelReconciler) updateDatastore(infModel *v1alpha1.InferenceModel) {
klogV := klog.V(logutil.DEFAULT)
func (c *InferenceModelReconciler) updateDatastore(logger logr.Logger, infModel *v1alpha1.InferenceModel) {
loggerDefault := logger.V(logutil.DEFAULT)

if infModel.Spec.PoolRef.Name == c.PoolNamespacedName.Name {
klogV.InfoS("Updating datastore", "poolRef", infModel.Spec.PoolRef, "serverPoolName", c.PoolNamespacedName)
klogV.InfoS("Adding/Updating InferenceModel", "modelName", infModel.Spec.ModelName)
loggerDefault.Info("Updating datastore", "poolRef", infModel.Spec.PoolRef, "serverPoolName", c.PoolNamespacedName)
loggerDefault.Info("Adding/Updating InferenceModel", "modelName", infModel.Spec.ModelName)
c.Datastore.InferenceModels.Store(infModel.Spec.ModelName, infModel)
return
}
klogV.InfoS("Removing/Not adding InferenceModel", "modelName", infModel.Spec.ModelName)
loggerDefault.Info("Removing/Not adding InferenceModel", "modelName", infModel.Spec.ModelName)
// If we get here. The model is not relevant to this pool, remove.
c.Datastore.InferenceModels.Delete(infModel.Spec.ModelName)
}
Expand Down
5 changes: 4 additions & 1 deletion pkg/ext-proc/backend/inferencemodel_reconciler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/ext-proc/util/logging"
)

var (
Expand Down Expand Up @@ -46,6 +47,8 @@ var (
)

func TestUpdateDatastore_InferenceModelReconciler(t *testing.T) {
logger := logutil.NewTestLogger()

tests := []struct {
name string
datastore *K8sDatastore
Expand Down Expand Up @@ -135,7 +138,7 @@ func TestUpdateDatastore_InferenceModelReconciler(t *testing.T) {
Datastore: test.datastore,
PoolNamespacedName: types.NamespacedName{Name: test.datastore.inferencePool.Name},
}
reconciler.updateDatastore(test.incomingService)
reconciler.updateDatastore(logger, test.incomingService)

if ok := mapsEqual(reconciler.Datastore.InferenceModels, test.wantInferenceModels); !ok {
t.Error("Maps are not equal")
Expand Down
18 changes: 11 additions & 7 deletions pkg/ext-proc/backend/inferencepool_reconciler.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,14 @@ import (
"context"
"reflect"

"github.com/go-logr/logr"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/tools/record"
klog "k8s.io/klog/v2"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/ext-proc/util/logging"
)
Expand All @@ -29,29 +31,31 @@ func (c *InferencePoolReconciler) Reconcile(ctx context.Context, req ctrl.Reques
if req.NamespacedName.Name != c.PoolNamespacedName.Name || req.NamespacedName.Namespace != c.PoolNamespacedName.Namespace {
return ctrl.Result{}, nil
}
klogV := klog.V(logutil.DEFAULT)
klogV.InfoS("Reconciling InferencePool", "name", req.NamespacedName)

logger := log.FromContext(ctx)
loggerDefault := logger.V(logutil.DEFAULT)
loggerDefault.Info("Reconciling InferencePool", "name", req.NamespacedName)

serverPool := &v1alpha1.InferencePool{}
if err := c.Get(ctx, req.NamespacedName, serverPool); err != nil {
klogV.ErrorS(err, "Unable to get InferencePool", "name", req.NamespacedName)
loggerDefault.Error(err, "Unable to get InferencePool", "name", req.NamespacedName)
return ctrl.Result{}, err
}
if c.Datastore.inferencePool == nil || !reflect.DeepEqual(serverPool.Spec.Selector, c.Datastore.inferencePool.Spec.Selector) {
c.updateDatastore(serverPool)
c.updateDatastore(logger, serverPool)
c.Datastore.flushPodsAndRefetch(ctx, c.Client, serverPool)
} else {
c.updateDatastore(serverPool)
c.updateDatastore(logger, serverPool)
}

return ctrl.Result{}, nil
}

func (c *InferencePoolReconciler) updateDatastore(serverPool *v1alpha1.InferencePool) {
func (c *InferencePoolReconciler) updateDatastore(logger logr.Logger, serverPool *v1alpha1.InferencePool) {
pool, _ := c.Datastore.getInferencePool()
if pool == nil ||
serverPool.ObjectMeta.ResourceVersion != pool.ObjectMeta.ResourceVersion {
klog.V(logutil.DEFAULT).InfoS("Updating inference pool", "target", klog.KMetadata(&serverPool.ObjectMeta))
logger.V(logutil.DEFAULT).Info("Updating inference pool", "target", klog.KMetadata(&serverPool.ObjectMeta))
c.Datastore.setInferencePool(serverPool)
}
}
Expand Down
5 changes: 4 additions & 1 deletion pkg/ext-proc/backend/inferencepool_reconciler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/ext-proc/util/logging"
)

var (
Expand Down Expand Up @@ -41,6 +42,8 @@ var (
)

func TestUpdateDatastore_InferencePoolReconciler(t *testing.T) {
logger := logutil.NewTestLogger()

tests := []struct {
name string
datastore *K8sDatastore
Expand Down Expand Up @@ -74,7 +77,7 @@ func TestUpdateDatastore_InferencePoolReconciler(t *testing.T) {
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
inferencePoolReconciler := &InferencePoolReconciler{Datastore: test.datastore}
inferencePoolReconciler.updateDatastore(test.incomingPool)
inferencePoolReconciler.updateDatastore(logger, test.incomingPool)

gotPool := inferencePoolReconciler.Datastore.inferencePool
if !reflect.DeepEqual(gotPool, test.wantPool) {
Expand Down
Loading