Redis Sanitization #2175

Open
nitsanh opened this issue Feb 13, 2024 · 2 comments

nitsanh commented Feb 13, 2024

Hello,

I'm using opentelemetry-instrumentation-redis in my system to collect traces from Redis. I would like the traces to include the key, because I use Redis in several different ways throughout the system and want to tell them apart when viewing my traces. I saw the work you did on sanitizing db.statement by default here and here.

Is there a way that I can bypass this sanitization or provide my own sanitization function (so I would sanitize the value but not the key)? I didn't find any reference for that in the code or the docs.

Thanks!

mastizada commented Jul 31, 2024

I also observed that sanitization is replacing the statement here:

out = [str(args[0])] + ["?"] * (args_length - 1)

So, something like GET key value becomes GET ? ?.
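To make the effect concrete, here is a minimal sketch built around the single line quoted above. The function name `default_style_sanitize` is made up for illustration, and the sketch leaves out anything else the real `_format_command_args` helper may do; it only shows that everything after the command name is masked, keys included.

```python
def default_style_sanitize(args):
    """Keep the command name and replace every other argument with "?"."""
    args_length = len(args)
    if args_length == 0:
        return ""
    # Same expression as the line quoted above.
    out = [str(args[0])] + ["?"] * (args_length - 1)
    return " ".join(out)


print(default_style_sanitize(("SET", "user:42", "secret")))  # SET ? ?
print(default_style_sanitize(("GET", "user:42")))            # GET ?
```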

Is there a standard way to add a parameter that would let us keep the first 2 args and sanitize the rest? Maybe we could pass a sanitizer function, like _format_command_args, as an argument?

I am happy to work on it.
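As a rough sketch of what such a pluggable sanitizer could look like: `make_sanitizer` and its `keep` parameter are hypothetical names used only for illustration and are not part of opentelemetry-instrumentation-redis.

```python
from typing import Callable, Sequence


def make_sanitizer(keep: int = 2) -> Callable[[Sequence[object]], str]:
    """Hypothetical factory: keep the first `keep` args and mask the rest."""

    def sanitize(args: Sequence[object]) -> str:
        kept = [str(a) for a in args[:keep]]
        masked = ["?"] * max(len(args) - keep, 0)
        return " ".join(kept + masked)

    return sanitize


keep_key = make_sanitizer(keep=2)
print(keep_key(("SET", "user:42", "secret", "EX", 60)))  # SET user:42 ? ? ?
print(keep_key(("GET", "user:42")))                      # GET user:42
```

A fixed `keep` count treats every command the same way, so commands that take several keys or no key at all would need per-command rules.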

@mastizada

For now, I solved this for my use case with a request hook:

from opentelemetry.sdk.trace import Span
from opentelemetry.semconv.trace import SpanAttributes
from redis.connection import Connection


def sanitize_redis_statement(redis_args: tuple) -> str:
    """
    Based on opentelemetry.instrumentation.redis.utils._format_command_args.
    """
    cmd_max_len = 1000
    value_too_long_mark = "..."

    if not redis_args:
        return ""

    args = list(redis_args)
    three_key_list = [
        "HSET",
        "HSETNX",
        "JSON.MSET",
        "JSON.SET",
        "LSET",
        "PSETEX",
        "SETBIT",
        "SETRANGE",
    ]
    two_key_list = ["GETSET", "MSET", "MSETNX", "LPUSH", "LPUSHX", "RPUSH", "RPUSHX"]
    # change values with ? mark
    match str(args[0]).upper():  # normalize so lowercase command names also match
        case "SET":
            if len(args) > 2:
                args[2] = "?"
        case value if value in two_key_list:
            if len(args) > 2:
                args[2:] = ["?"] * (len(args) - 2)
        case value if value in three_key_list:
            if len(args) > 3:
                args[3:] = ["?"] * (len(args) - 3)
    # join arguments together to form the query
    query = " ".join(str(element) for element in args)
    # truncate if it is longer than allowed length
    if len(query) > cmd_max_len:
        return query[: cmd_max_len - 3] + value_too_long_mark
    return query


def request_hook(span: Span, instance: Connection, args: tuple, kwargs: dict):
    """
    Custom name for redis and better query sanitizer.

    @param span: Active opentelemetry span
    @param instance: Redis connection instance
    @param args: Arguments for the execute command
    @param kwargs: Keyword arguments for the execute command
    """
    if span and span.is_recording():
        new_name = f"redis.{span.name}"
        span.update_name(new_name)

        span.set_attribute(SpanAttributes.DB_STATEMENT, sanitize_redis_statement(args))

And then when instrumenting redis:

from opentelemetry.instrumentation.redis import RedisInstrumentor
RedisInstrumentor().instrument(request_hook=request_hook)
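A hook of this shape can be checked without a Redis server or an exporter by calling it with a stub span. The sketch below is self-contained and makes some assumptions: `FakeSpan` and `keep_key_only` are hypothetical stand-ins, the sanitizer is a simplified placeholder rather than the per-command one above, and the literal "db.statement" is used in place of SpanAttributes.DB_STATEMENT so that no OpenTelemetry import is needed.

```python
class FakeSpan:
    """Hypothetical stand-in exposing only the span methods the hook calls."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def is_recording(self):
        return True

    def update_name(self, name):
        self.name = name

    def set_attribute(self, key, value):
        self.attributes[key] = value


def keep_key_only(args):
    # Simplified placeholder sanitizer: keep command and key, mask the rest.
    return " ".join([str(a) for a in args[:2]] + ["?"] * max(len(args) - 2, 0))


def request_hook(span, instance, args, kwargs):
    # Same shape as the hook above, with the placeholder sanitizer swapped in.
    if span and span.is_recording():
        span.update_name(f"redis.{span.name}")
        span.set_attribute("db.statement", keep_key_only(args))


span = FakeSpan("SET")
request_hook(span, None, ("SET", "user:42", "secret"), {})
print(span.name)                        # redis.SET
print(span.attributes["db.statement"])  # SET user:42 ?
```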
