reduce log level of "prefix cached servers" to TRACE #842

Merged (2 commits) on May 15, 2025

nirrozenbaum
Contributor

The default log level of the EPP is 4, so emitting this message at debug level floods the log with this line.

When the scheduler runs with the prefix plugin enabled, the log contains many "Found cached servers" lines.
The message is logged for every prefix and every server that includes that prefix.
The log below is a small part of the log for a SINGLE REQUEST:

{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"scheduling/scheduler.go:139","msg":"Running pre-schedule plugin","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","plugin":"prefix-cache"}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":91}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":90}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":89}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":88}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":87}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":86}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":85}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":84}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":83}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":82}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":81}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":80}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":79}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":78}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":77}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":76}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":75}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":74}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":73}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":72}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":71}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":70}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":69}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":68}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":67}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":66}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":65}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":64}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":63}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":62}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":61}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":60}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":59}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":58}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":57}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":56}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":55}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":54}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":53}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":52}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":51}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":50}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":49}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":48}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":47}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":46}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":45}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":44}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-id":"97307923-8d96-4eaf-840d-3cb1c735f465","request":"TargetModel: meta-llama/Llama-3.1-70B-Instruct, Critical: true, PromptLength: 6803, Headers: map[:authority:inference-gateway :method:POST :path:/v1/chat/completions :scheme:http accept:application/json accept-encoding:gzip, deflate authorization:Bearer EMPTY content-length:7028 content-type:application/json user-agent:AsyncOpenAI/Python 1.77.0 x-envoy-external-address:10.129.2.64 x-forwarded-for:10.129.2.64 x-forwarded-proto:http x-request-id:97307923-8d96-4eaf-840d-3cb1c735f465 x-stainless-arch:x64 x-stainless-async:async:asyncio x-stainless-lang:python x-stainless-os:Linux x-stainless-package-version:1.77.0 x-stainless-read-timeout:600 x-stainless-retry-count:0 x-stainless-runtime:CPython x-stainless-runtime-version:3.12.10]","cachedServersError":"json: unsupported type: map[prefix.ServerID]bool","total # blocks":107,"longest prefix":43}
{"level":"Level(-4)","ts":"2025-05-15T18:30:36Z","caller":"prefix/plugin.go:187","msg":"Found cached servers","x-request-........

...
...
...

The default log level of epp is 4, so emitting this message at debug level floods the log with this line.

Signed-off-by: Nir Rozenbaum <[email protected]>
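
To illustrate why lowering the message to TRACE quiets the log, here is a minimal sketch of verbosity gating. It uses the standard library `log/slog` as a stand-in, not the project's actual logger. The `Level(-4)` field in the log above suggests that verbosity 4 is encoded as level -4, and the sketch mirrors that mapping. The constants `debugVerbosity` and `traceVerbosity` and the helper `emitted` are hypothetical names, and the value 5 for TRACE is an assumption that this PR does not state.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log/slog"
)

// Hypothetical verbosity values: 4 matches the "Level(-4)" lines in the log,
// 5 for TRACE is an assumption made for this sketch.
const (
	debugVerbosity = 4
	traceVerbosity = 5
)

// emitted reports whether a message logged at messageVerbosity would be
// written by a logger configured at configuredVerbosity. Verbosity n is
// mapped to slog.Level(-n), so a higher verbosity means a lower level.
func emitted(configuredVerbosity, messageVerbosity int) bool {
	h := slog.NewJSONHandler(io.Discard, &slog.HandlerOptions{
		Level: slog.Level(-configuredVerbosity),
	})
	return h.Enabled(context.Background(), slog.Level(-messageVerbosity))
}

func main() {
	// At the default verbosity of 4, DEBUG messages are written.
	fmt.Println("debug at 4:", emitted(4, debugVerbosity)) // true
	// The same message moved to TRACE is dropped at verbosity 4.
	fmt.Println("trace at 4:", emitted(4, traceVerbosity)) // false
	// It is still available when verbosity is raised to 5.
	fmt.Println("trace at 5:", emitted(5, traceVerbosity)) // true
}
```

Under these assumptions, the "Found cached servers" line disappears at the default verbosity but can still be recovered by raising the verbosity by one step.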

netlify bot commented May 15, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit b839d7f
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68263d18d77a8a00087a3c62
😎 Deploy Preview https://deploy-preview-842--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 15, 2025
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and robscott May 15, 2025 18:48
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 15, 2025
@nirrozenbaum nirrozenbaum changed the title from 'reduced log level of prefix cached servers to trace.' to 'reduce log level of "prefix cached servers" to TRACE' May 15, 2025
@ahg-g
Contributor

ahg-g commented May 15, 2025

/lgtm
/approve
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, nirrozenbaum


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2025
@nirrozenbaum
Contributor Author

opened issue #843 for integration test flakes

Signed-off-by: Nir Rozenbaum <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2025
@ahg-g
Contributor

ahg-g commented May 15, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2025
@k8s-ci-robot k8s-ci-robot merged commit 77f8564 into kubernetes-sigs:main May 15, 2025
8 checks passed
@nirrozenbaum nirrozenbaum deleted the prefix branch May 15, 2025 21:24
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.