Allow to override global `tracer_provider` after `tracer`s creation #1445
Conversation
Force-pushed from 4c9d348 to 069b7e3.

Any comments?
Thanks for the PR! We briefly discussed this approach in the last SIG meeting (cc @owais).
I am worried this is a little too much magic. I think the user should have to explicitly set the providers at initialization and we should encourage doing it that way.
Another thing is that `get_tracer()` should just be a convenience alias for `get_tracer_provider().get_tracer()`. Here, they do different things. I guess you would have to move the proxying to be at the provider level to fix this.
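For reference, a rough sketch of the alias relationship being asked for here (the instrumentation name and version below are just illustrative values):

```python
# Sketch of the expectation described above: the module-level helper should
# simply delegate to the globally registered provider.
from opentelemetry import trace

tracer_a = trace.get_tracer("my.module", "0.1.0")
tracer_b = trace.get_tracer_provider().get_tracer("my.module", "0.1.0")

# Both calls should go through the same provider; with the proxying done only
# inside get_tracer(), the two code paths can return different kinds of objects.
```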
```
_TRACER_PROVIDER = None

@functools.lru_cache(maxsize=128)  # type: ignore
```
I don't think we can have a maxsize here. If you call `provider.get_tracer()` multiple times (even with the SDK one), it will not give you back the same tracer. Here's what the spec says:

> It is unspecified whether or under which conditions the same or different Tracer instances are returned from this functions.

So if the cache was full, it would start giving new tracers even though the user is not asking for a new one themself.
Some linter was complaining about `maxsize` not being defined. But it could be defined as `None` to make it unbounded; that makes sense here, I'll change it.
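For context, the unbounded variant being discussed would look roughly like this (a sketch only; the helper name `_get_current_tracer` follows the name used elsewhere in this thread):

```python
import functools

from opentelemetry.trace import get_tracer_provider


@functools.lru_cache(maxsize=None)  # None = unbounded: entries are never evicted
def _get_current_tracer(*args, **kwargs):
    # Repeated calls with the same arguments keep returning the same tracer
    # until the cache is explicitly cleared (e.g. by set_tracer_provider()).
    return get_tracer_provider().get_tracer(*args, **kwargs)
```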
```
tracer_provider = get_tracer_provider()
return tracer_provider.get_tracer(*args, **kwargs)  # type: ignore
```
I think the user would get a stale `DefaultTracer` if the `cache_clear()` below happened while this body was executing. There were no concurrency guarantees around the globals here (because it was intended to be called once at initialization), so this might need some other additions.
> I think the user would get a stale DefaultTracer if the cache_clear() below happened while this body was executing.

And it will definitely get an old tracer if it is called before `set_tracer_provider()`. So yes, all traces produced before tracing is fully set up will be lost, exactly the same way as now.

But the main idea of this PR is that all traces produced after configuration will be handled. Currently there is no way to do that, so I think it's an improvement.

> because it was intended to be called once in initialization

I agree. That should be configured early, during initialization, before spawning threads etc. All I want here is to do that initialization after most imports and maybe after a few other calls.
I mean the `ProxyTracer` will be stuck proxying to a `DefaultTracer` because of the cache. I don't think `cache_clear()` is thread safe. In which case, all spans for that tracer will be lost even after configuration.

1. Thread 1 has a proxy tracer and calls `_get_current_tracer()` for the first time (or when the cache is full). L483 executes right before a context switch.
2. Thread 2 calls `set_tracer_provider()`, which calls `cache_clear()` and empties the cache so proxies will use the real provider.
3. `_get_current_tracer()` from step 1 above finishes and overwrites the cache with a `DefaultTracer` for that proxy.
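For illustration, the interleaving above could be ruled out by guarding the lookup and the invalidation with the same lock. This is just a sketch of that idea, not what the PR implements:

```python
import threading

from opentelemetry.trace import get_tracer_provider

_tracer_cache_lock = threading.Lock()
_tracer_cache = {}


def _get_current_tracer(name, version=None):
    # Compute-and-store happens entirely under the lock, so a concurrent
    # invalidation cannot slip in between the lookup and the cache write.
    with _tracer_cache_lock:
        key = (name, version)
        if key not in _tracer_cache:
            _tracer_cache[key] = get_tracer_provider().get_tracer(name, version)
        return _tracer_cache[key]


def _clear_tracer_cache():
    # Would be called by set_tracer_provider() in this sketch.
    with _tracer_cache_lock:
        _tracer_cache.clear()
```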
Hm, valid point.

In the pure-Python implementation, `lru_cache` uses a lock for the bounded cache, but not for the unbounded one. In the real world the Python version is overridden by the C implementation, which does not call Py_BEGIN_ALLOW_THREADS and relies on the GIL for synchronization.

So that may only happen if there is no C implementation, if `set_tracer_provider` is not called before spawning threads, and if the switch happens exactly there… I think it's rather safe to do it this way.
Nice find. We do support PyPy and I'd rather be safe than sorry regarding when threads are spawned. I believe wsgi/asgi runners could start multiple threads early on outside of the programmer's control?
Do we need to use a cache for this? The fact that we need to look at implementation details of different runtimes makes me not want to use it for this purpose. I feel like we are abusing this by using it in a place where we don't really need a cache.

What if the proxy called `get_tracer_provider()` on every call until it received a non-default one, and then used it from there on?

Something like:
```python
class ProxyTracer:
    def __init__(self, instrumentation_name, instrumentation_version):
        self._real_tracer = None
        self._noop_tracer = NoopTracer()
        self._inst_name = instrumentation_name
        self._inst_version = instrumentation_version

    @property
    def _tracer(self):
        if self._real_tracer:
            return self._real_tracer

        provider = get_tracer_provider()
        if not isinstance(provider, DefaultTracerProvider):
            self._real_tracer = provider.get_tracer(
                self._inst_name, self._inst_version
            )
            return self._real_tracer

        return self._noop_tracer

    def start_span(self):
        return self._tracer.start_span()

    # rest of tracer api methods that proxy calls to `self._tracer`
```
Assuming the default/noop tracer provider's `get_tracer()` method would always return a `ProxyTracer` instance. Unless I'm missing something obvious, these two changes should solve this problem no matter whether the convenience methods are used or the OTel API is used directly.
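If the default provider did hand out such proxies, usage would look roughly like this (a hypothetical illustration of the lazy upgrade, assuming the remaining tracer API methods are proxied as the comment suggests):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

tracer = trace.get_tracer("my.library")        # a ProxyTracer over a no-op

with tracer.start_as_current_span("early"):    # no real provider yet: dropped
    pass

trace.set_tracer_provider(TracerProvider())    # real SDK provider installed

with tracer.start_as_current_span("late"):     # now served by the real tracer
    pass
```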
> What if the proxy called get_tracer_provider() on every call until it received a non-default one and then used it from there on?

And what if a real provider is never set at all because tracing is disabled for this instance? The proxy would then keep checking the provider on every call, so instances with tracing enabled may end up working faster than ones where tracing is disabled?

> Something like:

Didn't you just rewrite the cache mechanism yourself? Then why not use the standard one?

My implementation reuses the same tracer instance for all proxies; yours creates a new one for each. I don't think it is critical to have exactly one instance, but why not, if it's so easy.
> Didn't you just rewrite the cache mechanism yourself? Then why not use the standard one?

No. It's not a (LRU) cache, just a plain wrapper that is very simple and clear to read/understand. When I see something using an LRU cache, it usually means the code has been added as a performance optimization, and we can safely assume that the cache can be invalidated at any time. Our problem is not that creating tracers is an expensive operation. Our problem is that we need to be able to create a tracer implementation that can lazily upgrade itself from a noop to a real implementation. Just because a solution for another problem "works" doesn't mean we should use it.

> What if the proxy called get_tracer_provider() on every call until it received a non-default one and then used it from there on?

> And what if a real provider is never set at all because tracing is disabled for this instance? Instances with tracing enabled may work faster than ones where tracing is disabled?

If tracing is disabled somehow then the tracer won't work as expected.
Thanks for your reply.

But how can that be practically implemented in the real world? That also depends on the definition of "user". Are they only "end-users", i.e. authors of an application (the executable entry point)? Shouldn't the library also be usable by authors of other libraries?

See my example in the first post: imagine I'm writing only `submodule.py`. Should I set some provider there before calling `get_tracer()`? Or should I only create the tracer dynamically on each call:

```python
def echo(text):
    with trace.get_tracer(__name__).start_as_current_span("submodule context manager"):
        print(text)
```

And as the author of an application, should I then configure and set the provider before all imports? And what if I need to read configuration to set it up?

I also tried that first, but it didn't work well; I made a …

The provided implementation is more lightweight, straightforward and backward-compatible.
Force-pushed from a4d0b93 to 7e315b9.
Yes, I think that is the intention of OTel. Library authors should instrument only with the opentelemetry-api, and the "users" (i.e. the person writing the executable entry point) should configure the providers before the import statements.
This is a good use case, but I don't think this PR would solve it either, right? The spans created by that client would be lost either way. Btw, you can already do something similar to this PR by setting the global provider and then adding additional config to it later if it needs to be fetched from other instrumented modules. That way you only need to set the provider before other imports:

```python
# before other imports
trace.set_tracer_provider(TracerProvider())

# do any imports in any order you like ...
from myconfig import fetch_otel_config

config = fetch_otel_config()
trace.get_tracer_provider().add_span_processor(
    BatchExportSpanProcessor(config.create_span_exporter())
)
```

What do you think? Same as this PR, all spans before adding the processor will be lost.
If … (opentelemetry-python/opentelemetry-api/src/opentelemetry/trace/__init__.py, lines 422 to 423 in 5450e6d)
I'm not trying to catch traces that happened before the configuration; that's impractical. With this PR I'm allowing configuration to be postponed to a later step: after imports, but before the main business logic runs.

Hm… okay, I didn't know that hack. This will make late configuration possible. But I still think it's too hacky and limited. Is there only …? And if there is an alternate SDK implementation, that won't allow deciding on it at a later step. Anyway, the rule that some custom code must be executed at the very beginning, even before imports, to allow configuration looks like an unexpected, strange and unobvious requirement.

I said "more compatible", not "completely". The addition of … If anyone called …

The comment should be changed, yes.
Thanks for the feedback, I agree it's not obvious and should be better documented. I'm gonna bring this discussion into the issue, because I think we just need to make a decision on the approach.
Force-pushed from 807e3cf to 2b04abb.

Force-pushed from 2b04abb to a71c1d0.

Force-pushed from a71c1d0 to 7c83942.
Branch rebased onto current …
Force-pushed from 3af2595 to d68cb46.
@aabmass @lonewolf3739
My comments have been addressed.
Sorry for the slow review everyone!
@anton-ryzhov I think both the comments I made here (about differing behavior) would be resolved if we proxy all the way to the tracer provider level. Is it very nasty to implement it this way?
""" | ||
if tracer_provider is None: | ||
tracer_provider = get_tracer_provider() |
@anton-ryzhov it looks like `get_tracer()` is no longer a simple shortcut for `get_tracer_provider().get_tracer()`:

- If the user has already set the global provider, they behave the same.
- If the user has set `OTEL_PYTHON_TRACER_PROVIDER`, they behave the same.
- If the user has NOT set either of the above, then `get_tracer()` will give proxies and `get_tracer_provider().get_tracer()` will give no-ops.

Is that correct?
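A small illustration of the third bullet, assuming nothing has been configured yet (the instrumentation name is hypothetical):

```python
from opentelemetry import trace

# Neither set_tracer_provider() nor OTEL_PYTHON_TRACER_PROVIDER has been used.
proxied = trace.get_tracer("my.lib")                     # proxy under this PR
noop = trace.get_tracer_provider().get_tracer("my.lib")  # plain no-op tracer

# The two objects are of different kinds, which is the inconsistency
# being pointed out here.
print(type(proxied), type(noop))
```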
`get_tracer` always returns a proxy. It may be a no-op or not, depending on whether a tracer provider has been set.
- I think `trace.get_tracer()` and `get_tracer_provider().get_tracer()` should behave the same. IMO this implementation should not touch `trace.get_tracer()` and should instead update the default tracer provider's `get_tracer()` method to make this happen.
- Ideally, it would be nice if a proxy were only returned when necessary, but this is not a huge deal.

I added another comment above proposing a simpler/dumber solution, but I'm not sure if I'm missing something.
I read the comment. What exactly breaks here? Doesn't the same thing break if we patch the convenience method instead of the real one?

One might argue this doesn't break anything if the object returned satisfies the tracer interface, and in some cases it actually makes tracing viable. It would actually be a bug fix in that case. No one is relying on this edge case to disable tracing for the lifetime of their process :)

I strongly think this should be solved at the innermost layer so it works in every possible scenario, instead of fixing one wrapper.
When should `get_tracer_provider` return a proxy, when `_TRACER_PROVIDER`, and when the result of `_load_trace_provider`? How should it distinguish between them?

If you know how to redo it better, maybe you can replace this PR with your own? It's hard for me to guess, and it's not reasonable to speculate and copy-paste your short samples from comments.
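For illustration only, one shape the provider-level answer could take. Everything below is a hypothetical sketch with stand-in classes, not the actual module and not the POC linked in the next comment:

```python
import os


class _DefaultTracerProvider:
    """Stand-in for the API's no-op provider."""


class _ProxyTracerProvider(_DefaultTracerProvider):
    """Stand-in for a provider that forwards to the real one once it is set."""


_TRACER_PROVIDER = None  # would be set by set_tracer_provider()


def _load_trace_provider():
    """Stand-in for loading the implementation named by the environment."""
    return _DefaultTracerProvider()


def get_tracer_provider():
    if _TRACER_PROVIDER is not None:
        return _TRACER_PROVIDER              # explicitly set by the user
    if "OTEL_PYTHON_TRACER_PROVIDER" in os.environ:
        return _load_trace_provider()        # configured via the environment
    return _ProxyTracerProvider()            # nothing configured yet: proxy
```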
OK, here is a working POC doing what I was suggesting. It solves the issues linked in this PR. The solution aligns with what was agreed upon in the SIG, and the implementation is clear enough to understand/maintain. This also does not fix a single convenience method but instead fixes the issue no matter how a user tries to fetch a tracer. It does not always return a proxy, so there are no unnecessary attribute lookups in cases where this is not a problem. It always returns the same tracer, but can be updated to return the same tracer for each unique combination of instrumentation name + library version using a simple dict or a cache like you did.

https://github.com/open-telemetry/opentelemetry-python/pull/1726/files
```
instrumenting_module_name=instrumenting_module_name,
instrumenting_library_version=instrumenting_library_version,
```
Our current implementation does not use the same tracer even if you do pass the same `instrumenting_module_name` and `instrumenting_library_version`. Idk if this is a gap in our implementation of the spec or intended, but the proxy behavior is to reuse the same one underneath:
```python
from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()
args = ("foo", "version1")
tracer_1 = tracer_provider.get_tracer(*args)
tracer_2 = tracer_provider.get_tracer(*args)
# Current impl, False
tracer_1 is tracer_2

# --------------

from opentelemetry import trace

tracer_3 = trace.get_tracer(*args)
tracer_4 = trace.get_tracer(*args)
trace.set_tracer_provider(TracerProvider())
# Proxy impl, True
tracer_3._get_current_tracer() is tracer_4._get_current_tracer()
```
@open-telemetry/python-approvers – thoughts? I don't think this changes the behavior of exporter metrics, but I'm not super familiar with tracing internals.
Based on the spec, I think returning either the same or different tracers is fine here; I prefer the new behaviour of returning the same tracer.
Co-authored-by: Aaron Abbott <[email protected]>
Co-authored-by: Aaron Abbott <[email protected]>
I don't think it's even possible without breaking backward compatibility. There are 2 options then: …
Any updates on this issue?
I think we somewhat solved this with the env var solution, but I wasn't deeply involved in the discussion so I don't remember exactly.

I think providing a proxy that returns a noop provider until a real provider is set would be a good solution. There are a couple of issues that come with it. The first one is what to do with the spans that are generated before a real global provider is registered. Should the proxy tracer hold on to them and flush once the real provider is available? Should they be dropped? I think dropping with a warning is a good enough tradeoff, especially since this can be progressively improved later without breaking changes. There were also other concerns related to consistency with how MeterProviders work that I don't remember exactly. @anton-ryzhov it might be worth bringing this up in the next SIG call and discussing it there.
I thought you'd decided here.
Oh, I guess I forgot about that :)
Co-authored-by: Diego Hurtado <[email protected]>
I've clearly forgotten about this PR so I'll need to review once more, but if we do get 2 approvals, I don't want to block it. I'll still review and open issues if I find anything I think can be improved in that case.
CHANGELOG.md (outdated)

```
@@ -46,6 +46,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Adding `opentelemetry-distro` package to add default configuration for
  span exporter to OTLP
  ([#1482](https://github.com/open-telemetry/opentelemetry-python/pull/1482))
- Allow to override global `tracer_provider` after `tracer`s creation
```
Is this still true? We don't allow this to happen but instead have a proxy provider until a real one is set.
Add link to PR.
Updated
```
If tracer_provider is ommited the current configured one is used.
If tracer_provider is omitted it returns a _ProxyTracer
which redirects calls to a current instrumentation library.
"""
if tracer_provider is None:
```
I'm a bit confused by this. It looks like we are assuming the user will always pass a valid/configured tracer_provider. Why not keep the existing behavior and check whether the provider passed in to the function or fetched by get_tracer_provider() is an initialized one or a default/noop one, and then return a proxy or real tracer accordingly?

Also, doesn't this only solve the issue when users call `trace.get_tracer()`? What if someone calls `get_tracer_provider().get_tracer()` instead? Wouldn't we end up with the same problem? Unless I'm missing something, it looks like we are only solving the problem partially.

Would it make more sense to patch the API package's default tracer provider and have it always return a proxy tracer?
> I'm a bit confused by this. It looks like we are assuming the user will always pass a valid/configured tracer_provider. Why not keep the existing behavior and check whether the provider passed in to the function or fetched by get_tracer_provider() is an initialized one or a default/noop one, and then return a proxy or real tracer accordingly?

`get_tracer_provider` returns either the default one, the configured one, or the one set by `set_tracer_provider`. How can you distinguish them?

> Also, doesn't this only solve the issue when users call trace.get_tracer()? What if someone calls get_tracer_provider().get_tracer() instead? Wouldn't we end up with the same problem? Unless I'm missing something, it looks like we are only solving the problem partially.

There was the same question above: #1445 (comment)

> Would it make more sense to patch the API package's default tracer provider and have it always return a proxy tracer?

Then you'll need another "real default tracer" that the "default proxy tracer" will use when no real one is provided. Which problem would that solve?
Replaced by #1726
Description
Allow to override the global `tracer_provider` after `tracer`s creation.

With this change any module may create a tracer at import time, e.g. `tracer = trace.get_tracer(__name__)`, and wire that tracer into its code (for example by applying the `start_as_current_span` decorator).

Such modules may then be safely imported into a main app before the real tracer provider is initialized/configured. All tracers created earlier (at import time) will use that provider once it is set.
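As a sketch of the pattern described (the file name and span name are illustrative, echoing the submodule example used for testing):

```python
# submodule.py -- the tracer is created at import time, possibly long before
# the application configures a real TracerProvider.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


@tracer.start_as_current_span("echo")
def echo(text):
    print(text)
```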
Fixes # (issue)
#1159
#1276
https://gitter.im/open-telemetry/opentelemetry-python?at=5fc8be67f46e246609860465
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
submodule.py
main.py
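A hedged sketch of the application side, paired with the submodule sketch shown in the description above (module, class and exporter names are assumptions and vary between releases):

```python
# main.py -- imports happen first, tracing is configured afterwards.
import submodule  # its module-level tracer already exists at this point

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)  # tracers created earlier now use this

submodule.echo("traced")  # the span from the module-level tracer is recorded
```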
Does This PR Require a Contrib Repo Change?
Answer the following question based on these examples of changes that would require a Contrib Repo Change:
- The OTel specification has changed, which prompted this PR to update the method interfaces of `opentelemetry-api/` or `opentelemetry-sdk/`
- The method interfaces of `opentelemetry-instrumentation/` have changed
- The method interfaces of `test/util` have changed
- Scripts in `scripts/` that were copied over to the Contrib repo have changed
- Configuration files that were copied over to the Contrib repo have changed (when consistency between repositories is applicable), such as in:
  - `pyproject.toml`
  - `isort.cfg`
  - `.flake8`
- When a new `.github/CODEOWNER` is added
- Major changes to project information, such as in:
  - `README.md`
  - `CONTRIBUTING.md`
Yes. - Link to PR:
No.
Checklist: