Allow to override global tracer_provider after tracer's creation #1445
Changes from all commits: 15a38d1, 6eac14f, 849ee6e, 1dc4143, 7c83942, d68cb46, 4e5f321, af70ca5, f2f8f0e, 0a756e5, e1eae2c
@@ -76,6 +76,7 @@

import abc
import enum
import functools
import typing
from contextlib import contextmanager
from logging import getLogger
@@ -410,23 +411,99 @@ def use_span(
        yield


class _ProxyTracer(Tracer):
    """Proxies all calls to current TracerProvider
    """

    def __init__(
        self, get_current_tracer: typing.Callable[[], "Tracer"],
    ) -> None:
        self._get_current_tracer = get_current_tracer

    def start_span(
        self,
        name: str,
        context: typing.Optional[Context] = None,
        kind: SpanKind = SpanKind.INTERNAL,
        attributes: types.Attributes = None,
        links: typing.Sequence[Link] = (),
        start_time: typing.Optional[int] = None,
        record_exception: bool = True,
        set_status_on_exception: bool = True,
    ) -> "Span":
        return self._get_current_tracer().start_span(
            name=name,
            context=context,
            kind=kind,
            attributes=attributes,
            links=links,
            start_time=start_time,
            record_exception=record_exception,
            set_status_on_exception=set_status_on_exception,
        )

    @contextmanager  # type: ignore
    def start_as_current_span(
        self,
        name: str,
        context: typing.Optional[Context] = None,
        kind: SpanKind = SpanKind.INTERNAL,
        attributes: types.Attributes = None,
        links: typing.Sequence[Link] = (),
        start_time: typing.Optional[int] = None,
        record_exception: bool = True,
        set_status_on_exception: bool = True,
    ) -> typing.Iterator["Span"]:
        with self._get_current_tracer().start_as_current_span(
            name=name,
            context=context,
            kind=kind,
            attributes=attributes,
            links=links,
            start_time=start_time,
            record_exception=record_exception,
            set_status_on_exception=set_status_on_exception,
        ) as span:
            yield span

    @contextmanager  # type: ignore
    def use_span(
        self, span: "Span", end_on_exit: bool = False,
    ) -> typing.Iterator[None]:
        with self._get_current_tracer().use_span(
            span=span, end_on_exit=end_on_exit,
        ) as context:
            yield context


_TRACER_PROVIDER = None


@functools.lru_cache(maxsize=None)  # type: ignore
def _get_current_tracer(*args: typing.Any, **kwargs: typing.Any) -> "Tracer":
    tracer_provider = get_tracer_provider()
    return tracer_provider.get_tracer(*args, **kwargs)  # type: ignore


def get_tracer(
    instrumenting_module_name: str,
    instrumenting_library_version: str = "",
    tracer_provider: typing.Optional[TracerProvider] = None,
) -> "Tracer":
    """Returns a `Tracer` for use by the given instrumentation library.

    This function is a convenience wrapper for
    opentelemetry.trace.TracerProvider.get_tracer.

    If tracer_provider is ommited the current configured one is used.
    If tracer_provider is omitted it returns a _ProxyTracer
    which redirects calls to a current instrumentation library.
    """
    if tracer_provider is None:
I'm a bit confused by this. Looks like we are assuming the user will always pass a valid/configured tracer_provider. Why not keep the existing behavior and check whether the provider passed in to the function, or fetched by get_tracer_provider(), is an initialized one or a default/noop one, and then return a proxy or real tracer accordingly. Also, doesn't this only solve the issue when users call trace.get_tracer()? Would it make more sense to patch the API package's default tracer provider and have it always return a proxy tracer?

There was the same question above: #1445 (comment)
Then you'll need another "real default tracer" that the "default proxy tracer" will use when no real one is provided. Which problem will it solve?
        tracer_provider = get_tracer_provider()
@anton-ryzhov it looks like … Is that correct?

I added another comment above proposing a simpler/dumber solution but I'm not sure if I'm missing something.

I read the comment. What exactly breaks here? Doesn't the same thing break if we patch the convenience method instead of the real one? One might argue this doesn't break anything if the object returned satisfies the tracer interface and in some cases actually makes tracing viable. It would actually be a bug fix in that case. No one is relying on this edge case to disable tracing for the lifetime of their process :) I strongly think this should be solved at the innermost layer so it works in every possible scenario instead of fixing one wrapper.

If you know how to redo it better, maybe you can replace this PR with your own? It's hard for me to guess, and not reasonable to speculate and copy-paste your short samples in comments.

OK here is a working POC doing what I was suggesting. It solves the issues linked in this PR. The solution aligns with what was agreed upon in the SIG and the implementation is clear enough to understand/maintain. This also does not fix a single convenience method but instead fixes the issue no matter how a user tries to fetch a tracer. It does not always return a proxy, so no unnecessary attribute lookups for cases where this is not a problem. It returns the same tracer always, but can be updated to return the same tracer for a unique combination of instrumentation name + library version using a simple dict or a cache like you did. https://github.com/open-telemetry/opentelemetry-python/pull/1726/files
        return _ProxyTracer(
            functools.partial(
                _get_current_tracer,
                instrumenting_module_name=instrumenting_module_name,
                instrumenting_library_version=instrumenting_library_version,
Comment on lines +502 to +503

Our current implementation does not use the same tracer if you do pass the same arguments:

from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()
args = ("foo", "version1")
tracer_1 = tracer_provider.get_tracer(*args)
tracer_2 = tracer_provider.get_tracer(*args)
# Current impl, False
tracer_1 is tracer_2

# --------------

from opentelemetry import trace

tracer_3 = trace.get_tracer(*args)
tracer_4 = trace.get_tracer(*args)
trace.set_tracer_provider(TracerProvider())
# Proxy impl, True
tracer_3._get_current_tracer() is tracer_4._get_current_tracer()

@open-telemetry/python-approvers – thoughts? I don't think this changes the behavior of exporter metrics, but I'm not super familiar with tracing internals.

Based on the spec I think returning the same or different tracers is fine here; I prefer the new behaviour of returning the same tracer.
            )
        )

    return tracer_provider.get_tracer(
        instrumenting_module_name, instrumenting_library_version
    )
@@ -445,14 +522,14 @@ def set_tracer_provider(tracer_provider: TracerProvider) -> None:
        return
    _TRACER_PROVIDER = tracer_provider
    _get_current_tracer.cache_clear()  # pylint: disable=no-member


def get_tracer_provider() -> TracerProvider:
    """Gets the current global :class:`~.TracerProvider` object."""
    global _TRACER_PROVIDER  # pylint: disable=global-statement

    if _TRACER_PROVIDER is None:
        _TRACER_PROVIDER = _load_trace_provider("tracer_provider")
        return _load_trace_provider("tracer_provider")

    return _TRACER_PROVIDER
@@ -1,25 +1,17 @@
# type:ignore
import unittest
from logging import WARNING

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider  # type:ignore


class TestGlobals(unittest.TestCase):
    def test_tracer_provider_override_warning(self):
        """trace.set_tracer_provider should throw a warning when overridden"""
        trace.set_tracer_provider(TracerProvider())
        tracer_provider = trace.get_tracer_provider()
        with self.assertLogs(level=WARNING) as test:
            trace.set_tracer_provider(TracerProvider())
        self.assertEqual(
            test.output,
            [
                (
                    "WARNING:opentelemetry.trace:Overriding of current "
                    "TracerProvider is not allowed"
                )
            ],
        )

    def test_tracer_provider_override(self):
        """trace.set_tracer_provider should override global tracer"""
        tracer_provider = TracerProvider()
        trace.set_tracer_provider(tracer_provider)
        self.assertIs(tracer_provider, trace.get_tracer_provider())

        new_tracer_provider = TracerProvider()
        trace.set_tracer_provider(new_tracer_provider)
        self.assertIs(tracer_provider, trace.get_tracer_provider())
I think the user would get a stale DefaultTracer if the cache_clear() below happened while this body was executing. There were no concurrency guarantees around the globals here (because it was intended to be called once in initialization), so this might need some other additions.
And it will definitely get the old tracer if it is called before set_tracer_provider(). So yes, all traces produced before the complete setup of tracing will be lost. Exactly the same way as now. But the main idea of this PR is that all traces after configuration will be handled. Currently there is no way to do that. So I think it's an improvement.

I agree. That should be configured early, during initialization, before spawning threads etc. All I want here is to do that initialization after most imports and maybe after a few other calls.
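To make that concrete, here is a minimal usage sketch of the behavior this PR is aiming for. The module name "my_module" and the use of the SDK TracerProvider are illustrative assumptions, not part of the diff: a tracer obtained before configuration keeps working and picks up the real provider once it is set.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Import time: no provider has been configured yet, so this is a proxy.
tracer = trace.get_tracer("my_module")

# Spans started now still go to the default (noop) implementation and are lost.
with tracer.start_as_current_span("before-setup"):
    pass

# Later, during application initialization:
trace.set_tracer_provider(TracerProvider())

# The same tracer object now forwards to the real provider, so this span
# is recorded by the SDK and can be exported.
with tracer.start_as_current_span("after-setup"):
    pass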
I mean the ProxyTracer will be stuck proxying to a DefaultTracer because of the cache. I don't think cache_clear() is thread safe. In which case, all spans for that tracer will be lost even after configuration:

1. _get_current_tracer() is called for the first time (or when the cache is full). L483 executes right before a context switch.
2. set_tracer_provider() is called, which calls cache_clear() and empties the caches so proxies will use the real provider.
3. _get_current_tracer() from step 1 above finishes and overwrites the cache with a DefaultTracer for that proxy.
Hm, valid point.

In the Python implementation it uses a lock for the bounded cache, but not for the unbounded one. In the real world the Python version is overridden with a C implementation, which does not Py_BEGIN_ALLOW_THREADS and uses the GIL for synchronization.

So that may only happen if there is no C implementation, set_tracer_provider is used after threads have already been spawned, and the switch happens exactly there… I think it's rather safe to do it this way.
Nice find. We do support PyPy and I'd rather be safe than sorry regarding when threads are spawned. I believe wsgi/asgi runners could start multiple threads early on outside of the programmer's control?
Do we need to use a cache for this? The fact that we need to look at implementation details of different runtimes makes me not want to use it for this purpose. I feel like we are abusing this by using it in a place where we don't really need a cache.

What if the proxy called get_tracer_provider() on every call until it received a non-default one and then used it from there on? Something like the sketch below:
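(The snippet that originally followed "Something like:" is not preserved on this page; below is a rough reconstruction of the idea being described, assuming the API's noop classes are named DefaultTracerProvider and DefaultTracer. The names LazyProxyTracer and _real_tracer are illustrative, not the reviewer's actual code.)

from opentelemetry import trace

class LazyProxyTracer:
    """Sketch only: a real version would implement the full Tracer interface."""

    def __init__(self, name: str, version: str = "") -> None:
        self._name = name
        self._version = version
        self._real_tracer = None  # filled in once a real provider is set

    def _tracer(self):
        if self._real_tracer is None:
            provider = trace.get_tracer_provider()
            # Assumed noop provider class of the API package at the time.
            if isinstance(provider, trace.DefaultTracerProvider):
                # Still unconfigured: keep behaving like a noop tracer.
                return trace.DefaultTracer()
            self._real_tracer = provider.get_tracer(self._name, self._version)
        return self._real_tracer

    def start_span(self, name, **kwargs):
        return self._tracer().start_span(name, **kwargs)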
Assuming the default/noop tracer provider's get_tracer() method would always return a ProxyTracer instance. Unless I'm missing something obvious, these two changes should solve this problem no matter whether the convenience methods are used or the OTel API is used directly.
And what if a real one won't be provided at all, because tracing is disabled for this instance? Traceable instances may work faster than ones where tracing is disabled?

Didn't you just rewrite the cache mechanism yourself? Then why not use the standard one? My implementation reuses the same tracer instance for all proxies; yours creates a new one for each. I don't think it's critical to have exactly one instance, but why not, if it's so easy.
No. It's not an (LRU) cache, just a plain wrapper that is very simple and clear to read/understand. When I see something using an LRU cache, it usually means the code has been added as a performance optimization and we can safely assume that the cache can be invalidated at any time. Our problem is not that creating tracers is an expensive operation. Our problem is that we need to be able to create a tracer implementation that can lazily upgrade itself from a noop to a real implementation. Just because a solution for another problem "works" doesn't mean we should use it.

If tracing is disabled somehow then the tracer won't work as expected.
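For comparison with the lru_cache approach in this PR, here is a rough sketch of the alternative shape argued for above: a plain dict guarded by a lock that hands out one lazily-upgrading tracer per (instrumentation name, library version) pair. The names _TRACERS and _TRACERS_LOCK, and the reuse of the LazyProxyTracer sketch from the earlier comment, are assumptions for illustration; this is not the code from the linked POC.

import threading

_TRACERS = {}  # (instrumenting_module_name, library_version) -> tracer
_TRACERS_LOCK = threading.Lock()

def get_tracer(instrumenting_module_name: str, instrumenting_library_version: str = ""):
    """Return the same lazily-upgrading tracer for a given name/version pair."""
    key = (instrumenting_module_name, instrumenting_library_version)
    with _TRACERS_LOCK:
        tracer = _TRACERS.get(key)
        if tracer is None:
            # LazyProxyTracer is the sketch from the earlier comment: it keeps
            # checking the global provider until a real one has been set.
            tracer = LazyProxyTracer(
                instrumenting_module_name, instrumenting_library_version
            )
            _TRACERS[key] = tracer
        return tracer

Because the dict is only ever appended to under the lock, no cache invalidation (and hence no cache_clear() race) is needed; the upgrade from noop to real happens inside each proxy instead.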