Allow to override global tracer_provider after tracers creation #1445

Closed
wants to merge 11 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -46,6 +46,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Adding `opentelemetry-distro` package to add default configuration for
span exporter to OTLP
([#1482](https://github.com/open-telemetry/opentelemetry-python/pull/1482))
- Allow to set global `tracer_provider` after `tracer`s creation
([#1445](https://github.com/open-telemetry/opentelemetry-python/pull/1445))

### Changed
- `opentelemetry-exporter-zipkin` Updated zipkin exporter status code and error tag
91 changes: 84 additions & 7 deletions opentelemetry-api/src/opentelemetry/trace/__init__.py
@@ -76,6 +76,7 @@

import abc
import enum
import functools
import typing
from contextlib import contextmanager
from logging import getLogger
@@ -410,23 +411,99 @@ def use_span(
yield


class _ProxyTracer(Tracer):
    """Proxies all calls to current TracerProvider
    """

    def __init__(
        self, get_current_tracer: typing.Callable[[], "Tracer"],
    ) -> None:
        self._get_current_tracer = get_current_tracer

    def start_span(
        self,
        name: str,
        context: typing.Optional[Context] = None,
        kind: SpanKind = SpanKind.INTERNAL,
        attributes: types.Attributes = None,
        links: typing.Sequence[Link] = (),
        start_time: typing.Optional[int] = None,
        record_exception: bool = True,
        set_status_on_exception: bool = True,
    ) -> "Span":
        return self._get_current_tracer().start_span(
            name=name,
            context=context,
            kind=kind,
            attributes=attributes,
            links=links,
            start_time=start_time,
            record_exception=record_exception,
            set_status_on_exception=set_status_on_exception,
        )

    @contextmanager  # type: ignore
    def start_as_current_span(
        self,
        name: str,
        context: typing.Optional[Context] = None,
        kind: SpanKind = SpanKind.INTERNAL,
        attributes: types.Attributes = None,
        links: typing.Sequence[Link] = (),
        start_time: typing.Optional[int] = None,
        record_exception: bool = True,
        set_status_on_exception: bool = True,
    ) -> typing.Iterator["Span"]:
        with self._get_current_tracer().start_as_current_span(
            name=name,
            context=context,
            kind=kind,
            attributes=attributes,
            links=links,
            start_time=start_time,
            record_exception=record_exception,
            set_status_on_exception=set_status_on_exception,
        ) as span:
            yield span

    @contextmanager  # type: ignore
    def use_span(
        self, span: "Span", end_on_exit: bool = False,
    ) -> typing.Iterator[None]:
        with self._get_current_tracer().use_span(
            span=span, end_on_exit=end_on_exit,
        ) as context:
            yield context


_TRACER_PROVIDER = None


@functools.lru_cache(maxsize=None)  # type: ignore
def _get_current_tracer(*args: typing.Any, **kwargs: typing.Any) -> "Tracer":
    tracer_provider = get_tracer_provider()
    return tracer_provider.get_tracer(*args, **kwargs)  # type: ignore
Comment on lines +484 to +485
Member

I think the user would get a stale DefaultTracer if the cache_clear() below happened while this body was executing. There were no concurrency guarantees around the globals here (because it was intended to be called once in initialization), so this might need some other additions.

Contributor Author

I think the user would get a stale DefaultTracer if the cache_clear() below happened while this body was executing.

And it will definitely get the old tracer if it is called before set_tracer_provider(). So yes, all traces produced before tracing is fully set up will be lost, exactly the same way as now.

But the main idea of this PR is that all traces produced after configuration will be handled. Currently there is no way to do that, so I think it's an improvement.

because it was intended to be called once in initialization

I agree. That should be configured early, during initialization, before spawning threads, etc. All I want here is to be able to do that initialization after most imports and maybe after a few other calls.
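
For illustration, a minimal sketch (not part of this diff) of the usage pattern being described, assuming this branch plus the SDK's TracerProvider; the module name passed to get_tracer is arbitrary:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Import time: no provider has been configured yet, so this returns a proxy.
tracer = trace.get_tracer("my.module")

with tracer.start_as_current_span("before-setup"):
    pass  # handled by the no-op default tracer and lost, same as today

# Later, after most imports and other initialization:
trace.set_tracer_provider(TracerProvider())

with tracer.start_as_current_span("after-setup"):
    pass  # the same proxy now delegates to the real SDK tracer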

Member
@aabmass (Dec 9, 2020)

I mean the ProxyTracer will be stuck proxying to a DefaultTracer because of the cache. I don't think cache_clear() is thread-safe, in which case all spans for that tracer will be lost even after configuration.

  1. Thread 1's proxy tracer calls _get_current_tracer() for the first time (or when the cache is full); L483 executes right before a context switch.
  2. Thread 2 calls set_tracer_provider(), which calls cache_clear() and empties the cache, so proxies will use the real provider.
  3. _get_current_tracer() from step 1 finishes and overwrites the cache with a DefaultTracer for that proxy.
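
For context, a self-contained sketch (using stand-in types, not the classes from this module) of how the lazy lookup could be guarded by an explicit lock instead of relying on lru_cache/cache_clear internals:

import threading
from typing import Callable, Dict, Tuple

class LazyTracerRegistry:
    """Toy illustration: tracers are resolved lazily through a factory,
    and swapping the factory and clearing the cache happen under one lock."""

    def __init__(self, factory: Callable[[str, str], object]) -> None:
        self._lock = threading.Lock()
        self._factory = factory
        self._tracers: Dict[Tuple[str, str], object] = {}

    def get_tracer(self, name: str, version: str = "") -> object:
        key = (name, version)
        with self._lock:
            if key not in self._tracers:
                self._tracers[key] = self._factory(name, version)
            return self._tracers[key]

    def set_factory(self, factory: Callable[[str, str], object]) -> None:
        with self._lock:
            self._factory = factory
            self._tracers.clear()  # existing proxies re-resolve on next call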

Contributor Author

Hm, valid point.

In the pure-Python implementation it uses a lock for the bounded cache, but not for the unbounded one.

In the real world the Python version is overridden by the C implementation, which does not call Py_BEGIN_ALLOW_THREADS and relies on the GIL for synchronization.

So that may only happen if there is no C implementation, if set_tracer_provider is not called before spawning threads, and if the context switch happens exactly there… I think it's rather safe to do it this way.
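
For reference, a quick illustrative check (not from this PR) of whether the C implementation of the lru_cache wrapper is in use on a given interpreter:

import functools

try:
    # CPython replaces the pure-Python wrapper with this C implementation.
    from _functools import _lru_cache_wrapper
    cached = functools.lru_cache(maxsize=None)(len)
    print("C wrapper in use:", isinstance(cached, _lru_cache_wrapper))
except ImportError:
    print("pure-Python lru_cache wrapper in use")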

Member

Nice find. We do support PyPy and I'd rather be safe than sorry regarding when threads are spawned. I believe WSGI/ASGI runners could start multiple threads early on, outside of the programmer's control?

Contributor

Do we need to use a cache for this? The fact that we need to look at implementation details of different runtimes makes me not want to use it for this purpose. I feel like we are abusing this by using it in a place where we don't really need a cache.

What if the proxy called get_tracer_provider() on every call until it received a non-default one and then used it from there on?

Something like:

class ProxyTracer:

  def __init__(self, instrumentation_name, instrumentation_version):
    self._real_tracer = None
    self._noop_tracer = NoopTracer()
    self._inst_name = instrumentation_name
    self._inst_version = instrumentation_version

  @property
  def _tracer(self):
    if self._real_tracer:
      return self._real_tracer

    provider = get_tracer_provider()
    if not isinstance(provider, DefaultTracerProvider):
        self._real_tracer = provider.get_tracer(self._inst_name, self._inst_version)
        return self._real_tracer

    return self._noop_tracer

  def start_span(self, name):
    return self._tracer.start_span(name)

  # rest of tracer api methods that proxy calls to `self._tracer`

Assuming the default/noop tracer provider's get_tracer() method would always return a ProxyTracer instance. Unless I'm missing something obvious, these two changes should solve this problem regardless of whether the convenience methods are used or the OTel API is used directly.
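
If that were in place, the observable behaviour would presumably be something like the following (hypothetical sketch, assuming the default provider hands out such a ProxyTracer):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Returns a ProxyTracer under the proposed design (a no-op DefaultTracer today).
tracer = trace.get_tracer_provider().get_tracer("my.module")

trace.set_tracer_provider(TracerProvider())

# The same object would lazily upgrade itself and use the real SDK tracer.
with tracer.start_as_current_span("request"):
    pass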

Contributor Author

What if the proxy called get_tracer_provider() on every call until it received a non-default one and then used it from there on?

And what if a real one is never provided at all because tracing is disabled for this instance? Then instances with tracing enabled may end up working faster than ones where tracing is disabled?

Something like:

Didn't you just rewrite the cache mechanism yourself? Then why not use the standard one?

My implementation reuses the same tracer instance for all proxies; yours creates a new one for each.

I don't think it's critical to have exactly one instance, but why not, if it's so easy.

Contributor

Didn't you just rewrite the cache mechanism yourself? Then why not use the standard one?

No. It's not a (LRU) cache, just a plain wrapper that is very simple and clear to read/understand. When I see something using an LRU cache, it usually means the code has been added as a performance optimization, and we can safely assume that the cache can be invalidated at any time. Our problem is not that creating tracers is an expensive operation. Our problem is that we need to be able to create a tracer implementation that can lazily upgrade itself from a noop to a real implementation. Just because a solution for another problem "works" doesn't mean we should use it.

What if the proxy called get_tracer_provider() on every call until it received a non-default one and then used it from there on?

And what if a real one is never provided at all because tracing is disabled for this instance? Then instances with tracing enabled may end up working faster than ones where tracing is disabled?

If tracing is disabled somehow then the tracer won't work as expected.



def get_tracer(
    instrumenting_module_name: str,
    instrumenting_library_version: str = "",
    tracer_provider: typing.Optional[TracerProvider] = None,
) -> "Tracer":
    """Returns a `Tracer` for use by the given instrumentation library.

    This function is a convenience wrapper for
    opentelemetry.trace.TracerProvider.get_tracer.

    If tracer_provider is omitted the current configured one is used.
    If tracer_provider is omitted a _ProxyTracer is returned,
    which redirects calls to a tracer from the current global tracer provider.
    """
    if tracer_provider is None:
Contributor

I'm a bit confused by this. Looks like we are assuming user will always pass a valid/configured tracer_provider. Why not keep existing behavior and check if the provider passed in to the function or fetched by get_tracer_provider() is an initialized one or a default/noop one, and then return a proxy or real tracer accordingly.

Also, doesn't this only solve the issue when users call trace.get_tracer()? What if someone calls get_tracer_provider().get_tracer() instead? Wouldn't we end up with the same problem? Unless I'm missing something, looks like we are only solving the problem partially.

Would it make more sense to patch API package's default tracer provider and have it always return a proxy tracer?

Contributor Author

I'm a bit confused by this. Looks like we are assuming user will always pass a valid/configured tracer_provider. Why not keep existing behavior and check if the provider passed in to the function or fetched by get_tracer_provider() is an initialized one or a default/noop one, and then return a proxy or real tracer accordingly.

get_tracer_provider returns either the default one, one configured via the environment, or one set by set_tracer_provider. How can you distinguish them?

Also, doesn't this only solve the issue when users call trace.get_tracer()? What if someone calls get_tracer_provider().get_tracer() instead? Wouldn't we end up with the same problem? Unless I'm missing something, looks like we are only solving the problem partially.

There was the same question above: #1445 (comment)

Would it make more sense to patch API package's default tracer provider and have it always return a proxy tracer?

Then you'll need another "real default tracer" that the "default proxy tracer" will use when no real one is provided. Which problem would that solve?

        tracer_provider = get_tracer_provider()
Member

@anton-ryzhov it looks like get_tracer() is no longer a simple shortcut for get_tracer_provider().get_tracer():

  • If the user has already set the global provider, they behave the same
  • If the user has set OTEL_PYTHON_TRACER_PROVIDER, they behave the same
  • If the user has NOT set either of the above, then get_tracer() will give proxies and get_tracer_provider().get_tracer() will give no-ops.

Is that correct?

Contributor Author

get_tracer always returns a proxy. It may be a no-op or not depending on whether a tracer provider has been set.
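
To make the divergence concrete, a small sketch assuming this branch and no provider configured in code or via OTEL_PYTHON_TRACER_PROVIDER (the module name is arbitrary):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

proxy = trace.get_tracer("my.module")                       # _ProxyTracer
noop = trace.get_tracer_provider().get_tracer("my.module")  # DefaultTracer (no-op)

trace.set_tracer_provider(TracerProvider())

# `proxy` now resolves to the SDK tracer on use; `noop` stays a no-op forever.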

Contributor

  1. I think trace.get_tracer() and get_tracer_provider().get_tracer() should behave the same. IMO this implementation should not touch trace.get_tracer() and should instead update the default tracer provider's get_tracer() method to make this happen.
  2. Ideally, it would be nice if a proxy were only returned when necessary, but this is not a huge deal.

I added another comment above proposing a simpler/dumber solution but I'm not sure if I'm missing something.

Contributor Author

Contributor

I read the comment. What exactly breaks here? Doesn't the same thing break if we patch the convenience method instead of the real one?

One might argue this doesn't break anything if the object returned satisfies the tracer interface, and in some cases it actually makes tracing viable. It would actually be a bug fix in that case. No one is relying on this edge case to disable tracing for the lifetime of their process :)

I strongly think this should be solved at the innermost layer so it works in every possible scenario, instead of fixing one wrapper.

Contributor Author

When should get_tracer_provider return a proxy, when _TRACER_PROVIDER, and when the result of _load_trace_provider? How should it distinguish?

If you know how to redo it better, maybe you can replace this PR with your own? It's hard for me to guess, and it's not reasonable to speculate and copy-paste your short samples from the comments.

Contributor

OK, here is a working POC doing what I was suggesting. It solves the issues linked in this PR. The solution aligns with what was agreed upon in the SIG, and the implementation is clear enough to understand/maintain. It also does not fix a single convenience method but instead fixes the issue no matter how a user tries to fetch a tracer. It does not always return a proxy, so there are no unnecessary attribute lookups in cases where this is not a problem. It always returns the same tracer, but could be updated to return the same tracer for each unique combination of instrumentation name + library version using a simple dict or a cache like you did.

https://github.com/open-telemetry/opentelemetry-python/pull/1726/files

        return _ProxyTracer(
            functools.partial(
                _get_current_tracer,
                instrumenting_module_name=instrumenting_module_name,
                instrumenting_library_version=instrumenting_library_version,
Comment on lines +502 to +503
Member

Our current implementation does not return the same tracer even if you pass the same instrumenting_module_name and instrumenting_library_version. I don't know if this is a gap in our implementation of the spec or intended, but the proxy behavior is to use the same one underneath:

from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()
args = ("foo", "version1")
tracer_1 = tracer_provider.get_tracer(*args)
tracer_2 = tracer_provider.get_tracer(*args)

# Current impl, False
tracer_1 is tracer_2

#  --------------
from opentelemetry import trace

tracer_3 = trace.get_tracer(*args)
tracer_4 = trace.get_tracer(*args)

trace.set_tracer_provider(TracerProvider())

# Proxy impl, True
tracer_3._get_current_tracer() is tracer_4._get_current_tracer()

@open-telemetry/python-approvers – thoughts? I don't think this changes the behavior of exporter metrics, but I'm not super familiar with tracing internals.

Contributor

Based on the spec I think returning the same or different tracers is fine here; I prefer the new behaviour of returning the same tracer.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#get-a-tracer

            )
        )

    return tracer_provider.get_tracer(
        instrumenting_module_name, instrumenting_library_version
    )
@@ -445,14 +522,14 @@ def set_tracer_provider(tracer_provider: TracerProvider) -> None:
        return

    _TRACER_PROVIDER = tracer_provider
    _get_current_tracer.cache_clear()  # pylint: disable=no-member


def get_tracer_provider() -> TracerProvider:
    """Gets the current global :class:`~.TracerProvider` object."""
    global _TRACER_PROVIDER  # pylint: disable=global-statement

    if _TRACER_PROVIDER is None:
        _TRACER_PROVIDER = _load_trace_provider("tracer_provider")
        return _load_trace_provider("tracer_provider")

    return _TRACER_PROVIDER

189 changes: 180 additions & 9 deletions opentelemetry-api/tests/trace/test_globals.py
@@ -1,25 +1,93 @@
import unittest
from unittest.mock import patch
from unittest.mock import MagicMock, patch

from opentelemetry import context, trace


class TestGlobals(unittest.TestCase):
def setUp(self):
self._patcher = patch("opentelemetry.trace._TRACER_PROVIDER")
self._mock_tracer_provider = self._patcher.start()

def tearDown(self) -> None:
self._patcher.stop()
trace._TRACER_PROVIDER = None # pylint: disable=protected-access

def test_get_tracer(self):
"""trace.get_tracer should proxy to the global tracer provider."""
trace.get_tracer("foo", "var")
self._mock_tracer_provider.get_tracer.assert_called_with("foo", "var")
"""trace.get_tracer should create a proxy to the global tracer provider."""
tracer_provider = (
trace._TRACER_PROVIDER
) = MagicMock() # pylint: disable=protected-access

tracer = trace.get_tracer("foo", "var")

tracer_provider.get_tracer.assert_not_called()
self.assertIsInstance(
tracer, trace._ProxyTracer # pylint: disable=protected-access
)

tracer.start_span("one")
tracer.start_span("two")
tracer_provider.get_tracer.assert_called_once_with(
instrumenting_module_name="foo",
instrumenting_library_version="var",
)

tracer_provider.reset_mock()
mock_provider = unittest.mock.Mock()
trace.get_tracer("foo", "var", mock_provider)
tracer_provider.get_tracer.assert_not_called()
mock_provider.get_tracer.assert_called_with("foo", "var")

def test_set_tracer_provider(self):
"""trace.get_tracer should update global tracer provider."""
self.assertIsNone(
trace._TRACER_PROVIDER, # pylint: disable=protected-access
)

tracer_provider1 = trace.DefaultTracerProvider()
trace.set_tracer_provider(tracer_provider1)
self.assertIs(
trace._TRACER_PROVIDER, # pylint: disable=protected-access
tracer_provider1,
)

tracer_provider2 = trace.DefaultTracerProvider()
trace.set_tracer_provider(tracer_provider2)
self.assertIs(
trace._TRACER_PROVIDER, # pylint: disable=protected-access
tracer_provider1,
)

@patch("opentelemetry.trace._load_trace_provider")
def test_get_tracer_provider(self, load_trace_provider_mock: "MagicMock"): # type: ignore
"""trace.get_tracer_provider should get or create a global tracer provider."""

# Get before set
load_trace_provider_mock.return_value = trace.DefaultTracerProvider()

tracer_provider = trace.get_tracer_provider()
self.assertIsNone(
trace._TRACER_PROVIDER # pylint: disable=protected-access
)
self.assertIs(
tracer_provider, load_trace_provider_mock.return_value,
)
load_trace_provider_mock.assert_called_once_with("tracer_provider")
load_trace_provider_mock.reset_mock()

# Set
tracer_provider1 = trace.DefaultTracerProvider()
trace.set_tracer_provider(tracer_provider1)
self.assertIs(
trace._TRACER_PROVIDER, # pylint: disable=protected-access
tracer_provider1,
)

# And get
tracer_provider2 = trace.get_tracer_provider()
self.assertIs(
trace._TRACER_PROVIDER, # pylint: disable=protected-access
tracer_provider2,
)
self.assertIs(tracer_provider1, tracer_provider2)
load_trace_provider_mock.assert_not_called()


class TestTracer(unittest.TestCase):
def setUp(self):
@@ -38,3 +106,106 @@ def test_get_current_span(self):
finally:
context.detach(token)
self.assertEqual(trace.get_current_span(), trace.INVALID_SPAN)


class TestProxyTracer(unittest.TestCase):
def setUp(self):
self.proxy_tracer = trace.get_tracer("foo", "var")
self.inner_tracer = MagicMock(wraps=trace.DefaultTracer())

tracer_provider = MagicMock(wraps=trace.DefaultTracerProvider())
tracer_provider.get_tracer.return_value = self.inner_tracer
trace.set_tracer_provider(tracer_provider)

def tearDown(self) -> None:
trace._TRACER_PROVIDER = None # pylint: disable=protected-access

def test_start_span(self):
"""ProxyTracer should call `start_span` on a real `Tracer`
"""
self.inner_tracer.start_span.assert_not_called()

span = self.proxy_tracer.start_span("span1")
self.assertIs(span, trace.INVALID_SPAN)
self.inner_tracer.start_span.assert_called_once_with(
name="span1",
context=None,
kind=trace.SpanKind.INTERNAL,
attributes=None,
links=(),
start_time=None,
record_exception=True,
set_status_on_exception=True,
)

def test_start_as_current_span(self):
"""ProxyTracer should call `start_as_current_span` on a real `Tracer`
"""
self.inner_tracer.start_as_current_span.assert_not_called()

with self.proxy_tracer.start_as_current_span("span1") as span:
self.assertIs(span, trace.INVALID_SPAN)
self.inner_tracer.start_as_current_span.assert_called_once_with(
name="span1",
context=None,
kind=trace.SpanKind.INTERNAL,
attributes=None,
links=(),
start_time=None,
record_exception=True,
set_status_on_exception=True,
)

self.inner_tracer.start_as_current_span.reset_mock()

@self.proxy_tracer.start_as_current_span("span1")
def func(arg: str, kwarg: str) -> str:
self.assertEqual(arg, "argval")
self.assertEqual(kwarg, "kwargval")
return "retval"

self.inner_tracer.start_as_current_span.assert_not_called()

result = func("argval", kwarg="kwargval")
self.assertEqual(result, "retval")

self.inner_tracer.start_as_current_span.assert_called_once_with(
name="span1",
context=None,
kind=trace.SpanKind.INTERNAL,
attributes=None,
links=(),
start_time=None,
record_exception=True,
set_status_on_exception=True,
)

def test_use_span(self):
"""ProxyTracer should call `use_span` on a real `Tracer`
"""
self.inner_tracer.use_span.assert_not_called()

span = self.proxy_tracer.start_span("span1")

with self.proxy_tracer.use_span(span) as current_context:
self.assertIsNone(current_context)
self.inner_tracer.use_span.assert_called_once_with(
span=span, end_on_exit=False,
)

self.inner_tracer.use_span.reset_mock()

@self.proxy_tracer.use_span(span)
def func(arg: str, kwarg: str) -> str:
self.assertEqual(arg, "argval")
self.assertEqual(kwarg, "kwargval")
return "retval"

self.inner_tracer.use_span.assert_not_called()

result = func("argval", kwarg="kwargval")
self.assertEqual(result, "retval")

self.inner_tracer.use_span.assert_called_once_with(
span=span, end_on_exit=False,
)
24 changes: 8 additions & 16 deletions opentelemetry-sdk/tests/trace/test_globals.py
@@ -1,25 +1,17 @@
# type:ignore
import unittest
from logging import WARNING

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider # type:ignore


class TestGlobals(unittest.TestCase):
def test_tracer_provider_override_warning(self):
"""trace.set_tracer_provider should throw a warning when overridden"""
trace.set_tracer_provider(TracerProvider())
tracer_provider = trace.get_tracer_provider()
with self.assertLogs(level=WARNING) as test:
trace.set_tracer_provider(TracerProvider())
self.assertEqual(
test.output,
[
(
"WARNING:opentelemetry.trace:Overriding of current "
"TracerProvider is not allowed"
)
],
)
def test_tracer_provider_override(self):
"""trace.set_tracer_provider should override global tracer"""
tracer_provider = TracerProvider()
trace.set_tracer_provider(tracer_provider)
self.assertIs(tracer_provider, trace.get_tracer_provider())

new_tracer_provider = TracerProvider()
trace.set_tracer_provider(new_tracer_provider)
self.assertIs(tracer_provider, trace.get_tracer_provider())