Refactor BatchLogRecordProcessor and associated tests #4535
Conversation
The OTEL_BLRP_EXPORT_TIMEOUT should work with all exporters, not just OTLP. I think the intention of having a separate one for OTLP is to specifically target OTLP exporters in case there are multiple BLRP instances. It's definitely a bit clunky though.
This looks like a huge improvement to the complexity of the threading code 😃
I'd like to get some more eyes on this since concurrency bugs can be really subtle.
opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/export/__init__.py
I think the failing Windows run is pretty typical of what we see with |
LGTM, thanks for these important changes!
My only feedback here is that it might also be helpful to eventually factor some classes out. For example, self._queue and self._queue_lock are often used together and perhaps could be in their own class. Also, more generally, we're doing batching for spans and logs. Could we use one generic batcher that could handle both signals?
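A minimal sketch of the first suggestion, assuming nothing about the SDK's real internals: the class name `BoundedBatchQueue` and its methods are hypothetical, and the drop-oldest behavior comes from `collections.deque(maxlen=...)`, which may or may not match what the processor does when its queue is full.

```python
import collections
import threading
from typing import Deque, Generic, List, TypeVar

T = TypeVar("T")


class BoundedBatchQueue(Generic[T]):
    """Hypothetical helper bundling a bounded queue with the lock that guards it."""

    def __init__(self, max_size: int) -> None:
        # deque(maxlen=...) silently drops the oldest item when full.
        self._items: Deque[T] = collections.deque(maxlen=max_size)
        self._lock = threading.Lock()

    def put(self, item: T) -> None:
        with self._lock:
            self._items.append(item)

    def pop_batch(self, max_batch_size: int) -> List[T]:
        """Remove and return up to max_batch_size items, oldest first."""
        with self._lock:
            count = min(max_batch_size, len(self._items))
            return [self._items.popleft() for _ in range(count)]

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

Because the class is generic over the item type, the same helper could in principle hold either spans or log records, which is the direction the generic-batcher question points in.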
Added a buffer to that test that flaked, thanks for pointing that out. Hopefully it passes this time.
Sounds good! I will look into this. I was planning to fix the BatchSpanProcessor code, which works the exact same way, so some generic batch class makes a lot of sense. I think I'll do that in a separate PR though, since this one is already getting big.
Can someone add the Skip Changelog tag? I don't think this needs a changelog, since it's basically just a refactor and not changing behavior.
Alright I think this is good to merge, just need the |
Much cleaner than before, thanks!
Description
Refactor BatchLogRecordProcessor, keeping the existing behavior mostly the same. This PR cleans up the code, including the tests, and also adds some new tests.
One exception is forceFlush, which now calls export synchronously from the main thread and waits for it to finish.
Previously forceFlush would wait timeout_millis for the worker thread to make and finish an export call, and if an export call was in progress it would wait for the subsequent export call to finish. It would return true if this export call completed in time and false otherwise. It didn't cancel the request after the timeout, it just stopped waiting for it to finish.
I think ideally forceFlush.timeout_millis (and also shutdown.timeout_millis) should be used as the time after which the export call(s) get cancelled. But for that to work we need to be able to pass a timeout to export, like what was proposed in #4183. Until then I think we should ignore it and document that it doesn't work.
I'm not sure what forceFlush should return. Currently I have it return nothing (same as JavaScript). It could always return True, to signify that export was called until the queue was empty. Or it could return True if all export calls succeeded and False otherwise, and it could stop flushing after the first failed export, like Go does.
I think my proposed behavior is more in line with the spec too.
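To make the synchronous flush concrete, here is a rough sketch and not the code in this PR. `SketchBatchProcessor`, its `export` callable (assumed to return True on success), and the string records are all hypothetical stand-ins, and there is no worker thread. The return value follows the "stop after the first failed export" option mentioned above, whereas the PR as described currently returns nothing.

```python
import collections
import threading
from typing import Callable, Deque, List


class SketchBatchProcessor:
    """Illustrative only; not the SDK's BatchLogRecordProcessor."""

    def __init__(
        self,
        export: Callable[[List[str]], bool],
        max_batch_size: int = 2,
    ) -> None:
        self._export = export  # hypothetical: returns True on success
        self._max_batch_size = max_batch_size
        self._queue: Deque[str] = collections.deque()
        self._queue_lock = threading.Lock()
        # Serializes export calls so a flush never overlaps another export.
        self._export_lock = threading.Lock()

    def emit(self, record: str) -> None:
        with self._queue_lock:
            self._queue.append(record)

    def _pop_batch(self) -> List[str]:
        with self._queue_lock:
            count = min(self._max_batch_size, len(self._queue))
            return [self._queue.popleft() for _ in range(count)]

    def force_flush(self) -> bool:
        """Export from the calling thread until the queue is empty.

        Stops and returns False at the first failed export;
        returns True once the queue has been drained.
        """
        with self._export_lock:
            while True:
                batch = self._pop_batch()
                if not batch:
                    return True
                if not self._export(batch):
                    return False
```

Note that no timeout appears anywhere in the sketch: without a way to pass one down to export, the flush simply blocks until every export call returns.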
Note that the default for forceFlush.timeout_millis came from the OTEL_BLRP_EXPORT_TIMEOUT environment variable, which is supposed to configure "the maximum allowed time to export data from the BatchLogRecordProcessor". I propose we leave this env var unused for now, and document that it doesn't do anything. This flag seems redundant with the OTLP Exporter timeout env vars anyway. Maybe in other languages the BatchLogRecordProcessor isn't the default one used for auto instrumentation, so it makes more sense for it to be configurable?
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Added lots of unit tests.
Does This PR Require a Contrib Repo Change?
Checklist: