Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial version of instrumentation for the Google GenAI SDK (github.com/googleapis/python-genai) #3256

Merged
merged 73 commits into from
Feb 27, 2025

Conversation

michaelsafyan
Copy link
Contributor

Description

Adds an opentelemetry-instrumentation-google-genai package to instrument github.com/googleapis/python-genai.

This initial version targets just Client.models.generate_content. There is partial implementation of instrumentation for the streaming and async variants of this function, but enabling it is deferred to a separate PR (we should hammer out the basics in this PR first before writing all the tests for the other cases and fully debugging those).

Does not include other methods (e.g. embedding generation, for example) in the scope. Also does not include in the scope vendor-specific properties that would be useful for this component at this time. (For example, it would be useful to record safety settings and safety ratings, but this is left out of scope for now). This is done to avoid entangling with resolving Semantic Conventions considerations as well as to keep the current work scope narrow.

We will no doubt want to build on this PR in subsequent PRs to round out the instrumentation package.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Created unit tests.

Does This PR Require a Core Repo Change?

I'm not sure what this means.

Checklist:

I could use some help with the checklist items, as I am new to this repo and the associated tooling.

@michaelsafyan
Copy link
Contributor Author

Filed corresponding feature request to be able to connect to that FR and also to lay out the broader plan beyond this initial PR:

#3260

Copy link
Member

@aabmass aabmass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a list of remaining todos in this PR. I'm OK to go ahead with this and loop back on stuff in there. I will take a stab at moving this to VCR.py in a future PR as well.

@aabmass
Copy link
Member

aabmass commented Feb 25, 2025

@michaelsafyan I started working on moving the tests to use the same VCR testing as other genai packages (WIP): michaelsafyan/open-telemetry.opentelemetry-python-contrib@genaisdk_instrumentation...aabmass:opentelemetry-python-contrib:genaisdk-vcrtests

As I was paying closer attention to the tests, I noticed a few issues with semantic conventions that should be fixed in this PR before merging (also called out in the linked test with # NOTE: comments).

  • When content capturing is not enabled, events should still be emitted. Only the "Opt-In" fields from the semconv should be omitted.
  • When logging the response
    • the event should be gen_ai.choice not gen_ai.assistant.message
    • the message content field should be the content from a single candidate instead of the whole response payload.

@michaelsafyan
Copy link
Contributor Author

  • When content capturing is not enabled, events should still be emitted. Only the "Opt-In" fields from the semconv should be omitted.

Won't this create log spam? Isn't the entire purpose of gen_ai.*.message to record the content? If you have the event without the content, is that actually a reasonable thing to have? What would a user or downstream processing system use such an event for?

@michaelsafyan
Copy link
Contributor Author

Received some clarity on the current data model from @aabmass offline, out-of-band. Will update the PR to better align with current SemConv.

@michaelsafyan
Copy link
Contributor Author

Updated the event logging per the feedback above.

@michaelsafyan
Copy link
Contributor Author

We should resolve and submit this first, but I have sent out a PR for the next step after this:

@github-actions github-actions bot requested review from lmolkova and nirga February 25, 2025 22:00
@aabmass aabmass merged commit c09a299 into open-telemetry:main Feb 27, 2025
705 checks passed
aabmass added a commit that referenced this pull request Mar 3, 2025
…trumentation (#3298)

* Begin instrumentation of GenAI SDK.

* Snapshot current state.

* Created minimal tests and got first test to pass.

* Added test for span attributes.

* Ensure that token counts work.

* Add more tests.

* Make it easy to turn off instrumentation for streaming and async to allow for rapid iteration.

* Add licenses and fill out main README.rst.

* Add a changelog file.

* Fill out 'requirements.txt' and 'README.rst' for the manual instrumentation example.

* Add missing exporter dependency for the manual instrumentation example.

* Fill out rest of the zero-code example.

* Add minimal tests for async, streaming cases.

* Update sync test to use indirection on top of 'client.models.generate_content' to simplify test reuse.

* Fix ruff check issues.

* Add subproject to top-level project build mechanism.

* Simplify invocation of pylint.

* Fix 'make test' command and lint issues.

* Add '.dev' suffix to version per feedback on pull request #3256

* Fix README.rst files for the examples.

* Add specific versions for the examples.

* Revamp 'make test' to not require local 'tox.ini' configuration.

* Extend separators per review comment.

Co-authored-by: Riccardo Magliocchetti <[email protected]>

* Fix version conflict caused by non-hermetic requirements.

* Fix typo on the comment line.

* Add test for the use of the 'vertex_ai' system, and improve how this system is determined.

* Factor out testing logic to enable sharing with the async code.

* Addressed minor lint issues.

* Make it clearer that nonstreaming_base is a helper module that is not invoked directly.

* Integrate feedback from related pull request #3268.

* Update workflows with 'tox -e generate-workflows'.

* Improve data model and add some rudimentary type checking.

* Accept only 'true' for a true value to align with other code.

* Update the scope name used.

* Add **kwargs to patched methods to prevent future breakage due to the addition of future keyword arguments.

* Remove redundant list conversion in call to "sorted".

Co-authored-by: Aaron Abbott <[email protected]>

* Reformat with 'tox -e ruff'.

* Fix failing lint workflow.

* Fix failing lint workflow.

* Exclude Google GenAI instrumentation from the bootstrap code for now.

* Minor improvements to the tooling shell files.

* Fix typo flagged by codespell spellchecker.

* Increase alignment with broader repo practices.

* Add more TODOs and documentation to clarify the intended work scope.

* Remove unneeded accessor from OTelWrapper.

* Add more comments to the tests.

* Reformat with ruff.

* Change 'desireable' to 'desirable' per codespell spellchecker.

* Make tests pass without pythonpath

* Fix new lint errors showing up after change

* Revert "Fix new lint errors showing up after change"

This reverts commit 567adc6.

pylint ignore instead

* Add TODO item required/requested from code review.

Co-authored-by: Aaron Abbott <[email protected]>

* Simplify changelog per PR feedback.

* Remove square brackets from model name in span name per PR feedback.

* Checkpoint current state.

* Misc test cleanup. Now that scripts are invoked solely through pytest via tox, remove main functions and hash bang lines.

* Improve quality of event logging.

* Implement streaming support in RequestsMocker, get tests passing again.

* Add test with multiple responses.

* Remove support for async and streaming from TODOs, since this is now addressed.

* Increase testing coverage for streaming.

* Reformat with ruff.

* Add minor version bump with changelog.

* Change TODOs to bulleted list.

* Update per PR feedback

Co-authored-by: Aaron Abbott <[email protected]>

* Restructure streaming async logic to begin execution earlier.

* Reformat with ruff.

* Disable pylint check for catching broad exception. Should be allowed given exception is re-raised.

* Simplify async streaming solution per PR comment.

---------

Co-authored-by: Riccardo Magliocchetti <[email protected]>
Co-authored-by: Aaron Abbott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants