Skip to content

✨ Add exemplars to prometheus metrics #7644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

bisgaard-itis
Copy link
Contributor

@bisgaard-itis bisgaard-itis commented May 7, 2025

What do these changes do?

  • Add Prometheus exemplars to our Prometheus metrics. This will allow us (in a couple of mouse clicks) to navigate from the Grafana dashboard associated with a given simcore service to the trace of a relevant request. I.e. if we see a that response latency is very slow, we can navigate to the trace of one of the requests for which the server is so slow to respond. Since our logs are also instrumented we can use the trace to navigate to all the logs emitted during handling that request. One can then to the same for another similar request (same endpoint) at a point in time when response latency was not so slow and compare the traces/logs. This workflow is inspired by https://github.com/blueswen/fastapi-observability
  • Since https://github.com/trallnag/prometheus-fastapi-instrumentator doesn't support adding exemplars and we have already hit the limitations of that library I decided to consolidate the aiohttp and fastapi middleware we use to collect prometheus metrics. Hence, I will also update our Grafana dashboards accordingly.

Related issue/s

How to test

Dev-ops

@bisgaard-itis bisgaard-itis added this to the Bazinga! milestone May 7, 2025
@bisgaard-itis bisgaard-itis self-assigned this May 7, 2025
Copy link

codecov bot commented May 7, 2025

Codecov Report

Attention: Patch coverage is 89.38547% with 19 lines in your changes missing coverage. Please review.

Project coverage is 87.22%. Comparing base (97fafc4) to head (ef82189).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7644      +/-   ##
==========================================
- Coverage   87.51%   87.22%   -0.30%     
==========================================
  Files        1811     1427     -384     
  Lines       70308    59370   -10938     
  Branches     1143      604     -539     
==========================================
- Hits        61533    51787    -9746     
+ Misses       8462     7398    -1064     
+ Partials      313      185     -128     
Flag Coverage Δ
integrationtests 64.41% <100.00%> (+<0.01%) ⬆️
unittests 86.29% <89.38%> (-0.44%) ⬇️
Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library 72.36% <87.31%> (+0.72%) ⬆️
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 85.07% <ø> (ø)
agent 96.46% <100.00%> (ø)
api_server 91.62% <100.00%> (ø)
autoscaling 96.07% <100.00%> (-0.01%) ⬇️
catalog 92.64% <83.33%> (ø)
clusters_keeper 99.25% <100.00%> (ø)
dask_sidecar 89.86% <ø> (ø)
datcore_adapter 98.12% <100.00%> (ø)
director 76.78% <100.00%> (-0.03%) ⬇️
director_v2 91.12% <100.00%> (-0.05%) ⬇️
dynamic_scheduler 96.76% <100.00%> (ø)
dynamic_sidecar 90.18% <ø> (ø)
efs_guardian 89.79% <ø> (ø)
invitations 93.28% <100.00%> (ø)
payments 92.63% <100.00%> (ø)
resource_usage_tracker 89.13% <ø> (ø)
storage 87.49% <80.00%> (-0.32%) ⬇️
webclient ∅ <ø> (∅)
webserver 85.82% <ø> (+0.02%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97fafc4...ef82189. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bisgaard-itis bisgaard-itis marked this pull request as ready for review May 7, 2025 14:11
Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks please consider my comments

@bisgaard-itis bisgaard-itis requested a review from mrnicegyu11 May 7, 2025 15:10
@bisgaard-itis bisgaard-itis requested a review from GitHK May 7, 2025 15:23
Copy link
Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks

Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in thinking that you removed the fastapi-instrumentation library usage?
If yes should you not also remove its dependency?

Copy link
Contributor

@YuryHrytsuk YuryHrytsuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to consolidate the aiohttp and fastapi middleware we use to collect prometheus metrics.

What does it mean? This is not clear

@bisgaard-itis
Copy link
Contributor Author

I decided to consolidate the aiohttp and fastapi middleware we use to collect prometheus metrics.

What does it mean? This is not clear

Before we used our own custom implementation middleware for collecting prometheus metrics for aiohttp services and a pip installed tool for fastapi services. Now I have removed that tool and refactored the aiohttp middleware to have an aiohttp middleware and a separate fastapi middleware (custom implementation) which call the same module for collecting prometheus metrics

@bisgaard-itis
Copy link
Contributor Author

Also I do not see any test

tests have been added.

@bisgaard-itis bisgaard-itis requested a review from sanderegg May 9, 2025 11:33
Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I would like to ask you some specific details about it sometime offline

@bisgaard-itis bisgaard-itis enabled auto-merge (squash) May 19, 2025 07:15
@bisgaard-itis bisgaard-itis merged commit eee1de2 into ITISFoundation:master May 19, 2025
94 of 95 checks passed
@bisgaard-itis bisgaard-itis deleted the 7635-add-exemplars-to-prometheus-metrics branch May 19, 2025 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instrument prometheus metrics via exemplars
6 participants