
[code-browser] observability of extension installations and search #11608


Closed
akosyakov opened this issue Jul 25, 2022 · 4 comments · Fixed by #12539
Assignees
Labels
editor: code (browser) operations: observability This issue relates to the observability of Gitpod (metrics, logs, traces) team: IDE

Comments

@akosyakov
Member

The OpenVSX proxy gives us some isolation against OpenVSX incidents. Unfortunately, we don't really know to what extent. We need analytics on errors and latencies of extension installations and searches from the perspective of a user. VS Code already provides such telemetry; we need to use the Prometheus push gateway endpoint of the supervisor to observe it. We could start by counting errors on these operations, i.e. add a gitpod_code_extension_action_count metric with the following labels:

  • action: install | search
  • outcome: 'user-failure' | 'gitpod-failure' | 'success'
    • user-failure is invalid args, missing extensions and so on
    • gitpod-failure if OpenVSX or proxy is not responsive or bugs on our side in VS Code
  • error: string - a coarse-grained, predefined error code. High cardinality is expensive, so we should be careful here: real errors should be logged, and it is better if we can group errors into a few classes
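
A minimal sketch of how the proposed labels could fit together, for illustration only. The in-memory counter stands in for whatever client actually reports to the supervisor endpoint, and the error codes and the `classify` mapping are assumptions, not an agreed list:

```typescript
// Label values taken from the proposal above.
type Action = "install" | "search";
type Outcome = "success" | "user-failure" | "gitpod-failure";
// Assumed, illustrative error codes; kept to a small fixed set to bound cardinality.
type ErrorCode = "none" | "invalid-args" | "not-found" | "timeout" | "unavailable" | "unknown";

interface Labels {
  action: Action;
  outcome: Outcome;
  error: ErrorCode;
}

// Hypothetical in-memory counter keyed by label combination.
class ExtensionActionCounter {
  readonly name = "gitpod_code_extension_action_count";
  private readonly counts = new Map<string, number>();

  private key(l: Labels): string {
    return `${l.action}|${l.outcome}|${l.error}`;
  }

  inc(labels: Labels): void {
    const k = this.key(labels);
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }

  get(labels: Labels): number {
    return this.counts.get(this.key(labels)) ?? 0;
  }
}

// Map a raw failure to an outcome and a coarse error code.
// The raw error itself would be logged, never used as a label value.
function classify(err?: { status?: number; code?: string }): Pick<Labels, "outcome" | "error"> {
  if (!err) return { outcome: "success", error: "none" };
  if (err.status === 400) return { outcome: "user-failure", error: "invalid-args" };
  if (err.status === 404) return { outcome: "user-failure", error: "not-found" };
  if (err.code === "ETIMEDOUT") return { outcome: "gitpod-failure", error: "timeout" };
  if (err.status !== undefined && err.status >= 500) return { outcome: "gitpod-failure", error: "unavailable" };
  return { outcome: "gitpod-failure", error: "unknown" };
}

// Usage: count a failed install of an extension that does not exist.
const counter = new ExtensionActionCounter();
counter.inc({ action: "install", ...classify({ status: 404 }) });
```

With three outcomes, two actions and a handful of error codes, the number of time series stays small and fixed regardless of how many distinct raw errors occur.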

Later we should move reporting to the IDE proxy and push all errors there as well for analytics in GCP error reporting, but that is blocked on #11134 right now.

@akosyakov akosyakov added editor: code (browser) operations: observability This issue relates to the observability of Gitpod (metrics, logs, traces) labels Jul 25, 2022
@akosyakov akosyakov moved this to In Progress in 🚀 IDE Team Jul 25, 2022
@jeanp413
Member

jeanp413 commented Jul 28, 2022

This looks redundant to add in vscode to me 🤔

user-failure is invalid args, missing extensions and so on

We can track this in openvsx proxy if resultCount is 0, but this looks more analytics related than observability related

gitpod-failure if OpenVSX or proxy is not responsive

If openvsx is down, we already have these metrics and the alert. Why do we need another metric?

bugs on our side in VS Code

Can you give some examples? I don't see a reason why we would need to modify the extension query code in vscode. If there's some upstream change, it should be caught in our insiders builds and during the smoke test; deploying vscode with a breaking change does not make sense.

@akosyakov akosyakov assigned akosyakov and unassigned jeanp413 Aug 1, 2022
@akosyakov
Member Author

akosyakov commented Aug 1, 2022

The point here is that we don't actually know that the OpenVSX proxy is helping. The new alert only indicates that there are issues with OpenVSX. We need an alert that tells us whether there are issues on our side or whether the OpenVSX proxy is coping. I reassigned this to myself, since I'm on-call this week.

@jeanp413
Member

jeanp413 commented Aug 4, 2022

The point here is that we don't actually know that the OpenVSX proxy is helping.

Isn't that what served responses from backup cache graph tells us when openvsx is down? If it wasn't helping, it wouldn't be returning any responses from the backup, as all queries would be cache misses. Or do you mean you don't trust the responses from the cache 🤔?

@akosyakov akosyakov removed their assignment Aug 5, 2022
@akosyakov akosyakov removed this from 🚀 IDE Team Aug 5, 2022
@akosyakov
Member Author

Isn't that what served responses from backup cache graph tells us when openvsx is down?

You may get a response to one request while another fails, so the installation operation as a whole fails. We would like to understand the reliability of user operations. During the last incident it was showing 15%, but I could still search and install; failures were very rare. For a long time I could not understand why, until we figured out that the requests were not coming from VS Code at all. A graph showing that users can search and install extensions 99% of the time will clearly communicate the impact.
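
To illustrate the difference between request-level and operation-level reliability, here is a rough sketch. The function names are hypothetical, and excluding user-failure from the denominator is an assumption made for the example, not something decided in this thread:

```typescript
type Outcome = "success" | "user-failure" | "gitpod-failure";

// One user operation (e.g. a single install) may issue several requests;
// it only counts as successful if every one of them succeeded.
function operationSucceeded(requestOk: boolean[]): boolean {
  return requestOk.every((ok) => ok);
}

// Share of user operations that succeeded.
// Assumption: failures caused by the user are left out of the denominator.
function reliability(counts: Record<Outcome, number>): number {
  const relevant = counts["success"] + counts["gitpod-failure"];
  return relevant === 0 ? 1 : counts["success"] / relevant;
}

// Usage: two of three requests succeeded, yet the install as a whole failed.
const installOk = operationSucceeded([true, false, true]);
const ratio = reliability({ "success": 99, "user-failure": 5, "gitpod-failure": 1 });
```

A proxy-side graph counts individual requests, while a metric reported from VS Code counts whole operations, which is the number that describes user impact.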

@akosyakov akosyakov self-assigned this Aug 31, 2022
@akosyakov akosyakov moved this to In Progress in 🚀 IDE Team Aug 31, 2022
Repository owner moved this from In Progress to Done in 🚀 IDE Team Sep 5, 2022

2 participants