OCPBUGS-12903: Add new web console usage metrics #1910

christoph-jerolimov
Member

@christoph-jerolimov christoph-jerolimov commented Mar 2, 2023

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Fixes: OCPBUGS-12903

This PR adds the new console metrics collected in ODC-7232, so that they can later be made available in Superset DataHat or Tableau.

See also PR openshift/console#12527 for detailed information about the collected metrics.

Update: PR openshift/console#12684 reduced the metrics' cardinality based on the feedback in this PR.

A quick overview:

  1. GaugeVec cluster_version_capability with label name is already part of telemetry and could be used in the internal Superset or Tableau, or ❓
  2. Sum Counter console_auth_login_requests_total
  3. Sum CounterVec console_auth_login_successes_total by label role
  4. Sum CounterVec console_auth_login_failures_total by label reason
  5. Sum CounterVec console_auth_logout_requests_total by label reason
  6. Sum CounterVec console_usage_total by labels event and perspective
    ⚠️ Update: Removed this metric completely from this PR.
  7. Max GaugeVec console_usage_users by label role
    Update: with PR console#12684 (OCPBUGS-10956: Reduce metrics cardinality by grouping well-known and other perspectives and plugins), only roles that are actually used are reported.
  8. Max GaugeVec console_plugins_info by labels name and state
    ⚠️ Update: with PR console#12684 (OCPBUGS-10956: Reduce metrics cardinality by grouping well-known and other perspectives and plugins), plugins are grouped by name to reduce cardinality.
    The name can currently only be "redhat", "demo", or "other".
  9. Max GaugeVec console_customization_perspectives_info by labels name and state
    ⚠️ Update: with PR console#12684 (OCPBUGS-10956: Reduce metrics cardinality by grouping well-known and other perspectives and plugins), perspectives are grouped as well.
    The name can only be "admin", "dev", "acm", or "other".
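One likely reason some rows say Sum and others Max: every console pod exposes its own /metrics endpoint. Counters such as console_auth_login_requests_total count events that each hit a single pod, so the cluster total is the sum across pods. console_plugins_info, on the other hand, is reported "per console pod instance", so every pod reports the same value and a sum would multiply it by the replica count. A minimal Python sketch, assuming two replicas with made-up values (the `aggregate` helper is hypothetical):

```python
from collections import defaultdict

# Hypothetical scrapes from two console replicas. Each series is keyed by
# (metric name, frozenset of label pairs); the numbers are illustrative.
PLUGINS_KEY = ("console_plugins_info", frozenset({("name", "redhat"), ("state", "enabled")}))
LOGINS_KEY = ("console_auth_login_requests_total", frozenset())

pod_a = {LOGINS_KEY: 15, PLUGINS_KEY: 4}
pod_b = {LOGINS_KEY: 12, PLUGINS_KEY: 4}


def aggregate(scrapes, reducer):
    """Combine identical series from several pods with a reducer such as sum or max."""
    grouped = defaultdict(list)
    for scrape in scrapes:
        for series, value in scrape.items():
            grouped[series].append(value)
    return {series: reducer(values) for series, values in grouped.items()}


totals = aggregate([pod_a, pod_b], sum)
maxima = aggregate([pod_a, pod_b], max)
print(totals[LOGINS_KEY])   # 27: each login request was served by exactly one pod
print(maxima[PLUGINS_KEY])  # 4: every pod reports the same set of plugins
print(totals[PLUGINS_KEY])  # 8: summing would double-count the plugins
```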

An example of what the console /metrics endpoint provides:

# 1
# HELP console_auth_login_requests_total Total number of login requests from the frontend.
# TYPE console_auth_login_requests_total counter
console_auth_login_requests_total 27

# 2
# HELP console_auth_login_successes_total Total number of successful logins. Role label is based on RBAC can list namespaces check.
# TYPE console_auth_login_successes_total counter
console_auth_login_successes_total{role="cluster-admin"} 3
console_auth_login_successes_total{role="developer"} 23
console_auth_login_successes_total{role="kubeadmin"} 1

# 3
# HELP console_auth_login_failures_total Total number of login failures.
# TYPE console_auth_login_failures_total counter
console_auth_login_failures_total{reason="unknown"} 0

# 4
# HELP console_auth_logout_requests_total Total number of logout requests from the frontend.
# TYPE console_auth_logout_requests_total counter
console_auth_logout_requests_total{reason="unknown"} 0

# 5 --- NOT PART OF THIS PR ANYMORE --- kept here to give the code review comments context
#
# HELP console_usage_total Total number of events like "page_views" (loading index.html without history.push) and "page_impressions".
# TYPE console_usage_total counter
# console_usage_total{event="page_impression",perspective="admin"} 13
# console_usage_total{event="page_impression",perspective="dev"} 10
# console_usage_total{event="page_view",perspective="admin"} 212
# console_usage_total{event="page_view",perspective="dev"} 432

# 6
# HELP console_usage_users The number of console users splitten into roles (cluster-admin, developer, and unknown if a RBAC check fails)
# TYPE console_usage_users gauge
console_usage_users{role="cluster-admin"} 3
console_usage_users{role="developer"} 32
console_usage_users{role="kubeadmin"} 1

# 7
# HELP console_plugins_info List all plugins with their name and state as label. State is currently always enabled. Reports 1 for each plugin (per console pod instance).
# TYPE console_plugins_info gauge
#
# OLD:
# console_plugins_info{name="acm",state="enabled"} 1
# console_plugins_info{name="crane-ui-plugin",state="enabled"} 1
# console_plugins_info{name="logging-view-plugin",state="enabled"} 1
# console_plugins_info{name="mce",state="enabled"} 1
# console_plugins_info{name="my-plugin",state="enabled"} 1
#
# NEW:
console_plugins_info{name="redhat",state="notfound"} 4
console_plugins_info{name="demo",state="notfound"} 1

# 8
# HELP console_customization_perspectives_info List of customized perspectives, for example perspective=dev with state=disabled and 1 as metric.
# TYPE console_customization_perspectives_info gauge
#
# OLD:
# console_customization_perspectives_info{name="dev",state="only-for-developers"} 1
# console_customization_perspectives_info{name="dev1",state="only-for-developers"} 1
# console_customization_perspectives_info{name="dev2",state="only-for-developers"} 1
# console_customization_perspectives_info{name="dev3",state="only-for-developers"} 1
#
# NEW:
console_customization_perspectives_info{name="dev",state="only-for-developers"} 1
console_customization_perspectives_info{name="other",state="only-for-developers"} 3
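To see how a recording rule such as `sum by (state)` collapses these series, here is a minimal Python sketch that parses simple sample lines like the ones above and sums one metric by one label. It assumes lines look exactly like this example; it is not the Prometheus parser, and `parse_samples` and `sum_by` are hypothetical helpers.

```python
import re

# Sketch only: handles the simple lines shown above, not escaped quotes,
# timestamps, or exemplars from the full exposition format.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (metric name, labels dict, value) for every non-comment line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable sample line: {line!r}")
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def sum_by(text, metric, label):
    """Rough equivalent of the PromQL expression: sum by (label) (metric)."""
    totals = {}
    for name, labels, value in parse_samples(text):
        if name == metric:
            key = labels.get(label, "")
            totals[key] = totals.get(key, 0.0) + value
    return totals


EXAMPLE = """
# TYPE console_customization_perspectives_info gauge
console_customization_perspectives_info{name="dev",state="only-for-developers"} 1
console_customization_perspectives_info{name="other",state="only-for-developers"} 3
"""

print(sum_by(EXAMPLE, "console_customization_perspectives_info", "state"))
# {'only-for-developers': 4.0}
```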

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 2, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 2, 2023

@jerolimov: This pull request references ODC-7258 which is a valid jira issue.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

This PR (ODC-7258) adds new console metrics that are collected in ODC-7232.

See PR openshift/console#12527 for detailed information about the collected metrics.

A quick overview:

  1. GaugeVec (??) cluster_version_capability with label name
  2. Counter console_auth_login_requests_total
  3. CounterVec console_auth_login_successes_total with label role
  4. CounterVec console_auth_login_failures_total with label reason
  5. CounterVec console_auth_logout_requests_total with label reason
  6. CounterVec console_usage_total with labels event and perspective
  7. GaugeVec console_usage_users with label role
  8. GaugeVec console_plugins_info with labels name and state
  9. GaugeVec console_customization_perspectives_info with labels name and state


@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 2, 2023

@jerolimov: This pull request references ODC-7258 which is a valid jira issue.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

This PR (ODC-7258) adds new console metrics that are collected in ODC-7232.

See PR openshift/console#12527 for detailed information about the collected metrics.

A quick overview:

  1. GaugeVec cluster_version_capability with label name was already part of telemetry, right?
  2. Counter console_auth_login_requests_total
  3. CounterVec console_auth_login_successes_total with label role
  4. CounterVec console_auth_login_failures_total with label reason
  5. CounterVec console_auth_logout_requests_total with label reason
  6. CounterVec console_usage_total with labels event and perspective
  7. GaugeVec console_usage_users with label role
  8. GaugeVec console_plugins_info with labels name and state
  9. GaugeVec console_customization_perspectives_info with labels name and state


@christoph-jerolimov christoph-jerolimov force-pushed the add-console-metrics branch 2 times, most recently from 3df5569 to 436884f Compare March 2, 2023 22:57
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 2, 2023

@jerolimov: This pull request references ODC-7258 which is a valid jira issue.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

This PR (ODC-7258) adds new console metrics that are collected in ODC-7232.

See PR openshift/console#12527 for detailed information about the collected metrics.

A quick overview:

  1. GaugeVec cluster_version_capability with label name is already part of telemetry, and could be used in internal Superset or Tableau, or ❓
  2. Sum Counter console_auth_login_requests_total
  3. Sum CounterVec console_auth_login_successes_total by label role
  4. Sum CounterVec console_auth_login_failures_total by label reason
  5. Sum CounterVec console_auth_logout_requests_total by label reason
  6. Sum CounterVec console_usage_total by labels event and perspective
  7. Max ❓ GaugeVec console_usage_users by label role
  8. Sum ❓ GaugeVec console_plugins_info by labels name and state
  9. Sum ❓ GaugeVec console_customization_perspectives_info by labels name and state

An example of what the console /metrics endpoint provides:

# 1
# HELP console_auth_login_requests_total Total number of login requests from the frontend.
# TYPE console_auth_login_requests_total counter
console_auth_login_requests_total 27

# 2
# HELP console_auth_login_successes_total Total number of successful logins. Role label is based on RBAC can list namespaces check.
# TYPE console_auth_login_successes_total counter
console_auth_login_successes_total{role="cluster-admin"} 3
console_auth_login_successes_total{role="developer"} 23
console_auth_login_successes_total{role="kubeadmin"} 1

# 3
# HELP console_auth_login_failures_total Total number of login failures.
# TYPE console_auth_login_failures_total counter
console_auth_login_failures_total{reason="unknown"} 0

# 4
# HELP console_auth_logout_requests_total Total number of logout requests from the frontend.
# TYPE console_auth_logout_requests_total counter
console_auth_logout_requests_total{reason="unknown"} 0

# 5
# HELP console_usage_total Total number of events like "page_views" (loading index.html without history.push) and "page_impressions".
# TYPE console_usage_total counter
console_usage_total{event="page_impression",perspective="admin"} 13
console_usage_total{event="page_impression",perspective="dev"} 10
console_usage_total{event="page_view",perspective="admin"} 212
console_usage_total{event="page_view",perspective="dev"} 432

# 6
# HELP console_usage_users The number of console users splitten into roles (cluster-admin, developer, and unknown if a RBAC check fails)
# TYPE console_usage_users gauge
console_usage_users{role="cluster-admin"} 3
console_usage_users{role="developer"} 32
console_usage_users{role="kubeadmin"} 1

# 7
# HELP console_plugins_info List all plugins with their name and state as label. State is currently always enabled. Reports 1 for each plugin (per console pod instance).
# TYPE console_plugins_info gauge
console_plugins_info{name="demo-plugin",name="enabled"} 1

# 8
# HELP console_customization_perspectives_info List of customized perspectives, for example perspective=dev with state=disabled and 1 as metric.
# TYPE console_customization_perspectives_info gauge
console_customization_perspectives_info{name="dev",name="disabled"} 1


@christoph-jerolimov christoph-jerolimov changed the title [WIP] ODC-7258: Add new web console usage metrics ODC-7258: Add new web console usage metrics Mar 2, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2023
Contributor

@jan--f jan--f left a comment

Hey @jerolimov thanks for the comprehensive context and the drive-by typo fixes 😄

Can you please add some more information on possible label value sets? We are usually trying to only ingest bounded value sets.

For example perspective: I suppose this will be either admin or dev? state will likely be one of enabled, disabled and maybe failed or so?
role, reason, event, name (plugin name) seem like they have the potential to have unbounded value sets. Is that the case? If so, we need a strong argument for why that is needed, or we need to work on the recording rules to make those value sets finite.

@christoph-jerolimov
Member Author

christoph-jerolimov commented Mar 14, 2023

Hey @jan--f, thanks for the review. 😄

Can you please add some more information on possible label value sets? We are usually trying to only ingest bounded value sets.

Sure. FYI, I also tried to explain it in this Google doc.

For example perspective: I suppose this will be either admin or dev? state will likely be one of enabled, disabled and maybe failed or so?

For console_customization_perspectives_info

names are in theory unbounded, that's correct, but we don't expect much more than admin, dev, acm (or multicluster or something similar), and maybe one or two more in the future. For now, I wouldn't expect more than 1-3 here (reality can always surprise you...), and for PM it's important to know which perspectives are configured.

If this ever escalates, we are fine with reducing it to admin, dev, other, or similar. Just so I understand: would you notice this and let us know if a change is needed?

In the state, we report whether a customer has "customized" a perspective, which means the customer disabled the perspective completely or enabled/disabled it only for a specific user group. The possible states are already reduced to these 5 cases:

  • enabled
  • disabled
  • only-for-cluster-admins
  • only-for-developers
  • custom-permissions (the rules are RBAC rules; we use this as the fallback for everything we don't recognize)

role, reason, event, name (plugin name) seem like they have the potential to have unbounded value sets. Is that the case? If so, we need a strong argument for why that is needed, or we need to work on the recording rules to make those value sets finite.

roles:

  • in the login case we use exactly these 3:
    • kubeadmin
    • cluster-admin
    • developer
  • in the users metric, we also use unknown if an internal error occurs and we can't assign one of the 3 roles.
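A minimal Python sketch of how a user could be mapped onto these bounded role values, based on the "RBAC can list namespaces check" mentioned in the metric help text. The function name, its signature, and the kube:admin shortcut are assumptions for illustration, not the console's actual code.

```python
from typing import Callable


def classify_role(username: str, can_list_namespaces: Callable[[], bool]) -> str:
    """Map a console user to one of the bounded role label values.

    Hypothetical sketch: "kube:admin" is the user name of OpenShift's kubeadmin
    user, and can_list_namespaces stands in for the RBAC check (it may raise).
    """
    if username == "kube:admin":
        return "kubeadmin"
    try:
        return "cluster-admin" if can_list_namespaces() else "developer"
    except Exception:
        # Any failure of the RBAC check collapses into one bounded value.
        return "unknown"


print(classify_role("kube:admin", lambda: True))  # kubeadmin
print(classify_role("alice", lambda: False))      # developer
```

Because every code path returns one of four fixed strings, the role label can never grow beyond kubeadmin, cluster-admin, developer, and unknown.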

reasons:

The reason for failures is currently always unknown.

Internally we have some more errors that we might want to use, but we (I) decided to just use unknown to keep this metric bounded.

If needed, we could add reasons like login canceled, logout timeout, or manually in the future, but we have no plans for this at the moment.

events: we currently use exactly three events that track console usage:

  • page_view
  • page_impression
  • perspective_changed

There aren't other events planned at the moment.
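With the value sets listed in this reply, the worst-case number of series per metric (after a sum by those labels) is the product of the value-set sizes. A small Python sketch of that arithmetic, assuming the perspective names are capped at the grouped set admin, dev, acm, other from the updated PR description (at this point in the thread they were still open-ended); `max_series` is a hypothetical helper:

```python
import math

# Label value sets as described in this thread. The perspective names use the
# grouped set from the updated PR description (admin, dev, acm, other).
LABEL_VALUES = {
    "role (login successes)": ["kubeadmin", "cluster-admin", "developer"],
    "role (usage users)": ["kubeadmin", "cluster-admin", "developer", "unknown"],
    "reason": ["unknown"],
    "perspective name": ["admin", "dev", "acm", "other"],
    "perspective state": [
        "enabled",
        "disabled",
        "only-for-cluster-admins",
        "only-for-developers",
        "custom-permissions",
    ],
}


def max_series(*labels):
    """Upper bound on series for one metric: the product of its label value counts."""
    return math.prod(len(LABEL_VALUES[label]) for label in labels)


print(max_series("role (login successes)"))                 # 3
print(max_series("role (usage users)"))                     # 4
print(max_series("reason"))                                 # 1
print(max_series("perspective name", "perspective state"))  # 20
```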

console_plugins_info:

This is probably less bounded than perspectives, as the console supports "dynamic plugins" and other teams are invited to extend the console.

Currently, I would expect values like demo, acm, gitops, pipelines, and maybe a few more. But this could evolve over time outside our control. I believe different PMs are really interested in seeing the adoption rate of the plugin infrastructure and of the plugins themselves.

If this escalates I would expect we need to filter them or track just "redhat" vs "marketplace" vs "customer" plugins. But I don't think we have that distinction yet. (cc @spadgett)
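The idea of tracking plugin categories instead of individual plugin names (which console#12684 later implemented as redhat, demo, and other, per the updated PR description) can be sketched in Python like this. The name-to-category table and `plugin_info_series` are hypothetical; the table entries are simply the plugin names from the OLD example in the description, chosen so that they reproduce the NEW counts.

```python
from collections import Counter

# Hypothetical name-to-category table. The entries are the plugin names from the
# OLD example in the PR description, chosen only so that they reproduce the NEW
# counts; the real classification in console#12684 may use other criteria.
PLUGIN_CATEGORIES = {
    "acm": "redhat",
    "mce": "redhat",
    "crane-ui-plugin": "redhat",
    "logging-view-plugin": "redhat",
    "my-plugin": "demo",
}


def plugin_info_series(plugin_names, state="enabled"):
    """Return {(category, state): plugin count}; unlisted plugins become "other"."""
    counts = Counter(PLUGIN_CATEGORIES.get(name, "other") for name in plugin_names)
    return {(category, state): count for category, count in counts.items()}


print(plugin_info_series(
    ["acm", "crane-ui-plugin", "logging-view-plugin", "mce", "my-plugin", "some-isv-plugin"]
))
# {('redhat', 'enabled'): 4, ('demo', 'enabled'): 1, ('other', 'enabled'): 1}
```

However many plugins a cluster installs, this emits at most three series per state.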

@christoph-jerolimov
Member Author

cc @spadgett

@christoph-jerolimov christoph-jerolimov requested review from jan--f and removed request for simonpasquier and raptorsun March 15, 2023 10:03
@christoph-jerolimov
Member Author

/cc @simonpasquier

@openshift-ci openshift-ci bot requested a review from simonpasquier March 15, 2023 10:04
@jan--f
Contributor

jan--f commented Mar 16, 2023

OK, so the anticipated label value sets, or rather their open-endedness, have me a bit worried. We don't currently check for cardinality increases in telemeter. Even if we did, at that stage it would be too late :)

So for this PR, let's change the recording rule so that we really only collect the label values that we need to have in telemetry.
For example:
For rule cluster:console_customization_perspectives_info:sum you currently expect possible values to be one of admin, dev, or acm. So let's reflect that in the recording rule like so:

 expr: 'sum(console_customization_perspectives_info{name=~"admin|dev|acm"}) by (name, state)',
 record: 'cluster:console_customization_perspectives_info:sum',

This allows us to protect against future label value proliferation, to reason about the number of timeseries this PR adds and it forces you to actually reason about what data you need.
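The suggested filter relies on PromQL regex matchers (`=~`) being fully anchored, so `name=~"admin|dev|acm"` keeps exactly those three values and drops look-alikes such as dev1. A minimal Python sketch of that behaviour, using `re.fullmatch` as a stand-in for Prometheus's RE2-based matcher (the two engines differ in advanced syntax, but not for a plain alternation like this); `promql_regex_matches` is a hypothetical helper:

```python
import re

ALLOWED_PERSPECTIVES = "admin|dev|acm"


def promql_regex_matches(pattern: str, value: str) -> bool:
    """Emulate a PromQL =~ matcher, which is anchored at both ends."""
    return re.fullmatch(pattern, value) is not None


names = ["admin", "dev", "dev1", "acm", "multicluster"]
print([n for n in names if promql_regex_matches(ALLOWED_PERSPECTIVES, n)])
# ['admin', 'dev', 'acm']
print(re.search(ALLOWED_PERSPECTIVES, "dev1") is not None)
# True: an unanchored search would have let "dev1" through
```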

Also fair warning: This looks like it'll add quite a few timeseries, so we'll likely have to get @eparis's thumbs up. No big deal, just another admin step. 😀

@jan--f
Contributor

jan--f commented Mar 16, 2023

Oh and in the PR description you have a small error:

# 7
# HELP console_plugins_info List all plugins with their name and state as label. State is currently always enabled. Reports 1 for each plugin (per console pod instance).
# TYPE console_plugins_info gauge
console_plugins_info{name="demo-plugin",name="enabled"} 1

# 8
# HELP console_customization_perspectives_info List of customized perspectives, for example perspective=dev with state=disabled and 1 as metric.
# TYPE console_customization_perspectives_info gauge
console_customization_perspectives_info{name="dev",name="disabled"} 1

That second label is probably called state?
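Each label name may appear at most once per series, so a line like the one quoted above is not a valid sample. A small, hypothetical Python check that could catch this kind of copy-and-paste slip in hand-written examples (it only understands simple name="value" pairs without escaped quotes):

```python
import re

LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def duplicate_label_names(sample_line: str) -> list:
    """Return label names that appear more than once inside a sample's {...} block."""
    start, end = sample_line.find("{"), sample_line.rfind("}")
    if start == -1 or end < start:
        return []
    seen, duplicates = set(), []
    for name, _value in LABEL_RE.findall(sample_line[start + 1:end]):
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


print(duplicate_label_names('console_plugins_info{name="demo-plugin",name="enabled"} 1'))
# ['name']
print(duplicate_label_names('console_plugins_info{name="demo-plugin",state="enabled"} 1'))
# []
```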

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 16, 2023

@jerolimov: This pull request references ODC-7258 which is a valid jira issue.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

This PR (ODC-7258) adds new console metrics that are collected in ODC-7232.

See PR openshift/console#12527 for detailed information about the collected metrics.

A quick overview:

  1. GaugeVec cluster_version_capability with label name is already part of telemetry, and could be used in internal Superset or Tableau, or ❓
  2. Sum Counter console_auth_login_requests_total
  3. Sum CounterVec console_auth_login_successes_total by label role
  4. Sum CounterVec console_auth_login_failures_total by label reason
  5. Sum CounterVec console_auth_logout_requests_total by label reason
  6. Sum CounterVec console_usage_total by labels event and perspective
  7. Max ❓ GaugeVec console_usage_users by label role
  8. Sum ❓ GaugeVec console_plugins_info by labels name and state
  9. Sum ❓ GaugeVec console_customization_perspectives_info by labels name and state

An example of what the console /metrics endpoint provides:

# 1
# HELP console_auth_login_requests_total Total number of login requests from the frontend.
# TYPE console_auth_login_requests_total counter
console_auth_login_requests_total 27

# 2
# HELP console_auth_login_successes_total Total number of successful logins. Role label is based on RBAC can list namespaces check.
# TYPE console_auth_login_successes_total counter
console_auth_login_successes_total{role="cluster-admin"} 3
console_auth_login_successes_total{role="developer"} 23
console_auth_login_successes_total{role="kubeadmin"} 1

# 3
# HELP console_auth_login_failures_total Total number of login failures.
# TYPE console_auth_login_failures_total counter
console_auth_login_failures_total{reason="unknown"} 0

# 4
# HELP console_auth_logout_requests_total Total number of logout requests from the frontend.
# TYPE console_auth_logout_requests_total counter
console_auth_logout_requests_total{reason="unknown"} 0

# 5
# HELP console_usage_total Total number of events like "page_views" (loading index.html without history.push) and "page_impressions".
# TYPE console_usage_total counter
console_usage_total{event="page_impression",perspective="admin"} 13
console_usage_total{event="page_impression",perspective="dev"} 10
console_usage_total{event="page_view",perspective="admin"} 212
console_usage_total{event="page_view",perspective="dev"} 432

# 6
# HELP console_usage_users The number of console users splitten into roles (cluster-admin, developer, and unknown if a RBAC check fails)
# TYPE console_usage_users gauge
console_usage_users{role="cluster-admin"} 3
console_usage_users{role="developer"} 32
console_usage_users{role="kubeadmin"} 1

# 7
# HELP console_plugins_info List all plugins with their name and state as label. State is currently always enabled. Reports 1 for each plugin (per console pod instance).
# TYPE console_plugins_info gauge
console_plugins_info{name="demo-plugin",state="enabled"} 1

# 8
# HELP console_customization_perspectives_info List of customized perspectives, for example perspective=dev with state=disabled and 1 as metric.
# TYPE console_customization_perspectives_info gauge
console_customization_perspectives_info{name="dev",state="disabled"} 1


@christoph-jerolimov
Member Author

Oh and in the PR description you have a small error:
..
That second label is probably called state?

Oh yeah, copy and paste error. Fixed that. 😏

@christoph-jerolimov
Member Author

OK, so the anticipated label value sets, or rather their open-endedness, have me a bit worried. We don't currently check for cardinality increases in telemeter. Even if we did, at that stage it would be too late :)

So for this PR, let's change the recording rule so that we really only collect the label values that we need to have in telemetry. For example: For rule cluster:console_customization_perspectives_info:sum you currently expect possible values to be one of admin, dev, or acm. So let's reflect that in the recording rule like so:

 expr: 'sum(console_customization_perspectives_info{name=~"admin|dev|acm"}) by (name, state)',
 record: 'cluster:console_customization_perspectives_info:sum',

Thanks for that recommendation @jan--f. 👍

Just FYI: I checked our anonymized analytics, where we have an event when a user switches perspective. Currently, we only see users switching between "admin" and "dev".

So this shows that other perspectives aren't really used today (on the clusters we monitor with frontend analytics). So I'm not worried about too many time series, but I understand that you are. 😄

Is there maybe a way to group all other perspectives as "other" instead of just dropping them? Does this make sense to you? If there is no easy way, I can apply the filter to admin|dev for now.
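One way to get an "other" bucket without dropping series is to do the grouping before the metric is exported, which is the route console#12684 took according to the updated PR description. A hypothetical Python sketch (the function and constant names are made up); its input and output mirror the OLD and NEW perspective examples in the description:

```python
from collections import Counter

WELL_KNOWN_PERSPECTIVES = {"admin", "dev", "acm"}


def group_perspectives(customized):
    """Map (name, state) pairs to {(name or "other", state): count}."""
    return dict(Counter(
        (name if name in WELL_KNOWN_PERSPECTIVES else "other", state)
        for name, state in customized
    ))


old = [
    ("dev", "only-for-developers"),
    ("dev1", "only-for-developers"),
    ("dev2", "only-for-developers"),
    ("dev3", "only-for-developers"),
]
print(group_perspectives(old))
# {('dev', 'only-for-developers'): 1, ('other', 'only-for-developers'): 3}
```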

This allows us to protect against future label value proliferation, to reason about the number of time-series this PR adds and it forces you to actually reason about what data you need.

Also fair warning: This looks like it'll add quite a few time-series, so we'll likely have to get @eparis's thumbs up. No big deal, just another admin step. 😀

Sure, that's fine.

@christoph-jerolimov
Member Author

/retest

@simonpasquier
Contributor

/skip

@simonpasquier
Contributor

/retest

@simonpasquier
Contributor

/retest-required

@christoph-jerolimov
Member Author

/retest

1 similar comment

@christoph-jerolimov
Member Author

/test e2e-agnostic-operator

5 similar comments

Contributor

@simonpasquier simonpasquier left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 24, 2023
@openshift-ci
Contributor

openshift-ci bot commented May 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jerolimov, simonpasquier

@simonpasquier
Contributor

/retest-required

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 3d60d38 and 2 for PR HEAD e7c08b9 in total

@openshift-ci
Contributor

openshift-ci bot commented May 24, 2023

@jerolimov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                          Commit   Details  Required  Rerun command
ci/prow/versions                   e7c08b9  link     false     /test versions
ci/prow/e2e-aws-ovn-single-node    e7c08b9  link     false     /test e2e-aws-ovn-single-node

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 3338b59 and 1 for PR HEAD e7c08b9 in total

@openshift-merge-robot openshift-merge-robot merged commit bc36e20 into openshift:master May 26, 2023
@openshift-ci-robot
Contributor

@jerolimov: Jira Issue OCPBUGS-12903: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-12903 has been moved to the MODIFIED state.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Fixes: ODC-7258

This PR adds new console metrics that are collected in ODC-7232 and should make them later available in Superset DataHat or Tableau.

See also PR openshift/console#12527 for detailed information about the collected metrics.

Update: PR openshift/console#12684 reduced the metrics cardinality based on the feedback in this PR here.

A quick overview:

  1. GaugeVec cluster_version_capability with label name is already part of telemetry, and could be used in internal Superset or Tableau, or ❓
  2. Sum Counter console_auth_login_requests_total
  3. Sum CounterVec console_auth_login_successes_total by label role
  4. Sum CounterVec console_auth_login_failures_total by label reason
  5. Sum CounterVec console_auth_logout_requests_total by label reason
  6. Sum CounterVec console_usage_total by labels event and perspective
    ⚠️ Update: Removed this metric completely from this PR.
  7. Max GaugeVec console_usage_users by label role
    Update: with PR OCPBUGS-10956: Reduce metrics cardinality by grouping well-known and other perspectives and plugins console#12684 only used roles was reported.
  8. Max GaugeVec console_plugins_info by labels name and state
    ⚠️ Update: with PR OCPBUGS-10956: Reduce metrics cardinality by grouping well-known and other perspectives and plugins console#12684 the plugins are grouped by their name to reduce cardinality.
    Name could be just "redhat", "demo" and "other" at the moment
  9. Max GaugeVec console_customization_perspectives_info by labels name and state
    ⚠️ Update: with PR OCPBUGS-10956: Reduce metrics cardinality by grouping well-known and other perspectives and plugins console#12684 the perspectives are grouped as well.
    Name could be just "admin", "dev", "acm" and "other"

An example of what the console /metrics endpoint provides:

# 1
# HELP console_auth_login_requests_total Total number of login requests from the frontend.
# TYPE console_auth_login_requests_total counter
console_auth_login_requests_total 27

# 2
# HELP console_auth_login_successes_total Total number of successful logins. Role label is based on RBAC can list namespaces check.
# TYPE console_auth_login_successes_total counter
console_auth_login_successes_total{role="cluster-admin"} 3
console_auth_login_successes_total{role="developer"} 23
console_auth_login_successes_total{role="kubeadmin"} 1

# 3
# HELP console_auth_login_failures_total Total number of login failures.
# TYPE console_auth_login_failures_total counter
console_auth_login_failures_total{reason="unknown"} 0

# 4
# HELP console_auth_logout_requests_total Total number of logout requests from the frontend.
# TYPE console_auth_logout_requests_total counter
console_auth_logout_requests_total{reason="unknown"} 0

# 5 --- NOT PART OF THIS PR ANYMORE --- keep that here to give the code review comments a context
#
# HELP console_usage_total Total number of events like "page_views" (loading index.html without history.push) and "page_impressions".
# TYPE console_usage_total counter
# console_usage_total{event="page_impression",perspective="admin"} 13
# console_usage_total{event="page_impression",perspective="dev"} 10
# console_usage_total{event="page_view",perspective="admin"} 212
# console_usage_total{event="page_view",perspective="dev"} 432

# 6
# HELP console_usage_users The number of console users splitten into roles (cluster-admin, developer, and unknown if a RBAC check fails)
# TYPE console_usage_users gauge
console_usage_users{role="cluster-admin"} 3
console_usage_users{role="developer"} 32
console_usage_users{role="kubeadmin"} 1

# 7
# HELP console_plugins_info List all plugins with their name and state as label. State is currently always enabled. Reports 1 for each plugin (per console pod instance).
# TYPE console_plugins_info gauge
#
# OLD:
# console_plugins_info{name="acm",state="enabled"} 1
# console_plugins_info{name="crane-ui-plugin",state="enabled"} 1
# console_plugins_info{name="logging-view-plugin",state="enabled"} 1
# console_plugins_info{name="mce",state="enabled"} 1
# console_plugins_info{name="my-plugin",state="enabled"} 1
#
# NEW:
console_plugins_info{name="redhat",state="notfound"} 4
console_plugins_info{name="demo",state="notfound"} 1
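The NEW `console_plugins_info` output collapses individual plugin names into a few fixed group labels. The PR description says the name can currently only be "redhat", "demo" or "other". Below is a sketch of that grouping, assuming a simple lookup table. The `knownGroups` table is hypothetical: it just assigns the OLD example's plugin names to groups. The real classification lives in openshift/console#12684.

```go
package main

import (
	"fmt"
	"sort"
)

// knownGroups is a HYPOTHETICAL lookup table that assigns the OLD example's
// plugin names to groups; it is not the list used by the console.
var knownGroups = map[string]string{
	"acm":                 "redhat",
	"crane-ui-plugin":     "redhat",
	"logging-view-plugin": "redhat",
	"mce":                 "redhat",
	"my-plugin":           "demo",
}

// groupKey is the (name, state) label pair of one exported series.
type groupKey struct{ name, state string }

// groupPlugins maps each plugin name onto "redhat", "demo" or "other" and
// counts plugins per (group, state), so the number of series stays bounded
// no matter how many plugins are installed.
func groupPlugins(plugins map[string]string) map[groupKey]int {
	out := map[groupKey]int{}
	for name, state := range plugins {
		group, ok := knownGroups[name]
		if !ok {
			group = "other"
		}
		out[groupKey{group, state}]++
	}
	return out
}

func main() {
	// Plugin name -> state; names and states are illustrative only.
	plugins := map[string]string{
		"acm":                 "enabled",
		"crane-ui-plugin":     "enabled",
		"logging-view-plugin": "enabled",
		"mce":                 "enabled",
		"my-plugin":           "enabled",
		"customer-plugin":     "enabled",
	}
	grouped := groupPlugins(plugins)
	keys := make([]groupKey, 0, len(grouped))
	for k := range grouped {
		keys = append(keys, k)
	}
	// Sort by name, then state, for deterministic output.
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].name != keys[j].name {
			return keys[i].name < keys[j].name
		}
		return keys[i].state < keys[j].state
	})
	for _, k := range keys {
		fmt.Printf("console_plugins_info{name=%q,state=%q} %d\n", k.name, k.state, grouped[k])
	}
}
```

However many plugins are added, at most three name values (times the number of states) can appear. The raw plugin names would otherwise create one series each.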

# 8
# HELP console_customization_perspectives_info List of customized perspectives, for example perspective=dev with state=disabled and 1 as metric.
# TYPE console_customization_perspectives_info gauge
#
# OLD:
# console_customization_perspectives_info{name="dev",state="only-for-developers"} 1
# console_customization_perspectives_info{name="dev1",state="only-for-developers"} 1
# console_customization_perspectives_info{name="dev2",state="only-for-developers"} 1
# console_customization_perspectives_info{name="dev3",state="only-for-developers"} 1
#
# NEW:
console_customization_perspectives_info{name="dev",state="only-for-developers"} 1
console_customization_perspectives_info{name="other",state="only-for-developers"} 3
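The same idea bounds `console_customization_perspectives_info`. Per the PR description, the name label is limited to "admin", "dev", "acm" and "other". This sketch assumes a plain exact-match check through a hypothetical `perspectiveLabel` helper, which is not the console's code. With that check, the four customized perspectives from the OLD example reduce to the two NEW series.

```go
package main

import "fmt"

// perspectiveLabel (hypothetical helper) maps a perspective ID onto a bounded
// label value: the well-known IDs listed in the PR description are kept, and
// everything else is reported as "other".
func perspectiveLabel(id string) string {
	switch id {
	case "admin", "dev", "acm":
		return id
	default:
		return "other"
	}
}

func main() {
	// The four customized perspectives from the OLD example; all of them
	// share the same state there, so it is a constant here.
	customized := []string{"dev", "dev1", "dev2", "dev3"}
	const state = "only-for-developers"

	counts := map[string]int{}
	for _, id := range customized {
		counts[perspectiveLabel(id)]++
	}
	// Only these four name values can ever be exported.
	for _, name := range []string{"admin", "dev", "acm", "other"} {
		if n := counts[name]; n > 0 {
			fmt.Printf("console_customization_perspectives_info{name=%q,state=%q} %d\n", name, state, n)
		}
	}
}
```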

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot

@jerolimov: Jira Issue OCPBUGS-12903 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
7 participants