Skip to content

[ML] Create and inject APM Inference Metrics #111293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 29, 2024

Conversation

prwhelan
Copy link
Member

We are migrating from in-memory cumulative counter to an Time Series Data Stream delta counter. The goal is to avoid metrics suddenly dropping to zero when a node restarts, hopefully increasing accuracy of the metric.

Verified (with APM injected) that we are receiving APM stats:

{
  "_index": ".ds-metrics-apm.app.elasticsearch-default-2024.07.22-000001",
    ...
    "data_stream": {
      "dataset": "apm.app.elasticsearch",
      "namespace": "default",
      "type": "metrics"
    },
    "es": {
      "inference": {
        "requests": {
          "count": {
            "total": 8
          }
        }
      }
    },
    ...
    "labels": {
      "model_id": ".elser_model_2_linux-x86_64",
      "otel_instrumentation_scope_name": "elasticsearch",
      "service": "elser",
      "task_type": "sparse_embedding"
    },
    ...
  }
}

We are migrating from in-memory cumulative counter to an Time Series
Data Stream delta counter.  The goal is to avoid metrics suddenly
dropping to zero when a node restarts, hopefully increasing accuracy of
the metric.
@prwhelan prwhelan added >non-issue :ml Machine learning Team:ML Meta label for the ML team labels Jul 25, 2024
@prwhelan prwhelan changed the title [ML] Create and inject NoOp Inference Metrics [ML] Create and inject APM Inference Metrics Jul 26, 2024
@prwhelan prwhelan marked this pull request as ready for review July 26, 2024 18:35
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

Copy link
Contributor

@maxhniebergall maxhniebergall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


var stats = new ApmInferenceStats(longCounter);

stats.incrementRequestCount(model("service", TaskType.ANY, null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of recording stats for null models?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why - modelId is currently marketed optional/nullable so this is just a bit of protection to handle it. I figured we'd be able to see "service" level stats and go track down where it is null, if anywhere

@prwhelan prwhelan merged commit 1a939e9 into elastic:main Jul 29, 2024
15 checks passed
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 30, 2024
* upstream/main: (105 commits)
  Removing the use of watcher stats from WatchAcTests (elastic#111435)
  Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=UPGRADED} elastic#111434
  Make `EnrichPolicyRunner` more properly async (elastic#111321)
  Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=OLD} elastic#111430
  Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode KEYWORDs>} elastic#111428
  Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode TEXTs>} elastic#111429
  Mute org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT elastic#111307
  Update semantic_text field to support indexing numeric and boolean data types (elastic#111284)
  Mute org.elasticsearch.repositories.blobstore.testkit.AzureSnapshotRepoTestKitIT testRepositoryAnalysis elastic#111280
  Ensure vector similarity correctly limits inner_hits returned for nested kNN (elastic#111363)
  Fix LogsIndexModeFullClusterRestartIT (elastic#111362)
  Remove 4096 bool query max limit from docs (elastic#111421)
  Fix score count validation in reranker response (elastic#111212)
  Integrate data generator in LogsDB mode challenge test (elastic#111303)
  ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (elastic#111409)
  [Service Account] Add AutoOps account (elastic#111316)
  [ML] Fix failing test DetectionRulesTests.testEqualsAndHashcode (elastic#111351)
  [ML] Create and inject APM Inference Metrics (elastic#111293)
  [DOCS] Additional reranking docs updates (elastic#111350)
  Mute org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT elastic#111345
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning >non-issue Team:ML Meta label for the ML team v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants