Skip to content

[ML] adding running_state to datafeed stats object #73926

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 10, 2021

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Jun 8, 2021

It is useful to know the following information when reading datafeed stats:

  • Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured end time
  • Has the datafeed processed all past data available at the time of starting.

This object is only available if the datafeed task has been created.

It has the form:

"running_state": {
  "real_time_configured": <boolean>,
  "real_time_running": <boolean>
}

@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jun 8, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent benwtrent force-pushed the feature/ml-is-df-realtime branch from adcaf29 to ff479e7 Compare June 8, 2021 19:03
has no configured `end` time.

`finished_look_back`:::
(boolean) Has the {dfeed} finished running on the available past data. For {dfeeds}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(boolean) Has the {dfeed} finished running on the available past data. For {dfeeds}
(boolean) Indicates whether the {dfeed} has finished running on the available past data. For {dfeeds}


`finished_look_back`:::
(boolean) Has the {dfeed} finished running on the available past data. For {dfeeds}
that without a configured `end` time, this means that the {dfeed} is now running on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
that without a configured `end` time, this means that the {dfeed} is now running on
without a configured `end` time, this means that the {dfeed} is now running on

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

My only issue is whether we introduce the term "lookback" to the world. I am leaning towards not doing this.

(boolean) Indicates if the {dfeed} is "real-time"; meaning that the {dfeed}
has no configured `end` time.

`finished_look_back`:::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our existing docs don't use the term "lookback". I am not sure we should introduce it now.

One solution would be to rename is_real_time to real_time_configured and finished_look_back to real_time_running. Then that doesn't involve introducing a new public term.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droberts195

While its true that we don't use it in docs, lookback is used in audit messages:

    public static final String JOB_AUDIT_DATAFEED_LOOKBACK_COMPLETED = "Datafeed lookback completed";
    public static final String JOB_AUDIT_DATAFEED_LOOKBACK_NO_DATA = "Datafeed lookback retrieved no data";

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to change to both to indicate real_time. But we do use lookback in user facing things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lookback is used in audit messages

Oh, interesting. I think we made the decision in 2016 not to use it anywhere user-facing, but those audit messages were added in 2017, by which time the 2016 decision had been forgotten. I guess the lesson is that we should change our terminology in internal code as well as what's immediately user-facing to stop internal terminology leaking out later on.

I still think there is a benefit in not propagating the term to fields that will appear in every high level client's public API. At the moment we could change the wording of those audit messages without making a breaking change, but the REST responses are harder to change once published.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okey dokey, I will rename the variables.

I guess the lesson is that we should change our terminology in internal code as well as what's immediately user-facing to stop internal terminology leaking out later on.

100%. Renaming things everywhere prevents this sort of thing.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent requested a review from lcawl June 9, 2021 17:12
@benwtrent
Copy link
Member Author

@elasticmachine update branch

@benwtrent benwtrent merged commit 8d88286 into elastic:master Jun 10, 2021
@benwtrent benwtrent deleted the feature/ml-is-df-realtime branch June 10, 2021 12:08
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jun 10, 2021
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jun 10, 2021
@droberts195 droberts195 added the Team:Clients Meta label for clients team label Jun 10, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

benwtrent added a commit that referenced this pull request Jun 10, 2021
It is useful to know the following information when reading datafeed stats:

 - Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time
 - Has the datafeed processed all past data available at the time of starting.

This object is only available if the datafeed task has been created.

It has the form:

```
"running_state": {
  "is_real_time": <boolean>,
  "look_back_finished": <boolean>
}
```
benwtrent added a commit that referenced this pull request Jun 10, 2021
benwtrent added a commit that referenced this pull request Jun 14, 2021
…74025)

Adds the new running_state field to the High Level REST client.

The new field is introduced in 7.14 through PR: #73926
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :ml Machine learning Team:Clients Meta label for clients team Team:ML Meta label for the ML team v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants