[DOCS] Defines data frame transform stats API objects #44197

Merged
merged 9 commits into from
Feb 5, 2020

Conversation

lcawl
Contributor

@lcawl lcawl commented Jul 11, 2019

@elasticmachine
Collaborator

Pinging @elastic/es-docs

@elasticmachine
Collaborator

Pinging @elastic/ml-core

@droberts195
Contributor

Due to #43767 the format of stats will change between 7.2 and 7.3. Therefore it might be best just to document the new format post #43767 and not to backport this PR to the 7.2 branch as current labels suggest.

@hendrikmuhs

@lcawl Can this PR be closed? Looks outdated to me.

@lcawl lcawl removed the v7.3.3 label Jan 28, 2020
@lcawl
Contributor Author

lcawl commented Jan 28, 2020

@elasticmachine update branch

@elasticmachine
Collaborator

merge conflict between base and head

@hendrikmuhs hendrikmuhs left a comment

Thanks for looking into this again.

I commented on the TBD parts and added some explanations.

`checkpointing`.`last`:::
(object) Contains statistics about the last completed checkpoint.
`checkpointing`.`last`.`checkpoint`::::
(TBD) A unique identifier for the checkpoint.

I suggest: "sequence number for the checkpoint"

`checkpointing`.`last`.`checkpoint`::::
(TBD) A unique identifier for the checkpoint.
`checkpointing`.`last`.`time_upper_bound_millis`::::
(date) TBD

optional, timestamp until data has been processed when using time-based synchronization

Contributor Author

... timestamp until data has been processed...

Thanks for the feedback, @hendrikmuhs! I'm not sure I understand this description yet, however. Is it the duration of the checkpoint?

@hendrikmuhs hendrikmuhs Feb 4, 2020

Think of a continuous transform, where your source index keeps receiving new data, so the destination (transformed) index always runs behind the source. time_upper_bound marks the timestamp up to which all data from the source has been processed into the destination. So it's not a duration; it's an end marker for how far data has been processed.

There is also timestamp, and it might seem like the same thing, but timestamp is the time the checkpoint was created, whereas time_upper_bound has to take the delay into account. So normally time_upper_bound = timestamp - delay. (However, this might change in the future; that's why timestamp and time_upper_bound are separate fields.)
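
As an illustration of the arithmetic described in this comment, here is a minimal sketch. The function name and sample values are hypothetical, and it assumes the current relationship (time_upper_bound = timestamp - delay), which the comment notes may change.

```python
def time_upper_bound_millis(timestamp_millis: int, delay_millis: int) -> int:
    """Return the upper bound of the data covered by a checkpoint.

    Assumption: the upper bound is the checkpoint creation time minus the
    configured sync delay. This is the "normal" case from the comment, not
    a guaranteed contract.
    """
    return timestamp_millis - delay_millis


# Hypothetical values: a checkpoint created at 1_580_000_060_000 ms
# with a sync delay of 60 seconds.
created_millis = 1_580_000_060_000
delay_millis = 60_000
print(time_upper_bound_millis(created_millis, delay_millis))  # 1580000000000
```

Keeping the two values in separate fields means a client never has to redo this subtraction, which matters if the relationship changes later.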

Contributor Author

Thanks, I've drafted changes to those two descriptions. If they still need tweaking, please just let me know.

`checkpointing`.`last`.`time_upper_bound_millis`::::
(date) TBD
`checkpointing`.`last`.`timestamp_millis`::::
(date) TBD

timestamp of the checkpoint (when the checkpoint has been created)

(date) TBD
`checkpointing`.`last`.`timestamp_millis`::::
(date) TBD

`checkpointing`.`next`:::
(optional, object) Contains statistics about the next checkpoint, which is currently in progress. This object only appears if the transform is currently processing data and only for the 1st checkpoint.

It uses the same fields as `last` but has one more object:

`checkpoint_progress`::
(object) Contains statistics about the progress of the checkpoint.

Not sure how much detail we want to go into; the inner fields are:

  • total_docs
  • docs_remaining
  • percent_complete
  • docs_processed
  • docs_indexed

This information is only available for batch transforms and for the 1st checkpoint of a continuous transform.
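
To show how these inner fields might relate to each other, here is a rough sketch. The formula for `percent_complete`, the handling of an empty source, and the sample numbers are all assumptions for illustration; the thread does not confirm how the value is computed.

```python
def percent_complete(total_docs: int, docs_remaining: int) -> float:
    """Derive a completion percentage from the two document counters.

    Assumption: percent_complete = 100 * (total_docs - docs_remaining) / total_docs.
    An empty source (total_docs <= 0) is treated as fully complete, which is
    also an assumption.
    """
    if total_docs <= 0:
        return 100.0
    return 100.0 * (total_docs - docs_remaining) / total_docs


# Hypothetical checkpoint_progress values.
progress = {"total_docs": 2000, "docs_remaining": 500}
progress["percent_complete"] = percent_complete(
    progress["total_docs"], progress["docs_remaining"]
)
print(progress["percent_complete"])  # 75.0
```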

* `indexing`: The {transform} is actively processing data and creating new
documents.
* `started`: The {transform} is running but not actively indexing data.
* `stopped`: The {transform} is stopped.

  • `aborting`: The {transform} is aborting.
  • `stopping`: The {transform} is stopping.
  • `failed`: The {transform} has failed. Check the `reason` field for further information.

`stats`::
(object) An object that provides statistical information about the {transform}.
`stats`.`documents_indexed`:::
(TBD) The number of new documents that have been indexed.

The number of documents that have been indexed into the transform dest index.

`stats`.`documents_indexed`:::
(TBD) The number of new documents that have been indexed.
`stats`.`documents_processed`:::
(TBD) The number of documents that have been processed.

The number of documents that have been processed from the transform source index.

`stats`.`index_total`:::
(long) The number of indices created.
`stats`.`pages_processed`:::
(TBD) The number of pages processed.

(long) The number of pages (number of search/bulk index operations) processed.

(I do not know if this needs a better explanation. In a nutshell, documents are not processed one by one but always in batches. This happens for search (a search page) as well as for indexing, where a "page" describes one bulk index operation that consists of a list of documents to be indexed.)
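
The batching idea can be sketched as follows. This is a simplified illustration, not the transform implementation: the `pages` helper, the page size of 500, and the document shape are all hypothetical.

```python
from typing import Iterator, List


def pages(docs: List[dict], page_size: int) -> Iterator[List[dict]]:
    """Yield documents in fixed-size batches ("pages") instead of one by one."""
    for start in range(0, len(docs), page_size):
        yield docs[start:start + page_size]


# Hypothetical input: 1050 documents handled in pages of 500.
docs = [{"id": i} for i in range(1050)]

pages_processed = 0
for page in pages(docs, page_size=500):
    # In this sketch, each page stands for one search or bulk index operation.
    pages_processed += 1

print(pages_processed)  # 3
```

The counter grows per batch rather than per document, which is why `pages_processed` is usually far smaller than `documents_processed`.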

`stats`.`search_time_in_ms`:::
(long) The amount of time spent searching, in milliseconds.
`stats`.`search_total`:::
(long) TBD

The number of search operations on the transform source index.

`stats`.`search_total`:::
(long) TBD
`stats`.`trigger_count`:::
(TBD) TBD

The number of times the transform has been triggered by the scheduler.

(The scheduler triggers the transform indexer to, for example, check for updates or ingest new data. This can be controlled via the frequency parameter in the config: https://www.elastic.co/guide/en/elasticsearch/reference/master/put-transform.html#put-transform-request-body)

@lcawl lcawl marked this pull request as ready for review February 4, 2020 17:13
@lcawl lcawl removed the WIP label Feb 4, 2020
@hendrikmuhs hendrikmuhs left a comment

Thanks! Looks great. I still suggest removing some technical detail; it was only meant to explain to you how transforms work internally.

`checkpointing`.`next`:::
(object) Contains statistics about the next checkpoint that is currently in
progress. This object appears only if the {transform} is currently processing
data and only for the first checkpoint.

@hendrikmuhs hendrikmuhs Feb 5, 2020

Sorry, this is not quite correct yet: "and only for the first checkpoint" is only true for the checkpoint_progress nested object below.

checkpointing.next will always be there if the transform is actively doing something (when the state is indexing).
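
The distinction drawn here can be written as two small predicates. This is only a sketch of the rules as stated in this thread: the function names are hypothetical, and it assumes the first checkpoint is numbered 1, which the thread does not confirm.

```python
def has_next(state: str) -> bool:
    """checkpointing.next is present whenever the transform is actively indexing."""
    return state == "indexing"


def has_checkpoint_progress(state: str, continuous: bool, next_checkpoint: int) -> bool:
    """checkpoint_progress is narrower than checkpointing.next.

    Per the thread, it is available for batch transforms and for the first
    checkpoint of a continuous transform.
    Assumption: the first checkpoint has sequence number 1.
    """
    return has_next(state) and (not continuous or next_checkpoint == 1)


print(has_next("indexing"))                                   # True
print(has_checkpoint_progress("indexing", continuous=True, next_checkpoint=5))   # False
print(has_checkpoint_progress("indexing", continuous=False, next_checkpoint=1))  # True
```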

Contributor Author

Thanks for the clarifications! I've pushed another commit.

(date) When using time-based synchronization, this timestamp indicates the
upper bound of data that is included in the checkpoint. Typically, this value
is equal to the `checkpointing`.`last`.`time_upper_bound_millis` minus the
`sync`.`time`.`delay`, which is defined when you create the {transform}.

I would not include "Typically, this value is ..." This was just for your information, not meant to be put here.

(object) Contains statistics about the progress of the checkpoint. For example,
it lists the `total_docs`, `docs_remaining`, `percent_complete`,
`docs_processed`, and `docs_indexed`. This information is available only for
batch {transforms} and the first checkpoint of {ctransforms}.

👍

batch {transforms} and the first checkpoint of {ctransforms}.
`checkpointing`.`next`.`time_upper_bound_millis`::::
(date) When using time-based synchronization, this timestamp indicates the
upper bound of data that is included in the checkpoint. Typically, this value

I would omit "Typically..."

@hendrikmuhs hendrikmuhs left a comment

LGTM!

8 participants