
[DOCS] Defines data frame transform stats API objects #44197


Merged
merged 9 commits into from
Feb 5, 2020
78 changes: 75 additions & 3 deletions docs/reference/transform/apis/get-transform-stats.asciidoc
@@ -72,9 +72,81 @@ include::{docdir}/rest-api/common-parms.asciidoc[tag=size-transforms]
[[get-transform-stats-response]]
==== {api-response-body-title}

`transforms`::
(array) An array of statistics objects for {transforms}, which are
sorted by the `id` value in ascending order.
The API returns an array of statistics objects for {transforms}, which are
sorted by the `id` value in ascending order. All of these properties are
informational; you cannot update their values.

`checkpointing`::
(object) Contains statistics about <<transform-checkpoints,checkpoints>>.
`checkpointing`.`changes_last_detected_at`:::
(date) The timestamp when changes were last detected in the source indices.
`checkpointing`.`last`:::
(object) Contains statistics about the last completed checkpoint.
`checkpointing`.`last`.`checkpoint`::::
(TBD) A unique identifier for the checkpoint.


I suggest: "sequence number for the checkpoint"

`checkpointing`.`last`.`time_upper_bound_millis`::::
(date) TBD


optional, timestamp until data has been processed when using time-based synchronization

Contributor Author

... timestamp until data has been processed...

Thanks for the feedback, @hendrikmuhs! I'm not sure I understand this description yet, however. Is it the duration of the checkpoint?

@hendrikmuhs, Feb 4, 2020

Think of a continuous transform, where your source index keeps receiving new data, so the destination (transformed) index always runs behind the source. `time_upper_bound` marks the timestamp up to which all data from the source has been processed into the destination. So it's not a duration; it's an end marker up to which data has been processed.

There is also `timestamp`, and it might seem like the same thing, but `timestamp` is the time the checkpoint was created, whereas `time_upper_bound` has to take the delay into account. So normally `time_upper_bound = timestamp - delay`. (However, this might change in the future; that's why `timestamp` and `time_upper_bound` are separate fields.)

Contributor Author

Thanks, I've drafted changes to those two descriptions. If they still need tweaking, please just let me know.

`checkpointing`.`last`.`timestamp_millis`::::
(date) TBD


Timestamp of the checkpoint (when the checkpoint was created).
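To make the relationship from this thread concrete, here is a minimal Python sketch of the usual case, `time_upper_bound = timestamp - delay`. It assumes the delay configured for time-based synchronization is known as a plain millisecond value; `delay_ms` and `expected_time_upper_bound_millis` are hypothetical names, not fields or functions of this API. It models only the "normally" case described above, not a guaranteed invariant.

```python
def expected_time_upper_bound_millis(timestamp_millis: int, delay_ms: int) -> int:
    """Return the usual end marker for fully processed source data.

    Models the review discussion: normally time_upper_bound = timestamp - delay.
    `delay_ms` is a hypothetical input standing in for the configured
    time-based sync delay; it is not part of the stats response.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return timestamp_millis - delay_ms


# Hypothetical `checkpointing`.`last` values.
last = {"checkpoint": 5, "timestamp_millis": 1580860800000}
delay_ms = 60_000  # hypothetical 60-second delay

upper = expected_time_upper_bound_millis(last["timestamp_millis"], delay_ms)
print(upper)  # 1580860740000
```

Because the two fields are reported separately, a client should read `time_upper_bound_millis` directly rather than rely on this subtraction.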



`checkpointing`.`next`:::
(Optional, object) Contains statistics about the next checkpoint, which is currently in progress. This object appears only if the transform is currently processing data and only for the 1st checkpoint.

It uses the same fields as `last` but has one additional object:

`checkpointing`.`next`.`checkpoint_progress`::::
(object) Contains statistics about the progress of the checkpoint.

Not sure how much detail we want to go into; the inner fields are:

  • total_docs
  • docs_remaining
  • percent_complete
  • docs_processed
  • docs_indexed

This information is only available for batch transforms and for the 1st checkpoint of a continuous transform.
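The inner `checkpoint_progress` fields are related counters. Here is a minimal sketch of how `docs_remaining` and `percent_complete` could be derived from `total_docs` and `docs_processed`. The formulas are an assumption, not something stated in this thread: remaining = total minus processed, and percentage = processed / total, capped at 100. `progress_summary` is a hypothetical helper.

```python
from typing import Optional


def progress_summary(total_docs: Optional[int], docs_processed: int) -> dict:
    """Derive docs_remaining and percent_complete from the raw counters.

    Assumption: docs_remaining = total_docs - docs_processed and
    percent_complete = 100 * docs_processed / total_docs, capped at 100.
    When total_docs is unknown (None), both derived values are None.
    """
    if total_docs is None:
        return {"docs_remaining": None, "percent_complete": None}
    remaining = max(total_docs - docs_processed, 0)
    if docs_processed >= total_docs:
        # Also covers total_docs == 0, so there is no division by zero.
        percent = 100.0
    else:
        percent = 100.0 * docs_processed / total_docs
    return {"docs_remaining": remaining, "percent_complete": percent}


# Hypothetical `checkpoint_progress` counters for a batch transform.
print(progress_summary(total_docs=2000, docs_processed=500))
# {'docs_remaining': 1500, 'percent_complete': 25.0}
```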

`id`::
(string)
include::{docdir}/rest-api/common-parms.asciidoc[tag=transform-id]

`node`::
(object) For started {transforms} only, the node upon which the {transform} is
started.
`node`.`attributes`:::
(object) A list of attributes for the node.
`node`.`ephemeral_id`:::
(string) The node ephemeral ID.
`node`.`id`:::
(string) The unique identifier of the node. For example, `0-o0tOoRTwKFZifatTWKNw`.
`node`.`name`:::
(string) The node name. For example, `0-o0tOo`.
`node`.`transport_address`:::
(string) The host and port where transport connections are accepted. For
example, `127.0.0.1:9300`.
`state`::
(string) The status of the {transform}, which can be one of the following values:
+
--
* `indexing`: The {transform} is actively processing data and creating new
documents.
* `started`: The {transform} is running but not actively indexing data.
* `stopped`: The {transform} is stopped.


  • `aborting`: The {transform} is aborting.
  • `stopping`: The {transform} is stopping.
  • `failed`: The {transform} has failed. Check the `reason` field for more information.

--


  • `reason`::
    (string) The reason for the failure, if the {transform} is in the `failed` state.
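A client typically checks `state` and reads `reason` only when the {transform} has failed. Here is a minimal sketch that works on one entry of the response array, represented as a plain dict. It uses the state values listed above, including the ones proposed in review. `describe_state` is a hypothetical helper, and the sketch assumes `reason` is meaningful only in the `failed` state.

```python
# States taken from the `state` description, including those proposed in review.
KNOWN_STATES = {"indexing", "started", "stopped", "aborting", "stopping", "failed"}


def describe_state(transform_stats: dict) -> str:
    """Return a one-line status for one entry of the stats array.

    Assumes `reason` is only meaningful when `state` is "failed",
    as the review comment proposes.
    """
    transform_id = transform_stats.get("id", "<unknown id>")
    state = transform_stats.get("state")
    if state not in KNOWN_STATES:
        return f"{transform_id}: unrecognized state {state!r}"
    if state == "failed":
        reason = transform_stats.get("reason") or "no reason given"
        return f"{transform_id}: failed ({reason})"
    return f"{transform_id}: {state}"


print(describe_state({"id": "my-transform", "state": "failed", "reason": "example failure"}))
# my-transform: failed (example failure)
```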


`stats`::
(object) An object that provides statistical information about the {transform}.
`stats`.`documents_indexed`:::
(TBD) The number of new documents that have been indexed.


The number of documents that have been indexed into the transform destination index.

`stats`.`documents_processed`:::
(TBD) The number of documents that have been processed.


The number of documents that have been processed from the transform source index.

`stats`.`exponential_avg_checkpoint_duration_ms`:::
(double) Exponential moving average of the duration of the checkpoint, in milliseconds.
`stats`.`exponential_avg_documents_indexed`:::
(double) Exponential moving average of the number of new documents that have been
indexed.
`stats`.`exponential_avg_documents_processed`:::
(double) Exponential moving average of the number of documents that have been
processed.
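The `exponential_avg_*` fields are exponential moving averages. Here is a minimal sketch of that technique, seeded with the first observation. The smoothing factor `alpha` is a hypothetical caller-chosen value, and the seeding rule is also an assumption. The factor the {transform} uses internally is not documented here, so this sketch will not reproduce the API's numbers exactly.

```python
from typing import Iterable


def exponential_avg(values: Iterable[float], alpha: float) -> float:
    """Exponential moving average, seeded with the first value.

    Each new value v updates the average as: avg = alpha * v + (1 - alpha) * avg.
    `alpha` is a hypothetical smoothing factor chosen by the caller; the factor
    the transform uses internally is not documented here.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    avg = None
    for v in values:
        avg = float(v) if avg is None else alpha * v + (1.0 - alpha) * avg
    if avg is None:
        raise ValueError("values must not be empty")
    return avg


# Hypothetical checkpoint durations in milliseconds.
print(exponential_avg([1000, 2000, 1500], alpha=0.5))  # 1500.0
```

A larger `alpha` makes the average react faster to recent checkpoints, and a smaller one smooths out outliers.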
`stats`.`index_failures`:::
(long) The number of indexing failures.
`stats`.`index_time_in_ms`:::
(long) The amount of time spent indexing, in milliseconds.
`stats`.`index_total`:::
(long) The number of index operations.
`stats`.`pages_processed`:::
(TBD) The number of pages processed.


(long) The number of pages (number of search/bulk index operations) processed.

(I do not know if this needs a better explanation. In a nutshell, documents are not processed one by one but always in batches. This happens both for search (a search page) and for indexing. There, a "page" describes one bulk index operation that consists of a list of documents to be indexed.)
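The batching idea in the comment above can be modeled in a few lines: documents are split into pages, and each page corresponds to one bulk index operation. This is a simplified sketch, not the transform indexer itself. `pages`, `count_bulk_pages`, and the fixed `page_size` are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def pages(docs: Sequence[T], page_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches ("pages") of at most page_size documents."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(docs), page_size):
        yield list(docs[start:start + page_size])


def count_bulk_pages(docs: Sequence[T], page_size: int) -> int:
    """Count how many bulk index operations a batched indexer would issue."""
    return sum(1 for _ in pages(docs, page_size))


docs = [{"id": i} for i in range(2500)]
print(count_bulk_pages(docs, page_size=1000))  # 3
```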

`stats`.`search_failures`:::
(long) The number of search failures.
`stats`.`search_time_in_ms`:::
(long) The amount of time spent searching, in milliseconds.
`stats`.`search_total`:::
(long) TBD


The number of search operations on the transform source index.

`stats`.`trigger_count`:::
(TBD) TBD


The number of times the transform has been triggered by the scheduler.

(The scheduler triggers the transform indexer, for example to check for updates or ingest new data. This can be controlled via the `frequency` parameter in the config: https://www.elastic.co/guide/en/elasticsearch/reference/master/put-transform.html#put-transform-request-body)
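The `search_*` counters in `stats` can be combined into derived figures, such as the average time per search operation and the share of failed searches. Here is a minimal sketch over a `stats` object represented as a plain dict. These ratios are not fields returned by the API. `safe_ratio` and `search_summary` are hypothetical helpers, and the input values are made up.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is 0."""
    return None if denominator == 0 else numerator / denominator


def search_summary(stats: dict) -> dict:
    """Derive per-search figures from a `stats` object.

    These ratios are not returned by the API; they are computed here from
    `search_time_in_ms`, `search_total` and `search_failures`.
    """
    return {
        "avg_search_ms": safe_ratio(stats["search_time_in_ms"], stats["search_total"]),
        "search_failure_ratio": safe_ratio(stats["search_failures"], stats["search_total"]),
    }


# Hypothetical `stats` values.
stats = {"search_time_in_ms": 1200, "search_total": 40, "search_failures": 2}
print(search_summary(stats))
# {'avg_search_ms': 30.0, 'search_failure_ratio': 0.05}
```

Returning `None` for a {transform} that has not searched yet avoids reporting a misleading zero average.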


[[get-transform-stats-response-codes]]
==== {api-response-codes-title}