Rally benchmark #1522

Merged
merged 49 commits into from
Nov 7, 2023

Contributor

aspacca commented Oct 24, 2023

Adding a command to run rally benchmarks.

  • Some code is repeated between the system and rally benchmarks. It is not 100% the same, so I'm not sure whether to factor it out and move it to common.
  • Requires esrally to be installed on the host and available on $PATH.
  • Docs are missing.
$ pwd
elastic-package/test/packages/benchmarks/rally_benchmark
$ elastic-package benchmark rally --benchmark logs-benchmark
Run rally benchmarks for the package
--- Benchmark results for package: rally_benchmarks - START ---
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ info                                                                                             │
├────────────────────────┬─────────────────────────────────────────────────────────────────────────┤
│ benchmark              │                                                          logs-benchmark │
│ description            │                                         Benchmark 20000 events ingested │
│ run ID                 │                                    85c5f94b-d53a-4b76-b897-25a5e8df6f22 │
│ package                │                                                        rally_benchmarks │
│ start ts (s)           │                                                              1698117981 │
│ end ts (s)             │                                                              1698118001 │
│ duration               │                                                                     20s │
│ generated corpora file │                   ~/.elastic-package/tmp/rally_corpus/corpus-1222483366 │
╰────────────────────────┴─────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────────╮
│ parameters                                                         │
├─────────────────────────────────┬──────────────────────────────────┤
│ package version                 │                      999.999.999 │
│ input                           │                       filestream │
│ data_stream.name                │                           testds │
│ data_stream.vars.paths          │                     [dummy path] │
│ warmup time period              │                              10s │
│ corpora.generator.total_events  │                            20000 │
│ corpora.generator.template.path │ ./logs-benchmark/template.ndjson │
│ corpora.generator.template.raw  │                                  │
│ corpora.generator.template.type │                           gotext │
│ corpora.generator.config.path   │      ./logs-benchmark/config.yml │
│ corpora.generator.config.raw    │                            map[] │
│ corpora.generator.fields.path   │      ./logs-benchmark/fields.yml │
│ corpora.generator.fields.raw    │                            map[] │
╰─────────────────────────────────┴──────────────────────────────────╯
╭───────────────────────╮
│ cluster info          │
├───────┬───────────────┤
│ name  │ elasticsearch │
│ nodes │             1 │
╰───────┴───────────────╯
╭──────────────────────────────────────────────────────────────╮
│ data stream stats                                            │
├────────────────────────────┬─────────────────────────────────┤
│ data stream                │ logs-rally_benchmarks.testds-ep │
│ approx total docs ingested │                               0 │
│ backing indices            │                               1 │
│ store size bytes           │                             269 │
│ maximum ts (ms)            │                               0 │
╰────────────────────────────┴─────────────────────────────────╯
╭────────────────────────────────────╮
│ disk usage for index .ds-logs-rall │
│ y_benchmarks.testds-ep-2023.10.23- │
│ 000001 (for all fields)            │
├──────────────────────────────┬─────┤
│ total                        │ 0 B │
│ inverted_index.total         │ 0 B │
│ inverted_index.stored_fields │ 0 B │
│ inverted_index.doc_values    │ 0 B │
│ inverted_index.points        │ 0 B │
│ inverted_index.norms         │ 0 B │
│ inverted_index.term_vectors  │ 0 B │
│ inverted_index.knn_vectors   │ 0 B │
╰──────────────────────────────┴─────╯
╭─────────────────────────────────────────────────────────────────────────────────────────╮
│ pipeline logs-rally_benchmarks.testds-999.999.999 stats in node 0VW3SV6bRRue_LPWFWiOtQ  │
├────────────────────────────────────────────────┬────────────────────────────────────────┤
│ Totals                                         │ Count: 20000 | Failed: 0 | Time: 498ms │
│ grok ()                                        │ Count: 20000 | Failed: 0 | Time: 455ms │
│ user_agent ()                                  │  Count: 20000 | Failed: 0 | Time: 24ms │
│ pipeline (logs-rally_benchmarks.testds@custom) │   Count: 20000 | Failed: 0 | Time: 3ms │
╰────────────────────────────────────────────────┴────────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────────────────────────────────╮
│ rally stats                                                                                │
├────────────────────────────────────────────────────────────────┬───────────────────────────┤
│ Cumulative indexing time of primary shards                     │     2.374416666666667 min │
│ Min cumulative indexing time across primary shards             │                     0 min │
│ Median cumulative indexing time across primary shards          │   0.02196666666666667 min │
│ Max cumulative indexing time across primary shards             │   0.35408333333333336 min │
│ Cumulative indexing throttle time of primary shards            │                     0 min │
│ Min cumulative indexing throttle time across primary shards    │                     0 min │
│ Median cumulative indexing throttle time across primary shards │                     0 min │
│ Max cumulative indexing throttle time across primary shards    │                     0 min │
│ Cumulative merge time of primary shards                        │    0.3455833333333333 min │
│ Cumulative merge count of primary shards                       │                       577 │
│ Min cumulative merge time across primary shards                │                     0 min │
│ Median cumulative merge time across primary shards             │ 0.0030833333333333333 min │
│ Max cumulative merge time across primary shards                │   0.06146666666666667 min │
│ Cumulative merge throttle time of primary shards               │                     0 min │
│ Min cumulative merge throttle time across primary shards       │                     0 min │
│ Median cumulative merge throttle time across primary shards    │                     0 min │
│ Max cumulative merge throttle time across primary shards       │                     0 min │
│ Cumulative refresh time of primary shards                      │    0.2704333333333333 min │
│ Cumulative refresh count of primary shards                     │                     18634 │
│ Min cumulative refresh time across primary shards              │                     0 min │
│ Median cumulative refresh time across primary shards           │  0.005966666666666666 min │
│ Max cumulative refresh time across primary shards              │  0.018783333333333332 min │
│ Cumulative flush time of primary shards                        │    11.530216666666666 min │
│ Cumulative flush count of primary shards                       │                     18178 │
│ Min cumulative flush time across primary shards                │ 6.666666666666667e-05 min │
│ Median cumulative flush time across primary shards             │   0.16653333333333334 min │
│ Max cumulative flush time across primary shards                │    0.7956666666666667 min │
│ Total Young Gen GC time                                        │                   0.017 s │
│ Total Young Gen GC count                                       │                         3 │
│ Total Old Gen GC time                                          │                       0 s │
│ Total Old Gen GC count                                         │                         0 │
│ Store size                                                     │    0.06993633136153221 GB │
│ Translog size                                                  │  0.0001253066584467888 GB │
│ Heap used for segments                                         │                      0 MB │
│ Heap used for doc values                                       │                      0 MB │
│ Heap used for terms                                            │                      0 MB │
│ Heap used for norms                                            │                      0 MB │
│ Heap used for points                                           │                      0 MB │
│ Heap used for stored fields                                    │                      0 MB │
│ Segment count                                                  │                       497 │
│ Total Ingest Pipeline count                                    │                     20026 │
│ Total Ingest Pipeline time                                     │                   0.741 s │
│ Total Ingest Pipeline failed                                   │                         0 │
│ Min Throughput                                                 │           62131.30 docs/s │
│ Mean Throughput                                                │           62131.30 docs/s │
│ Median Throughput                                              │           62131.30 docs/s │
│ Max Throughput                                                 │           62131.30 docs/s │
│ 50th percentile latency                                        │     253.73077099999898 ms │
│ 100th percentile latency                                       │     269.62587499999916 ms │
│ 50th percentile service time                                   │     253.73077099999898 ms │
│ 100th percentile service time                                  │     269.62587499999916 ms │
│ error rate                                                     │                  100.00 % │
╰────────────────────────────────────────────────────────────────┴───────────────────────────╯

--- Benchmark results for package: rally_benchmarks - END   ---
Done

@marc-gr: I have a question about the warm-up period. It runs in a goroutine in the setup method, so rally will start before the warm-up ends. I'm not sure how it works for the system benchmark. What is the warm-up period supposed to achieve?

aspacca requested review from jsoriano, mrodm and marc-gr October 24, 2023 01:29
aspacca self-assigned this Oct 24, 2023
aspacca mentioned this pull request Oct 24, 2023

Contributor Author

aspacca commented Oct 24, 2023

closes #1475


These benchmarks allow you to benchmark an integration corpus with rally.

For details on how to configure rally benchmarks for a package, review the [HOWTO guide](./docs/howto/rally_benchmarking.md).
Contributor

Is this link correct? Trying to find it in this PR.

Contributor Author

The link is correct, but I haven't written the docs yet :)

Contributor

Can we add these to the PR? I was looking for them because I hit some issues while testing the PR. I will comment more on the issue I hit soon.

Contributor Author

Yes, I plan to add them to this PR: the description lists the docs as missing :)

Member

Will give it a try when there are docs 🙂

Contributor

ruflin commented Oct 30, 2023

I'm running these commands to do some testing. A few findings:

  • elastic-package build fails with "package with GA version (999.999.999) is using an unreleased version of the spec (3.0.1-next) (PSR00001)". Interestingly enough, the rally benchmark runs. Did it install the package?
  • Keep generated rally track: Is there an option to keep the generated rally track file? I checked the directory that was referenced, but it seems to be empty after the run. I assume it gets cleaned up? It would be nice to be able to generate only the rally track, so others can run it later or we can run it multiple times for comparison. This would also help with reviewing the rally track itself. I see you have "defer-cleanup", but that only seems temporary?
  • esrally installation: If rally is not there, can we ask users to run pip3 install esrally or point them to the docs directly?

Contributor Author

aspacca commented Oct 30, 2023

> • elastic-package build fails with "package with GA version (999.999.999) is using an unreleased version of the spec (3.0.1-next) (PSR00001)". Interestingly enough, the rally benchmark runs. Did it install the package?

I was not aware of this: during my tests I used go run main.go.
Before merging the PR, the spec version will most likely be 3.0.1, i.e. a released version of the spec.

> • Keep generated rally track: Is there an option to keep the generated rally track file? I checked the directory that was referenced, but it seems to be empty after the run. I assume it gets cleaned up? It would be nice to be able to generate only the rally track, so others can run it later or we can run it multiple times for comparison. This would also help with reviewing the rally track itself. I see you have "defer-cleanup", but that only seems temporary?

We have a separate command for generating the rally track without running the benchmark: elastic-package benchmark generate-corpus --rally-track-output-dir. Currently this command still uses the generator assets from the generator repo, but I was planning to refactor it to use the assets in the package. Maybe it's better to drop that command altogether and add options to the new command in this PR to save the track and to skip running rally.

> • esrally installation: If rally is not there, can we ask users to run pip3 install esrally or point them to the docs directly?

👍
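
A minimal sketch of such a check, written in Python purely for illustration (elastic-package itself is written in Go, and this is not its implementation). The function name `find_esrally` and the hint text are hypothetical; only the `pip3 install esrally` command comes from the suggestion above.

```python
import shutil
from typing import Optional

# Hypothetical hint text; the install command is the one suggested in the thread.
INSTALL_HINT = "esrally was not found on PATH; install it with: pip3 install esrally"


def find_esrally(path: Optional[str] = None) -> str:
    """Return the full path of the esrally executable, or raise with an install hint.

    `path` overrides the search path (it defaults to $PATH), which makes the
    check easy to exercise without touching the real environment.
    """
    executable = shutil.which("esrally", path=path)
    if executable is None:
        raise FileNotFoundError(INSTALL_HINT)
    return executable
```

Failing early with an actionable message, before any corpus is generated, avoids doing work that would be thrown away when the rally step cannot start.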

Contributor

ruflin commented Oct 30, 2023

> We have a separate command for generating the rally track without running the benchmark: elastic-package benchmark generate-corpus --rally-track-output-dir. Currently this command still uses the generator assets from the generator repo, but I was planning to refactor it to use the assets in the package. Maybe it's better to drop that command altogether and add options to the new command in this PR to save the track and to skip running rally.

As a first step, let's make sure it all uses the assets from the package. I agree that unifying the commands might make sense, but we could also solve it with docs pointing to it. The key is that the outcome is the same: whether I run the rally track "live" or generate the corpus, the same data goes in.

Contributor

marc-gr commented Oct 30, 2023

> @marc-gr: I have a question about the warm-up period. It runs in a goroutine in the setup method, so rally will start before the warm-up ends. I'm not sure how it works for the system benchmark. What is the warm-up period supposed to achieve?

In your case I think this comes built in with rally itself. The intention is to defer metric collection until the warm-up period ends, so that time is not taken into account for reporting.
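
To illustrate the idea, here is a small Python sketch under stated assumptions: metrics arrive as timestamped samples, and anything recorded before the warm-up period ends is excluded from reporting. `Sample` and `drop_warmup` are hypothetical names, not part of rally or elastic-package.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    ts: float     # seconds elapsed since the benchmark started
    value: float  # the measured metric, e.g. docs per second


def drop_warmup(samples: List[Sample], warmup_seconds: float) -> List[Sample]:
    """Keep only the samples recorded once the warm-up period has ended."""
    return [s for s in samples if s.ts >= warmup_seconds]


# With a 10s warm-up (the value shown in the parameters table above),
# only the samples taken at 10s and later are reported.
samples = [Sample(0, 1.0), Sample(5, 2.0), Sample(10, 3.0), Sample(15, 4.0)]
reported = drop_warmup(samples, warmup_seconds=10)
```

The benchmark still runs during the warm-up; the period only changes which measurements count.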

Contributor Author

aspacca commented Oct 30, 2023

> As a first step, let's make sure it all uses the assets from the package. I agree that unifying the commands might make sense, but we could also solve it with docs pointing to it.

In the end it was faster to add saving the rally track and a dry run to this PR than to refactor the previous command. I will remove the previous command in a follow-up PR.

> The key is that the outcome is the same: whether I run the rally track "live" or generate the corpus, the same data goes in.

It will not be exactly the same data. If I generate the corpus multiple times (by running rally), the data will have the same "shape" (cardinality, range, etc.) but different randomized values, because the seed and the time values are based on "now".

The generator tool already has options to pass a seed and "now" from the command line. Let me know if you want to add them to the elastic-package command as well.
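
A minimal Python sketch of why pinning both values makes the output reproducible. This is not the generator tool's implementation; `generate_events` and the event fields are hypothetical and only illustrate the principle.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def generate_events(total: int, seed: int, now: datetime) -> List[Dict[str, object]]:
    """Generate `total` synthetic events from an explicit seed and reference time.

    All randomness comes from a private random.Random(seed) and all timestamps
    are derived from `now`, so equal inputs always give equal output.
    """
    rng = random.Random(seed)
    events = []
    for _ in range(total):
        offset = timedelta(seconds=rng.randint(0, 3600))
        events.append({
            "@timestamp": (now - offset).isoformat(),
            "status": rng.choice([200, 404, 500]),
        })
    return events


fixed_now = datetime(2023, 10, 30, tzinfo=timezone.utc)
first = generate_events(5, seed=42, now=fixed_now)
second = generate_events(5, seed=42, now=fixed_now)
# Same seed and same "now": the two corpora are identical.
```

If the seed or `now` defaults to the current time instead, every run keeps the same shape but produces different values, which is the behaviour described above.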

Contributor Author

aspacca commented Oct 30, 2023

> In your case I think this comes built in with rally itself. The intention is to defer metric collection until the warm-up period ends, so that time is not taken into account for reporting.

OK, so it's a warm-up period for metrics collection. Then it makes sense as it is now.
I thought it was a warm-up before starting the benchmark :)

Contributor Author

aspacca commented Nov 2, 2023

@ruflin I now refresh the index before collecting stats.
I also removed polling the hits while rally is running, since that would compete for resources.
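
As a rough illustration of the ordering, the Python sketch below builds a refresh request followed by a count request against Elasticsearch's `_refresh` and `_count` REST endpoints. It assumes a cluster reachable at `base_url` without authentication; `build_stats_requests` is a hypothetical helper, not code from this PR, and it only constructs the requests without sending them.

```python
import urllib.request
from typing import List


def build_stats_requests(base_url: str, target: str) -> List[urllib.request.Request]:
    """Return the requests in the order they should be sent.

    The refresh comes first so that documents indexed just before the
    benchmark ended are searchable when the count is taken.
    """
    base = base_url.rstrip("/")
    refresh = urllib.request.Request(f"{base}/{target}/_refresh", method="POST")
    count = urllib.request.Request(f"{base}/{target}/_count", method="GET")
    return [refresh, count]
```

Each request can then be sent with `urllib.request.urlopen`. Doing this once after rally finishes, instead of polling during the run, keeps the stats collection from competing with the benchmark.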

Contributor

ruflin commented Nov 2, 2023

If @jsoriano agrees, let's get this PR in soon and then add the corpus templates to some of the integration packages. I expect that this will also give us more feedback for iterating on the command itself. It also gets the command into the hands of the rest of the team so they can start playing with it.

Can we treat the command as "beta" for now, so we can still make breaking changes to it?

Contributor Author

aspacca commented Nov 2, 2023

> If @jsoriano agrees, let's get this PR in soon

We must merge elastic/package-spec#653 first for CI to pass.

Member

jsoriano left a comment

Ok to merge this and iterate, once the package-spec change is released.

Contributor Author

aspacca commented Nov 6, 2023

> Ok to merge this and iterate, once the package-spec change is released.

Before merging this I have to revert the last two commits, but without a package-spec release it won't pass CI.
Am I missing anything? :)

Contributor

ruflin commented Nov 6, 2023

I just merged the package-spec PR. My assumption is that we stay on a commit reference, but point it at the commit that is now merged in package-spec rather than at a release. Then, as soon as a new package-spec release is out, we update.

Member

jsoriano commented Nov 7, 2023

Updating package spec in a separate PR #1539

@elasticmachine
Collaborator

💚 Build Succeeded

cc @aspacca
