
Commit 1a549f1

Add challenge for long-running benchmarks

Add a new challenge `elasticlogs-continuous-index-and-query` suitable for long-running benchmarks. This commit also includes:

* an updated version of `deleteindex_runner.py` that helps keep the set of rolled-over indices to a defined size by deleting the oldest ones.
* updates to `README.md`.
* a test/helper script, `tests/validate_challanges.py`, to assist with JSON validation of challenges that contain embedded Jinja2 (j2) templating.

Relates elastic#18
1 parent 78d4baa commit 1a549f1
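The commit message summarizes the updated `deleteindex_runner.py` without showing it. As a hedged sketch of the selection step only (not the runner's actual code), the Python below matches index names against a pattern such as `elasticlogs_q-*`, parses the numeric suffix after a separator and returns every index older than the newest `max_keep`. The function `indices_to_delete` and its arguments are hypothetical; they only mirror the README parameters `indices_delete_pattern`, `rolledover_indices_suffix_separator` and `max_rolledover_indices`, and the sketch assumes that a larger suffix means a newer index.

```python
import fnmatch


def indices_to_delete(index_names, pattern="elasticlogs_q-*", separator="-", max_keep=20):
    """Return the matching indices that fall outside the newest `max_keep`.

    Hypothetical helper: assumes each index name ends with `<separator><number>`
    (as produced by rollover, e.g. elasticlogs_q-000007) and that a larger
    number means a more recently created index.
    """
    candidates = []
    for name in index_names:
        if not fnmatch.fnmatchcase(name, pattern):
            continue  # not one of the rolled-over indices we manage
        suffix = name.rsplit(separator, 1)[-1]
        if suffix.isdigit():
            candidates.append((int(suffix), name))
    candidates.sort()  # oldest (smallest numeric suffix) first
    excess = len(candidates) - max_keep
    return [name for _, name in candidates[:excess]] if excess > 0 else []


if __name__ == "__main__":
    existing = [f"elasticlogs_q-{n:06d}" for n in range(1, 11)]
    print(indices_to_delete(existing, max_keep=8))
```

For ten indices `elasticlogs_q-000001` through `elasticlogs_q-000010` and `max_keep=8`, it returns `elasticlogs_q-000001` and `elasticlogs_q-000002`. Sorting on the parsed integer rather than the raw name keeps the order correct even if suffix widths ever differ.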

8 files changed (+410, -85 lines)

README.md

+105-56
@@ -49,6 +49,55 @@ This challenge assumes that the *elasticlogs-1bn-load* track has been executed a
 
 In this challenge rate-limited indexing at varying levels is combined with a fixed level of querying. If metrics from the run are stored in Elasticsearch, it is possible analyse these in Kibana in order to identify how indexing rate affects query latency and vice versa.
 
+### 7) elasticlogs-continuous-index-and-query
+
+This challenge is suitable for long-term execution and runs in two phases. Both phases (`p1`, `p2`) index documents containing auto-generated events. `p1` indexes at the maximum possible speed, whereas `p2` throttles indexing to a specified rate and, in parallel, executes four queries simulating Kibana dashboards and searches. The write index is rolled over once it reaches the configured maximum size or age, and the maximum number of rolled-over indices to retain is also configurable.
+
+The table below lists the track parameters that can be adjusted, along with their default values:
+
+| Parameter | Explanation | Type | Default Value |
+| --------- | ----------- | ---- | ------------- |
+| `number_of_replicas` | Number of index replicas | `int` | `0` |
+| `shard_count` | Number of primary shards | `int` | `2` |
+| `p1_bulk_indexing_clients` | Number of [clients](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#schedule) used to index during phase 1 | `int` | `40` |
+| `p1_bulk_size` | The [bulk size](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#bulk) for the auto-generated events during phase 1 | `int` | `1000` |
+| `p1_duration_secs` | Duration of phase 1 in seconds | `int` | `7200` |
+| `p2_bulk_indexing_clients` | Number of [clients](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#schedule) used to index during phase 2 | `int` | `16` |
+| `p2_bulk_size` | The [bulk size](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#bulk) for the auto-generated events during phase 2 | `int` | `1000` |
+| `p2_ops` | Number of bulk indexing operations per second during phase 2. A value of `10` with `p2_bulk_size=1000` throttles indexing to 10,000 docs/s | `int` | `10` |
+| `index_alias` | Default index alias | `str` | `elasticlogs_q_write` |
+| `rollover_max_size` | Maximum index size condition for the [rollover API](https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html#indices-rollover-index) | `str` | `30gb` |
+| `rollover_max_age` | Maximum index age condition for the [rollover API](https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html#indices-rollover-index) | `str` | `1d` |
+| `p2_query1_target_interval` | Target interval (in seconds) between executions of the Kibana query `kibana-traffic-country-dashboard_60m` | `int` | `30` |
+| `p2_query2_target_interval` | Target interval (in seconds) between executions of the Kibana query `kibana-discover_30m` | `int` | `30` |
+| `p2_query3_target_interval` | Target interval (in seconds) between executions of the Kibana query `kibana-traffic-dashboard_30m` | `int` | `30` |
+| `p2_query4_target_interval` | Target interval (in seconds) between executions of the Kibana query `kibana-content_issues-dashboard_30m` | `int` | `30` |
+| `max_rolledover_indices` | Maximum number of most recently rolled-over indices to retain | `int` | `20` |
+| `indices_delete_pattern` | Pattern used to match old rolled-over indices for deletion. See also `rolledover_indices_suffix_separator`. | `str` | `elasticlogs_q-*` |
+| `rolledover_indices_suffix_separator` | Separator used to extract the index-name suffix that determines which rolled-over indices to delete | `str` | `-` |
+
+The indices use the alias `elasticlogs_q_write` and start with `elasticlogs_q-000001`. For example, if a cluster contains the rolled-over indices `elasticlogs_q-000001`, `elasticlogs_q-000002`, ..., `elasticlogs_q-000010`, a value of `max_rolledover_indices=8` results in the removal of `elasticlogs_q-000001` and `elasticlogs_q-000002`.
+
+A value of `max_rolledover_indices=20` on a three-node bare-metal cluster with [these specifications](https://elasticsearch-benchmarks-internal.elastic.co/app/kibana#/visualize/edit/02c3be00-8a66-11e8-8558-f33069e7a81e?_g=()&_a=(filters:!(),linked:!f,query:(language:lucene,query:(query_string:(analyze_wildcard:!t,query:'*'))),uiState:(),vis:(aggs:!(),params:(fontSize:12,markdown:'%23%23%23%20Benchmarking%20Methodology%0A%0AAll%20benchmarks%20are%20run%20by%20Rally%20against%20the%20Elasticsearch%20latest%20snapshot%20as%20of%20the%20start%20date.%20Each%20benchmark%20runs%20for%2030%20days.%0A%0AThe%20benchmark%20uses%20four%20machines.%20On%20one%20we%20run%20the%20benchmark%20driver%20(Rally),%20on%20the%20other%20three%20the%20benchmark%20candidates.%0A%0AThe%20Elasticsearch%20node%20uses%20default%20settings%20except%20for:%0A%0AAdapted%20JVM%20settings:%0A%0A*%20Heap%20is%20increased%20to%208GB%20(%60-Xms8G%20-Xmx8G%60)%0A*%20Assertions%20are%20enabled%20(%60-ea%60)%0A*%20GC%20log%20is%20enabled%20(rolling)%0A%0AAdapted%20Elasticsearch%20settings:%0A%0A*%20%60network.host:%200.0.0.0%60%0A*%20%60bootstrap.memory_lock:%20true%60%0A%0AWe%20also%20run%20this%20node%20with%20the%20following%20plugins:%0A*%20x-pack%20(authentication%20backed%20by%20a%20file%20store%20%2B%20SSL%20enabled%20with%20self-signed%20certificates)%0A*%20ingest-geoip%0A%0AAll%20benchmarks%20are%20run%20on%20a%20bare%20metal%20machine%20with%20the%20following%20specifications:%0A%0A*%20CPU:%20Intel(R)%20Core(TM)%20i7-6700%20CPU%20@%203.40GHz%0A*%20RAM:%2032%20GB%0A*%20SSD:%20Crucial%20MX200%0A*%20OS:%20Linux%20Kernel%20version%204.13.0-38%0A*%20OS%20tuning:%0A%20%20*%20Turbo%20boost%20disabled%20(%60%2Fsys%2Fdevices%2Fsystem%2Fcpu%2Fintel_pstate%2Fno_turbo%60)%0A%20%20*%20THP%20at%20default%20%60madvise%60%20(%60%2Fsys%2Fkernel%2Fmm%2Ftransparent_hugepage%2F%7Bdefrag,enabled%7D%60)%0A*%20JVM:%20Oracle%20JDK%201.8.0_131%0A%0A%23%23%23%20Benchmark%0A%0AThese%20benchmarks%20run%20the%20%5Bcontinuous%20index%20and%20query%20challenge%5D(https:%2F%2Fgithub.com%2Fdliappis%2Frally-eventdata-track%2Fblob%2Flongrun-benchmarks%2Feventdata%2Fchallenges%2Felasticlogs-continuous-index-and-query.json)%20from%20the%20%5Brally-eventdata-track%5D(https:%2F%2Fgithub.com%2Fdliappis%2Frally-eventdata-track%2Ftree%2Flongrun-benchmarks)%20with%20the%20following%20parameters:%0A%0A%60%60%60%0A%7B%0A%20%20%22number_of_replicas%22:%201,%0A%20%20%22shard_count%22:%203,%0A%20%20%22p1_bulk_indexing_clients%22:%2032,%0A%20%20%22p1_bulk_size%22:%201000,%0A%20%20%22p1_duration_secs%22:%2028800,%0A%20%20%22p2_bulk_indexing_clients%22:%2012,%0A%20%20%22p2_bulk_size%22:%201000,%0A%20%20%22p2_ops%22:%2030,%0A%20%20%22max_rolledover_indices%22:%2020,%0A%20%20%22rollover_max_size%22:%20%2230gb%22%0A%7D%0A%60%60%60',type:markdown),title:'Benchmarking%20Methodology%20v2',type:markdown))) keeps disk usage constant at `407GiB` per node.
+
+It is recommended to store track parameters in a JSON file and pass it to Rally using `--track-params=./params-file.json`. Example content:
+
+``` shell
+$ cat params-file.json
+{
+  "number_of_replicas": 1,
+  "shard_count": 3,
+  "p1_bulk_indexing_clients": 32,
+  "p1_bulk_size": 1000,
+  "p1_duration_secs": 28800,
+  "p2_bulk_indexing_clients": 12,
+  "p2_bulk_size": 1000,
+  "p2_ops": 30,
+  "max_rolledover_indices": 20,
+  "rollover_max_size": "30gb"
+}
+```
+
 ## Custom parameter sources
 
 ### elasticlogs\_bulk\_source
@@ -59,59 +108,59 @@ The generator allows data to be generated in real-time or against a set date/tin
 
 ```
 {
-  "@timestamp": "2017-06-01T00:01:08.866644Z",
-  "offset": 7631775,
-  "user_name": "-",
-  "source": "/usr/local/var/log/nginx/access.log",
-  "fileset": {
-    "module": "nginx",
-    "name": "access"
-  },
-  "input": {
-    "type": "log"
-  },
-  "beat": {
-    "version": "6.3.0",
-    "hostname": "web-EU-1.elastic.co",
-    "name": "web-EU-1.elastic.co"
-  },
-  "prospector": {
-    "type": "log"
-  },
-  "nginx": {
-    "access": {
-      "user_agent": {
-        "major": "44",
-        "os": "Mac OS X",
-        "os_major": "10",
-        "name": "Firefox",
-        "os_name": "Mac OS X",
-        "device": "Other"
-      },
-      "remote_ip": "5.134.208.0",
-      "remote_ip_list": [
-        "5.134.208.0"
-      ],
-      "geoip": {
-        "continent_name": "Europe",
-        "city_name": "Grupa",
-        "country_name": "Poland",
-        "country_iso_code": "PL",
-        "location": {
-          "lat": 53.5076,
-          "lon": 18.6358
-        }
-      },
-      "referrer": "https://www.elastic.co/guide/en/marvel/current/getting-started.html",
-      "url": "/guide/en/kibana/current/images/autorefresh-pause.png",
-      "body_sent": {
-        "bytes": 2122
-      },
-      "method": "GET",
-      "response_code": "200",
-      "http_version": "1.1"
-    }
-  }
+  "@timestamp": "2017-06-01T00:01:08.866644Z",
+  "offset": 7631775,
+  "user_name": "-",
+  "source": "/usr/local/var/log/nginx/access.log",
+  "fileset": {
+    "module": "nginx",
+    "name": "access"
+  },
+  "input": {
+    "type": "log"
+  },
+  "beat": {
+    "version": "6.3.0",
+    "hostname": "web-EU-1.elastic.co",
+    "name": "web-EU-1.elastic.co"
+  },
+  "prospector": {
+    "type": "log"
+  },
+  "nginx": {
+    "access": {
+      "user_agent": {
+        "major": "44",
+        "os": "Mac OS X",
+        "os_major": "10",
+        "name": "Firefox",
+        "os_name": "Mac OS X",
+        "device": "Other"
+      },
+      "remote_ip": "5.134.208.0",
+      "remote_ip_list": [
+        "5.134.208.0"
+      ],
+      "geoip": {
+        "continent_name": "Europe",
+        "city_name": "Grupa",
+        "country_name": "Poland",
+        "country_iso_code": "PL",
+        "location": {
+          "lat": 53.5076,
+          "lon": 18.6358
+        }
+      },
+      "referrer": "https://www.elastic.co/guide/en/marvel/current/getting-started.html",
+      "url": "/guide/en/kibana/current/images/autorefresh-pause.png",
+      "body_sent": {
+        "bytes": 2122
+      },
+      "method": "GET",
+      "response_code": "200",
+      "http_version": "1.1"
+    }
+  }
 }
 ```
 
@@ -155,7 +204,7 @@ As you can see, branches can match exact release numbers but Rally is also lenie
 
 Apart from that, the master branch is always considered to be compatible with the Elasticsearch master branch.
 
-To specify the version to check against, add `--distribution-version` when running Rally. It it is not specified, Rally assumes that you want to benchmark against the Elasticsearch master version.
+To specify the version to check against, add `--distribution-version` when running Rally. If it is not specified, Rally assumes that you want to benchmark against the Elasticsearch master version.
 
 Example: If you want to benchmark Elasticsearch 6.2.4, run the following command:
 

@@ -167,12 +216,12 @@ How to Contribute
 -----------------
 
 If you want to contribute to this track, please ensure that it works against the master version of Elasticsearch (i.e. submit PRs against the master branch). We can then check whether it's feasible to backport the track to earlier Elasticsearch versions.
-
+
 See all details in the [contributor guidelines](https://github.com/elastic/rally/blob/master/CONTRIBUTING.md).
 
 License
 -------
-
+
 This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.
 
 Copyright 2015-2018 Elasticsearch <https://www.elastic.co>
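
Per the README parameter table, phase 2 throttles indexing to `p2_ops` bulk requests per second, each carrying `p2_bulk_size` documents; the new challenge template computes the same product as `p2_rate`. The Python below is a small sketch, assuming only the example values from the README: it writes them to `params-file.json` for use with `--track-params` and reports the implied phase 2 rate. The helper names `phase2_docs_per_sec` and `write_params_file` are illustrative and not part of the track.

```python
import json
from pathlib import Path

# Example track parameters copied from the README's params-file.json.
PARAMS = {
    "number_of_replicas": 1,
    "shard_count": 3,
    "p1_bulk_indexing_clients": 32,
    "p1_bulk_size": 1000,
    "p1_duration_secs": 28800,
    "p2_bulk_indexing_clients": 12,
    "p2_bulk_size": 1000,
    "p2_ops": 30,
    "max_rolledover_indices": 20,
    "rollover_max_size": "30gb",
}


def phase2_docs_per_sec(params: dict) -> int:
    """Target phase 2 indexing rate in documents per second.

    Bulk requests per second (p2_ops) times documents per bulk request
    (p2_bulk_size), falling back to the challenge defaults of 10 and 1000.
    """
    return int(params.get("p2_ops", 10)) * int(params.get("p2_bulk_size", 1000))


def write_params_file(params: dict, path: Path) -> Path:
    """Serialize the parameters as JSON so Rally can read them via --track-params."""
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


if __name__ == "__main__":
    out = write_params_file(PARAMS, Path("params-file.json"))
    print(f"wrote {out}; phase 2 target: {phase2_docs_per_sec(PARAMS)} docs/s")
```

With the example values this reports 30000 docs/s (30 bulk requests per second of 1000 documents each), three times the default of 10,000 docs/s.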

eventdata/challenges/combined-indexing-and-querying.json

+7-7
@@ -10,7 +10,7 @@
 "name": "combined-indexing-and-querying",
 "description": "This challenge simulates a set of Kibana queries against historical data (elasticlogs_q-* indices) as well as against the most recent data currently being indexed. It combined this with rate-limited indexing at varying levels. It assumes one of the challenges creating elasticlogs_q-* indices has been run.",
 "meta": {
-"benchmark_type": "indexing/querying",
+"benchmark_type": "indexing/querying",
 "target_kibana_queries_per_minute": 7
 },
 "schedule": [
@@ -25,7 +25,7 @@
 },
 {
 "operation": "relative-kibana-content_issues-dashboard_50%",
-"target-interval": 60,
+"target-interval": 60,
 "warmup-time-period": 0,
 "clients": 1,
 "time-period": {{ p_rate_limit_duration_secs }},
@@ -40,14 +40,14 @@
 "warmup-iterations": 0,
 "iterations": 1
 },
-{# Add some data to index so it does not start empty #}
+{# Add some data to index so it does not start empty #}
 {
 "operation": "index-append-1000-elasticlogs_i_write",
 "time-period": {{ p_rate_limit_duration_secs }},
 "target-throughput": 10,
 "clients": {{ p_client_count }}
 },
-
+
 {% for ops in range(p_rate_limit_step, p_rate_limit_max, p_rate_limit_step) %}
 
 
@@ -94,8 +94,8 @@
 },
 "schedule": "poisson"
 },
-{
-"name": "current-kibana-content_issues-dashboard_30m-{{rate}}",
+{
+"name": "current-kibana-content_issues-dashboard_30m-{{rate}}",
 "operation": "current-kibana-content_issues-dashboard_30m",
 "target-interval": 60,
 "clients": 2,
@@ -105,7 +105,7 @@
 },
 "schedule": "poisson"
 },
-{
+{
 "name": "current-kibana-traffic-dashboard_15m-{{rate}}",
 "operation": "current-kibana-traffic-dashboard_15m",
 "target-interval": 30,

eventdata/challenges/elasticlogs-continuous-index-and-query.json

@@ -0,0 +1,143 @@
+{% set p1_bulk_indexing_clients = (p1_bulk_indexing_clients | default(40)) %}
+{% set p2_bulk_indexing_clients = (p2_bulk_indexing_clients | default(16)) %}
+{# Phase 1 is indexing only at max speed for 2 hours #}
+{% set p1_duration = (p1_duration_secs | default(7200)) %}
+{# Phase 2 is indexing and querying for 29 days #}
+{% set p2_duration = (p2_duration_secs | default(2505600)) %}
+{% set p2_ops = (p2_ops | default(10)) %}
+{% set p2_rate = (p2_ops * (p2_bulk_size | default(1000))) %}
+{
+  "name": "elasticlogs-continuous-index-and-query",
+  "description": "Continuously indexes documents into elasticlogs_q-* indices via the elasticlogs_q_write alias: at maximum speed in phase 1, then at a throttled rate alongside Kibana queries in phase 2. IDs are autogenerated by Elasticsearch, meaning there are no conflicts.",
+  "meta": {
+    "benchmark_type": "indexing"
+  },
+  "schedule": [
+    {
+      "operation": "deleteindex_elasticlogs_q-*",
+      "clients": 1,
+      "warmup-iterations": 0,
+      "iterations": 1
+    },
+    {
+      "operation": "create_elasticlogs_q_write",
+      "clients": 1,
+      "warmup-iterations": 0,
+      "iterations": 1
+    },
+    {
+      "parallel": {
+        "time-period": {{ p1_duration }},
+        "warmup-time-period": 0,
+        "tasks": [
+          {
+            "name": "index-append-elasticlogs_q_write-phase1",
+            "operation": {
+              "operation-type": "bulk",
+              "index": "elasticlogs_q_write",
+              "param-source": "elasticlogs_bulk",
+              "bulk-size": {{ p1_bulk_size | default(1000) | int }}
+            },
+            "clients": {{ p1_bulk_indexing_clients }},
+            "meta": {
+              "querying": "no"
+            }
+          },
+          {
+            "#COMMENT": "Check every 30 seconds whether the index alias needs to be rolled over.",
+            "name": "rollover-indices-phase1",
+            "operation": "rollover_custom_alias",
+            "clients": 1,
+            "target-interval": 30
+          },
+          {
+            "#COMMENT": "Delete the oldest rolled-over indices beyond the most recent max_rolledover_indices (20 by default).",
+            "name": "delete-rolledover-indices-phase1",
+            "operation": "delete_rolledover_index_pattern",
+            "clients": 1,
+            "target-interval": 30
+          }
+        ]
+      }
+    },
+    {
+      "parallel": {
+        "time-period": {{ p2_duration }},
+        "warmup-time-period": 0,
+        "tasks": [
+          {
+            "name": "index-append-elasticlogs_q_write-phase2",
+            "operation": {
+              "operation-type": "bulk",
+              "index": "elasticlogs_q_write",
+              "param-source": "elasticlogs_bulk",
+              "bulk-size": {{ p2_bulk_size | default(1000) | int }}
+            },
+            "target-throughput": {{ p2_ops }},
+            "clients": {{ p2_bulk_indexing_clients }},
+            "meta": {
+              "target_indexing_rate": {{ p2_rate }}
+            }
+          },
+          {
+            "name": "rollover-indices-phase2",
+            "operation": "rollover_custom_alias",
+            "clients": 1,
+            "target-interval": 30
+          },
+          {
+            "name": "delete_rolled_over_indices-phase2",
+            "operation": "delete_rolledover_index_pattern",
+            "clients": 1,
+            "target-interval": 30
+          },
+          {
+            "name": "current-kibana-traffic-country-dashboard_60m-querying",
+            "operation": "current-kibana-traffic-country-dashboard_60m",
+            "clients": 1,
+            "target-interval": {{ p2_query1_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          },
+          {
+            "name": "current-kibana-discover_30m-querying",
+            "operation": "current-kibana-discover_30m",
+            "clients": 1,
+            "target-interval": {{ p2_query2_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          },
+          {
+            "name": "current-kibana-traffic-dashboard_30m-querying",
+            "operation": "current-kibana-traffic-dashboard_30m",
+            "clients": 1,
+            "target-interval": {{ p2_query3_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          },
+          {
+            "name": "current-kibana-content_issues-dashboard_30m-querying",
+            "#COMMENT": "Looks only for 404s (about 1-1.5% of the data)",
+            "operation": "current-kibana-content_issues-dashboard_30m",
+            "clients": 1,
+            "target-interval": {{ p2_query4_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          }
+        ]
+      }
+    }
+  ]
+}
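
In phase 2 the four query tasks combine `"schedule": "poisson"` with a `target-interval` (30 seconds by default). Rally describes the `poisson` schedule as modelling request arrivals as a Poisson process around the target rate; the Python below is our own minimal model of that idea, not Rally's scheduler code. It assumes the gaps between requests are exponentially distributed with the target interval as their mean, and the function name `poisson_start_times` is hypothetical.

```python
import random


def poisson_start_times(target_interval_s, duration_s, seed=None):
    """Generate request start times (seconds from task start) for a Poisson process.

    Illustration only: gaps between requests are drawn from an exponential
    distribution with mean `target_interval_s`, so the long-run average is one
    request per target interval while individual gaps vary.
    """
    rng = random.Random(seed)
    rate = 1.0 / target_interval_s  # expected requests per second
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate)  # exponential gap with mean target_interval_s
        if t >= duration_s:
            return times
        times.append(t)


if __name__ == "__main__":
    # One query client with target-interval 30 over one day of phase 2.
    starts = poisson_start_times(30, 86_400, seed=42)
    print(len(starts), "requests; long-run expectation", 86_400 // 30)
```

Unlike a fixed 30-second cadence, this produces bursts and lulls while keeping roughly 2,880 queries per client per day, which is closer to how independent Kibana users hit a cluster.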
