Commit 6d2c40e

Enforce that responses in docs are valid json (#26249)
All of the snippets in our docs marked with `// TESTRESPONSE` are checked against the response from Elasticsearch but, due to the way they are implemented, they are actually parsed as YAML instead of JSON. Luckily, all valid JSON is valid YAML! Unfortunately, that means that invalid JSON has snuck into the examples!

This adds a step during the build to parse them as JSON and fail the build if they don't parse.

But no! It isn't quite that simple. The displayed text of some of these responses looks like:

```
{
  ...
  "aggregations": {
    "range": {
      "buckets": [
        {
          "to": 1.4436576E12,
          "to_as_string": "10-2015",
          "doc_count": 7,
          "key": "*-10-2015"
        },
        {
          "from": 1.4436576E12,
          "from_as_string": "10-2015",
          "doc_count": 0,
          "key": "10-2015-*"
        }
      ]
    }
  }
}
```

Note the `...`, which isn't valid JSON, but we like it anyway and want it in the output. We use substitution rules to convert the `...` into the response we expect. That yields a response that looks like:

```
{
  "took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,
  "aggregations": {
    "range": {
      "buckets": [
        {
          "to": 1.4436576E12,
          "to_as_string": "10-2015",
          "doc_count": 7,
          "key": "*-10-2015"
        },
        {
          "from": 1.4436576E12,
          "from_as_string": "10-2015",
          "doc_count": 0,
          "key": "10-2015-*"
        }
      ]
    }
  }
}
```

That is what the tests consume, but it isn't valid JSON! Oh no! We don't want to go update all the substitution rules because that'd be huge and, ultimately, wouldn't buy much. So we quote the `$body.took` bits before parsing the JSON.

Note that the responses that we use for the `_cat` APIs are all converted into regexes and there is no expectation that they are valid JSON.

Closes #26233
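The quote-then-parse step described above can be tried outside the build with a rough sketch. This is a Python transcription, not the build's Groovy code: the two patterns are copied from the `replaceAll` calls in `SnippetsTask.groovy`, while the function names `quote_placeholders` and `check_snippet` and the shortened sample response are assumptions made for illustration.

```python
import json
import re

# Patterns transcribed from the Groovy regexes in SnippetsTask.groovy.
# VALUE matches a $-prefixed placeholder used as a value (after ':' or ',').
# FIELD matches a $-prefixed placeholder used as a field name (before ':').
VALUE = re.compile(r'([:,])\s*(\$[^ ,\n}]+)')
FIELD = re.compile(r'(\$[^ ,\n}]+)\s*:')


def quote_placeholders(text):
    """Wrap $-prefixed values and field names in double quotes."""
    quoted = VALUE.sub(r'\1 "\2"', text)
    return FIELD.sub(r'"\1":', quoted)


def check_snippet(text):
    """Quote placeholders, then parse strictly as JSON.

    Raises ValueError (json.JSONDecodeError) if the text is still invalid.
    """
    return json.loads(quote_placeholders(text))


# A shortened, hypothetical version of the substituted response above.
substituted = (
    '{ "took": $body.took,"timed_out": false,'
    '"_shards": $body._shards,"hits": $body.hits, '
    '"aggregations": { "range": { "buckets": [] } } }'
)
parsed = check_snippet(substituted)
print(parsed["took"])
```

Quoting the placeholders keeps the existing substitution rules untouched; the placeholders simply become ordinary JSON strings for the purpose of the validity check.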
1 parent 15b7aee commit 6d2c40e

12 files changed: +51 -31 lines changed

buildSrc/src/main/groovy/org/elasticsearch/gradle/doc/SnippetsTask.groovy

Lines changed: 21 additions & 0 deletions
@@ -19,6 +19,10 @@
 
 package org.elasticsearch.gradle.doc
 
+import groovy.json.JsonException
+import groovy.json.JsonParserType
+import groovy.json.JsonSlurper
+
 import org.gradle.api.DefaultTask
 import org.gradle.api.InvalidUserDataException
 import org.gradle.api.file.ConfigurableFileTree
@@ -117,6 +121,23 @@ public class SnippetsTask extends DefaultTask {
                             + "contain `curl`.")
                     }
                 }
+                if (snippet.testResponse && snippet.language == 'js') {
+                    String quoted = snippet.contents
+                        // quote values starting with $
+                        .replaceAll(/([:,])\s*(\$[^ ,\n}]+)/, '$1 "$2"')
+                        // quote fields starting with $
+                        .replaceAll(/(\$[^ ,\n}]+)\s*:/, '"$1":')
+                    JsonSlurper slurper =
+                        new JsonSlurper(type: JsonParserType.INDEX_OVERLAY)
+                    try {
+                        slurper.parseText(quoted)
+                    } catch (JsonException e) {
+                        throw new InvalidUserDataException("Invalid json "
+                            + "in $snippet. The error is:\n${e.message}.\n"
+                            + "After substitutions and munging, the json "
+                            + "looks like:\n$quoted", e)
+                    }
+                }
                 perSnippet(snippet)
                 snippet = null
             }

docs/plugins/analysis-kuromoji.asciidoc

Lines changed: 0 additions & 1 deletion
@@ -160,7 +160,6 @@ The above `analyze` request returns the following:
 
 [source,js]
 --------------------------------------------------
-# Result
 {
   "tokens" : [ {
     "token" : "東京",

docs/reference/aggregations/bucket/diversified-sampler-aggregation.asciidoc

Lines changed: 10 additions & 10 deletions
@@ -6,16 +6,16 @@ The `diversified_sampler` aggregation adds the ability to limit the number of ma
 
 NOTE: Any good market researcher will tell you that when working with samples of data it is important
 that the sample represents a healthy variety of opinions rather than being skewed by any single voice.
-The same is true with aggregations and sampling with these diversify settings can offer a way to remove the bias in your content (an over-populated geography,
-a large spike in a timeline or an over-active forum spammer).
+The same is true with aggregations and sampling with these diversify settings can offer a way to remove the bias in your content (an over-populated geography,
+a large spike in a timeline or an over-active forum spammer).
 
 
 .Example use cases:
 * Tightening the focus of analytics to high-relevance matches rather than the potentially very long tail of low-quality matches
 * Removing bias from analytics by ensuring fair representation of content from different sources
 * Reducing the running cost of aggregations that can produce useful results using only samples e.g. `significant_terms`
-
-A choice of `field` or `script` setting is used to provide values used for de-duplication and the `max_docs_per_value` setting controls the maximum
+
+A choice of `field` or `script` setting is used to provide values used for de-duplication and the `max_docs_per_value` setting controls the maximum
 number of documents collected on any one shard which share a common value. The default setting for `max_docs_per_value` is 1.
 
 The aggregation will throw an error if the choice of `field` or `script` produces multiple values for a single document (de-duplication using multi-valued fields is not supported due to efficiency concerns).
@@ -39,7 +39,7 @@ POST /stackoverflow/_search?size=0
         "my_unbiased_sample": {
             "diversified_sampler": {
                 "shard_size": 200,
-                "field" : "author"
+                "field" : "author"
             },
             "aggs": {
                 "keywords": {
@@ -89,7 +89,7 @@ Response:
 
 ==== Scripted example:
 
-In this scenario we might want to diversify on a combination of field values. We can use a `script` to produce a hash of the
+In this scenario we might want to diversify on a combination of field values. We can use a `script` to produce a hash of the
 multiple values in a tags field to ensure we don't have a sample that consists of the same repeated combinations of tags.
 
 [source,js]
@@ -109,7 +109,7 @@ POST /stackoverflow/_search?size=0
                 "script" : {
                     "lang": "painless",
                     "source": "doc['tags'].values.hashCode()"
-                }
+                }
             },
             "aggs": {
                 "keywords": {
@@ -150,7 +150,7 @@ Response:
                         "doc_count": 3,
                         "score": 1.34,
                         "bg_count": 200
-                    },
+                    }
                 ]
             }
         }
@@ -175,11 +175,11 @@ The default setting is "1".
 
 The optional `execution_hint` setting can influence the management of the values used for de-duplication.
 Each option will hold up to `shard_size` values in memory while performing de-duplication but the type of value held can be controlled as follows:
-
+
  - hold field values directly (`map`)
  - hold ordinals of the field as determined by the Lucene index (`global_ordinals`)
  - hold hashes of the field values - with potential for hash collisions (`bytes_hash`)
-
+
 The default setting is to use `global_ordinals` if this information is available from the Lucene index and reverting to `map` if not.
 The `bytes_hash` setting may prove faster in some cases but introduces the possibility of false positives in de-duplication logic due to the possibility of hash collisions.
 Please note that Elasticsearch will ignore the choice of execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.

docs/reference/cat.asciidoc

Lines changed: 4 additions & 4 deletions
@@ -35,7 +35,7 @@ GET /_cat/master?v
 
 Might respond with:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 id host ip node
 u_n93zwxThWHi1PDBJAGAg 127.0.0.1 127.0.0.1 u_n93zw
@@ -57,7 +57,7 @@ GET /_cat/master?help
 
 Might respond respond with:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 id | | node id
 host | h | host name
@@ -81,7 +81,7 @@ GET /_cat/nodes?h=ip,port,heapPercent,name
 
 Responds with:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 127.0.0.1 9300 27 sLBaIGK
 --------------------------------------------------
@@ -197,7 +197,7 @@ GET _cat/templates?v&s=order:desc,index_patterns
 
 returns:
 
-[source,sh]
+[source,txt]
 --------------------------------------------------
 name index_patterns order version
 pizza_pepperoni [*pepperoni*] 2

docs/reference/cat/indices.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -97,7 +97,7 @@ GET /_cat/indices/twitter?pri&v&h=health,index,pri,rep,docs.count,mt
 
 Might look like:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 health index pri rep docs.count mt pri.mt
 yellow twitter 1 1 1200 16 16
@@ -115,7 +115,7 @@ GET /_cat/indices?v&h=i,tm&s=tm:desc
 
 Might look like:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 i tm
 twitter 8.1gb

docs/reference/cat/nodeattrs.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ GET /_cat/nodeattrs?v&h=name,pid,attr,value
 
 Might look like:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 name pid attr value
 EK_AsJb 19566 testattr test

docs/reference/cat/nodes.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ GET /_cat/nodes?v&h=id,ip,port,v,m
 
 Might look like:
 
-["source","js",subs="attributes,callouts"]
+["source","txt",subs="attributes,callouts"]
 --------------------------------------------------
 id ip port v m
 veJR 127.0.0.1 59938 {version} *

docs/reference/cat/recovery.asciidoc

Lines changed: 3 additions & 3 deletions
@@ -21,7 +21,7 @@ GET _cat/recovery?v
 
 The response of this request will be something like:
 
-[source,js]
+[source,txt]
 ---------------------------------------------------------------------------
 index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
 twitter 0 13ms store done n/a n/a 127.0.0.1 node-0 n/a n/a 0 0 100% 13 0 0 100% 9928 0 0 100.0%
@@ -48,7 +48,7 @@ GET _cat/recovery?v&h=i,s,t,ty,st,shost,thost,f,fp,b,bp
 
 This will return a line like:
 
-[source,js]
+[source,txt]
 ----------------------------------------------------------------------------
 i s t ty st shost thost f fp b bp
 twitter 0 1252ms peer done 192.168.1.1 192.168.1.2 0 100.0% 0 100.0%
@@ -76,7 +76,7 @@ GET _cat/recovery?v&h=i,s,t,ty,st,rep,snap,f,fp,b,bp
 
 This will show a recovery of type snapshot in the response
 
-[source,js]
+[source,txt]
 --------------------------------------------------------------------------------
 i s t ty st rep snap f fp b bp
 twitter 0 1978ms snapshot done twitter snap_1 79 8.0% 12086 9.0%

docs/reference/cat/shards.asciidoc

Lines changed: 5 additions & 5 deletions
@@ -16,7 +16,7 @@ GET _cat/shards
 
 This will return
 
-[source,js]
+[source,txt]
 ---------------------------------------------------------------------------
 twitter 0 p STARTED 3014 31.1mb 192.168.56.10 H5dfFeA
 ---------------------------------------------------------------------------
@@ -42,7 +42,7 @@ GET _cat/shards/twitt*
 
 Which will return the following
 
-[source,js]
+[source,txt]
 ---------------------------------------------------------------------------
 twitter 0 p STARTED 3014 31.1mb 192.168.56.10 H5dfFeA
 ---------------------------------------------------------------------------
@@ -68,7 +68,7 @@ GET _cat/shards
 
 A relocating shard will be shown as follows
 
-[source,js]
+[source,txt]
 ---------------------------------------------------------------------------
 twitter 0 p RELOCATING 3014 31.1mb 192.168.56.10 H5dfFeA -> -> 192.168.56.30 bGG90GE
 ---------------------------------------------------------------------------
@@ -90,7 +90,7 @@ GET _cat/shards
 
 You can get the initializing state in the response like this
 
-[source,js]
+[source,txt]
 ---------------------------------------------------------------------------
 twitter 0 p STARTED 3014 31.1mb 192.168.56.10 H5dfFeA
 twitter 0 r INITIALIZING 0 14.3mb 192.168.56.30 bGG90GE
@@ -112,7 +112,7 @@ GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
 
 The reason for an unassigned shard will be listed as the last field
 
-[source,js]
+[source,txt]
 ---------------------------------------------------------------------------
 twitter 0 p STARTED 3014 31.1mb 192.168.56.10 H5dfFeA
 twitter 0 r STARTED 3014 31.1mb 192.168.56.30 bGG90GE

docs/reference/cat/thread_pool.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ GET /_cat/thread_pool/generic?v&h=id,name,active,rejected,completed
 
 which looks like:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 id name active rejected completed
 0EWUhXeBQtaVGlexUeVwMg generic 0 0 70

docs/reference/docs/update.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -170,7 +170,7 @@ the request was ignored.
    "_type": "type1",
    "_id": "1",
    "_version": 6,
-   "result": noop
+   "result": "noop"
 }
 --------------------------------------------------
 // TESTRESPONSE
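The three content fixes in this commit (a `# Result` line before the body, a trailing comma in an array, and an unquoted `noop`) all passed the old check and are all rejected by a strict JSON parser. Below is a small Python sketch of that rejection. The sample strings are shortened stand-ins for the real snippets, and `is_valid_json` is a hypothetical helper, not part of the build.

```python
import json


def is_valid_json(text):
    """Return True if text parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Shortened stand-ins for the snippets fixed in this commit.
cases = {
    "comment line before the body": '# Result\n{"tokens": []}',
    "trailing comma in an array": '{"buckets": [{"bg_count": 200},]}',
    "unquoted string value": '{"result": noop}',
}

for label, text in cases.items():
    print(label, "->", "valid" if is_valid_json(text) else "invalid")

# The corrected form from docs/reference/docs/update.asciidoc parses.
print(is_valid_json('{"result": "noop"}'))
```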

docs/reference/getting-started.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -373,7 +373,7 @@ PUT /customer/doc/1?pretty
 
 And the response:
 
-[source,sh]
+[source,js]
 --------------------------------------------------
 {
   "_index" : "customer",
@@ -672,7 +672,7 @@ GET /_cat/indices?v
 
 And the response:
 
-[source,js]
+[source,txt]
 --------------------------------------------------
 health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
 yellow open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128.6kb 128.6kb
