
Commit 2475520

Add randomScore function in script_score query (#40186)
To give the script_score query the same features as the function_score query, this commit adds a randomScore function. The function produces different random scores on different index shards. It can also produce random scores based on the internal Lucene document IDs.
1 parent 85848af commit 2475520
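
The description above says the scores differ per shard and can be derived either from a field value or from the internal Lucene document ID. The following is a minimal sketch of how such a scheme could work, not the code added by this commit: the class name `RandomScoreSketch`, the `mix32` hash, and the way the index name, shard ID, and seed are combined are all assumptions made for illustration.

```java
final class RandomScoreSketch {

    // 32-bit avalanche mixer (the MurmurHash3 finalizer). Used here only as a
    // self-contained stand-in; the hash used by the real code is not shown in this diff.
    static int mix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumed salting scheme: fold the index name and shard ID into the user
    // seed so that each shard draws from a different pseudo-random sequence.
    static int saltedSeed(String indexName, int shardId, int seed) {
        int salt = (indexName.hashCode() << 10) | shardId;
        return mix32(salt ^ seed);
    }

    // Map the low 24 bits of a hash to a value in [0, 1).
    static double toUnitInterval(int hash) {
        return (hash & 0x00FFFFFF) / (double) (1 << 24);
    }

    // Variant with a field name: the score depends on the field value, so it
    // stays the same for as long as the value stays the same.
    static double scoreFromFieldValue(int saltedSeed, String fieldValue) {
        return toUnitInterval(mix32(saltedSeed ^ fieldValue.hashCode()));
    }

    // Variant without a field name: the score depends on the Lucene doc ID,
    // which is cheap to obtain but may change when segments are merged.
    static double scoreFromDocId(int saltedSeed, int luceneDocId) {
        return toUnitInterval(mix32(saltedSeed ^ luceneDocId));
    }

    public static void main(String[] args) {
        int seed = saltedSeed("test", 0, 100);
        System.out.println(scoreFromFieldValue(seed, "42"));
        System.out.println(scoreFromDocId(seed, 7));
    }
}
```

Because both variants are pure functions of the salted seed and their input, repeating a query with the same seed yields the same score for as long as the input (field value or doc ID) is unchanged.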

8 files changed, +301 -114 lines changed


docs/reference/query-dsl/script-score-query.asciidoc

+22-38
@@ -182,60 +182,44 @@ different from the query's vector, 0 is used for missing dimensions
 in the calculations of vector functions.
 
 
-[[random-functions]]
-===== Random functions
-There are two predefined ways to produce random values:
-`randomNotReproducible` and `randomReproducible`.
+[[random-score-function]]
+===== Random score function
+`random_score` function generates scores that are uniformly distributed
+from 0 up to but not including 1.
 
-`randomNotReproducible()` uses `java.util.Random` class
-to generate a random value of the type `long`.
-The generated values are not reproducible between requests' invocations.
+`randomScore` function has the following syntax:
+`randomScore(<seed>, <fieldName>)`.
+It has a required parameter - `seed` as an integer value,
+and an optional parameter - `fieldName` as a string value.
 
 [source,js]
 --------------------------------------------------
 "script" : {
-    "source" : "randomNotReproducible()"
+    "source" : "randomScore(100, '_seq_no')"
 }
 --------------------------------------------------
 // NOTCONSOLE
 
-
-`randomReproducible(String seedValue, int seed)` produces
-reproducible random values of type `long`. This function requires
-more computational time and memory than the non-reproducible version.
-
-A good candidate for the `seedValue` is document field values that
-are unique across documents and already pre-calculated and preloaded
-in the memory. For example, values of the document's `_seq_no` field
-is a good candidate, as documents on the same shard have unique values
-for the `_seq_no` field.
+If the `fieldName` parameter is omitted, the internal Lucene
+document ids will be used as a source of randomness. This is very efficient,
+but unfortunately not reproducible since documents might be renumbered
+by merges.
 
 [source,js]
 --------------------------------------------------
 "script" : {
-    "source" : "randomReproducible(Long.toString(doc['_seq_no'].value), 100)"
+    "source" : "randomScore(100)"
 }
 --------------------------------------------------
 // NOTCONSOLE
 
 
-A drawback of using `_seq_no` is that generated values change if
-documents are updated. Another drawback is not absolute uniqueness, as
-documents from different shards with the same sequence numbers
-generate the same random values.
-
-If you need random values to be distinct across different shards,
-you can use a field with unique values across shards,
-such as `_id`, but watch out for the memory usage as all
-these unique values need to be loaded into memory.
-
-[source,js]
---------------------------------------------------
-"script" : {
-    "source" : "randomReproducible(doc['_id'].value, 100)"
-}
---------------------------------------------------
-// NOTCONSOLE
+Note that documents that are within the same shard and have the
+same value for field will get the same score, so it is usually desirable
+to use a field that has unique values for all documents across a shard.
+A good default choice might be to use the `_seq_no`
+field, whose only drawback is that scores will change if the document is
+updated since update operations also update the value of the `_seq_no` field.
 
 
 [[decay-functions]]
@@ -349,8 +333,8 @@ the following script:
 
 ===== `random_score`
 
-Use `randomReproducible` and `randomNotReproducible` functions
-as described in <<random-functions, random functions>>.
+Use `randomScore` function
+as described in <<random-score-function, random score function>>.
 
 
 ===== `field_value_factor`

modules/lang-painless/src/main/resources/org/elasticsearch/painless/spi/org.elasticsearch.score.txt

+5-2
@@ -19,11 +19,14 @@
 
 # This file contains a whitelist for functions to be used in Score context
 
+class org.elasticsearch.script.ScoreScript no_import {
+}
+
 static_import {
     double saturation(double, double) from_class org.elasticsearch.script.ScoreScriptUtils
     double sigmoid(double, double, double) from_class org.elasticsearch.script.ScoreScriptUtils
-    double randomReproducible(String, int) from_class org.elasticsearch.script.ScoreScriptUtils
-    double randomNotReproducible() bound_to org.elasticsearch.script.ScoreScriptUtils$RandomNotReproducible
+    double randomScore(org.elasticsearch.script.ScoreScript, int, String) bound_to org.elasticsearch.script.ScoreScriptUtils$RandomScoreField
+    double randomScore(org.elasticsearch.script.ScoreScript, int) bound_to org.elasticsearch.script.ScoreScriptUtils$RandomScoreDoc
     double decayGeoLinear(String, String, String, double, GeoPoint) bound_to org.elasticsearch.script.ScoreScriptUtils$DecayGeoLinear
     double decayGeoExp(String, String, String, double, GeoPoint) bound_to org.elasticsearch.script.ScoreScriptUtils$DecayGeoExp
     double decayGeoGauss(String, String, String, double, GeoPoint) bound_to org.elasticsearch.script.ScoreScriptUtils$DecayGeoGauss

modules/lang-painless/src/test/resources/rest-api-spec/test/painless/80_script_score.yml

-55
@@ -72,61 +72,6 @@ setup:
     - match: { hits.hits.1._id: d2 }
     - match: { hits.hits.2._id: d1 }
 
----
-"Random functions":
-    - do:
-        indices.create:
-            index: test
-            body:
-                settings:
-                    number_of_shards: 2
-                mappings:
-                    properties:
-                        f1:
-                            type: keyword
-    - do:
-        index:
-            index: test
-            id: 1
-            body: {"f1": "v1"}
-    - do:
-        index:
-            index: test
-            id: 2
-            body: {"f1": "v2"}
-    - do:
-        index:
-            index: test
-            id: 3
-            body: {"f1": "v3"}
-
-    - do:
-        indices.refresh: {}
-
-    - do:
-        search:
-            rest_total_hits_as_int: true
-            index: test
-            body:
-                query:
-                    script_score:
-                        query: {match_all: {} }
-                        script:
-                            source: "randomReproducible(Long.toString(doc['_seq_no'].value), 100)"
-    - match: { hits.total: 3 }
-
-    - do:
-        search:
-            rest_total_hits_as_int: true
-            index: test
-            body:
-                query:
-                    script_score:
-                        query: {match_all: {} }
-                        script:
-                            source: "randomNotReproducible()"
-    - match: { hits.total: 3 }
-
 ---
 "Decay geo functions":
     - do:
@@ -0,0 +1,146 @@
+# Integration tests for ScriptScoreQuery using Painless
+
+setup:
+    - skip:
+        version: " - 7.09.99"
+        reason: "random score function of script score was added in 7.1"
+
+---
+"Random score function with _seq_no field":
+    - do:
+        indices.create:
+            index: test
+            body:
+                settings:
+                    number_of_shards: 2
+                mappings:
+                    properties:
+                        f1:
+                            type: keyword
+
+    - do:
+        bulk:
+            refresh: true
+            body:
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v0"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v1"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v2"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v3"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v4"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v5"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v6"}'
+
+    - do:
+        search:
+            rest_total_hits_as_int: true
+            index: test
+            body:
+                query:
+                    script_score:
+                        query: {match_all: {} }
+                        script:
+                            source: "randomScore(100, '_seq_no')"
+    # stash ids to check for reproducibility of ranking
+    - set: { hits.hits.0._id: id0 }
+    - set: { hits.hits.1._id: id1 }
+    - set: { hits.hits.2._id: id2 }
+    - set: { hits.hits.3._id: id3 }
+    - set: { hits.hits.4._id: id4 }
+    - set: { hits.hits.5._id: id5 }
+    - set: { hits.hits.6._id: id6 }
+
+    # check that ranking is reproducible
+    - do:
+        search:
+            rest_total_hits_as_int: true
+            index: test
+            body:
+                query:
+                    script_score:
+                        query: {match_all: {} }
+                        script:
+                            source: "randomScore(100, '_seq_no')"
+    - match: { hits.hits.0._id: $id0 }
+    - match: { hits.hits.1._id: $id1 }
+    - match: { hits.hits.2._id: $id2 }
+    - match: { hits.hits.3._id: $id3 }
+    - match: { hits.hits.4._id: $id4 }
+    - match: { hits.hits.5._id: $id5 }
+    - match: { hits.hits.6._id: $id6 }
+
+---
+"Random score function with internal doc Ids":
+    - do:
+        indices.create:
+            index: test
+            body:
+                settings:
+                    number_of_shards: 1
+                mappings:
+                    properties:
+                        f1:
+                            type: keyword
+
+    - do:
+        bulk:
+            refresh: true
+            body:
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v0"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v1"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v2"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v3"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v4"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v5"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v6"}'
+
+    - do:
+        search:
+            rest_total_hits_as_int: true
+            index: test
+            body:
+                query:
+                    script_score:
+                        query: {match_all: {} }
+                        script:
+                            source: "randomScore(100)"
+    # stash ids to check for reproducibility of ranking
+    - set: { hits.hits.0._id: id0 }
+    - set: { hits.hits.1._id: id1 }
+    - set: { hits.hits.2._id: id2 }
+    - set: { hits.hits.3._id: id3 }
+    - set: { hits.hits.4._id: id4 }
+    - set: { hits.hits.5._id: id5 }
+    - set: { hits.hits.6._id: id6 }
+
+    # check that ranking is reproducible
+    - do:
+        search:
+            rest_total_hits_as_int: true
+            index: test
+            body:
+                query:
+                    script_score:
+                        query: {match_all: {} }
+                        script:
+                            source: "randomScore(100)"
+    - match: { hits.hits.0._id: $id0 }
+    - match: { hits.hits.1._id: $id1 }
+    - match: { hits.hits.2._id: $id2 }
+    - match: { hits.hits.3._id: $id3 }
+    - match: { hits.hits.4._id: $id4 }
+    - match: { hits.hits.5._id: $id5 }
+    - match: { hits.hits.6._id: $id6 }
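
Both tests above follow the same pattern: run a seeded query, stash the hit order, rerun the identical query, and require the same order. The sketch below mirrors that check in plain Java as an illustration only. The class `RankingReproducibilitySketch`, its `score` function, and the hash it uses are hypothetical stand-ins for `randomScore(seed, '_seq_no')`, and it ignores shard-level salting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RankingReproducibilitySketch {

    // Stand-in 32-bit mixer (MurmurHash3 finalizer); an assumption, not the real hash.
    static int mix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical seeded score in [0, 1) derived from a document's sequence number.
    static double score(int seed, long seqNo) {
        int hash = mix32(seed ^ Long.hashCode(seqNo));
        return (hash & 0x00FFFFFF) / (double) (1 << 24);
    }

    // Rank documents 0..docCount-1 by descending score, breaking ties by
    // sequence number so the ordering is fully determined.
    static List<Long> rank(int seed, int docCount) {
        List<Long> seqNos = new ArrayList<>();
        for (long s = 0; s < docCount; s++) {
            seqNos.add(s);
        }
        seqNos.sort(Comparator.comparingDouble((Long s) -> score(seed, s))
                .reversed()
                .thenComparingLong(Long::longValue));
        return seqNos;
    }

    public static void main(String[] args) {
        List<Long> first = rank(100, 7);   // "stash" the order of the first run
        List<Long> second = rank(100, 7);  // rerun with the same seed
        System.out.println(first.equals(second));
    }
}
```

The ranking is reproducible here because the score is a pure function of the seed and the sequence number; nothing in it depends on when or how often the query runs.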

server/src/main/java/org/elasticsearch/common/lucene/search/function/ScriptScoreFunction.java

+15
@@ -50,18 +50,33 @@ public float score() {
 
     private final ScoreScript.LeafFactory script;
 
+    private final int shardId;
+    private final String indexName;
+
 
     public ScriptScoreFunction(Script sScript, ScoreScript.LeafFactory script) {
         super(CombineFunction.REPLACE);
         this.sScript = sScript;
         this.script = script;
+        this.indexName = null;
+        this.shardId = -1;
+    }
+
+    public ScriptScoreFunction(Script sScript, ScoreScript.LeafFactory script, String indexName, int shardId) {
+        super(CombineFunction.REPLACE);
+        this.sScript = sScript;
+        this.script = script;
+        this.indexName = indexName;
+        this.shardId = shardId;
     }
 
     @Override
     public LeafScoreFunction getLeafScoreFunction(LeafReaderContext ctx) throws IOException {
         final ScoreScript leafScript = script.newInstance(ctx);
         final CannedScorer scorer = new CannedScorer();
         leafScript.setScorer(scorer);
+        leafScript._setIndexName(indexName);
+        leafScript._setShard(shardId);
         return new LeafScoreFunction() {
             @Override
             public double score(int docId, float subQueryScore) throws IOException {

server/src/main/java/org/elasticsearch/index/query/functionscore/ScriptScoreFunctionBuilder.java

+1-1
@@ -94,7 +94,7 @@ protected ScoreFunction doToFunction(QueryShardContext context) {
         try {
             ScoreScript.Factory factory = context.getScriptService().compile(script, ScoreScript.CONTEXT);
             ScoreScript.LeafFactory searchScript = factory.newFactory(script.getParams(), context.lookup());
-            return new ScriptScoreFunction(script, searchScript);
+            return new ScriptScoreFunction(script, searchScript, context.index().getName(), context.getShardId());
         } catch (Exception e) {
             throw new QueryShardException(context, "script_score: the script could not be loaded", e);
         }
