Commit 4380cd1

Allow rescorer with field collapsing (#107779)
This change adds support for rescoring collapsed documents. Rescoring is applied to the top document per group on each shard. Closes #27243
1 parent 30d31bf commit 4380cd1

14 files changed (+389 -85 lines changed)

docs/changelog/107779.yaml

+6
@@ -0,0 +1,6 @@
+pr: 107779
+summary: Allow rescorer with field collapsing
+area: Search
+type: enhancement
+issues:
+ - 27243

docs/reference/search/search-your-data/collapse-search-results.asciidoc

+62 -4
@@ -47,7 +47,7 @@ NOTE: Collapsing is applied to the top hits only and does not affect aggregation
 [[expand-collapse-results]]
 ==== Expand collapse results
 
-It is also possible to expand each collapsed top hits with the `inner_hits` option.
+It is also possible to expand each collapsed top hits with the <<inner-hits, `inner hits`>> option.
 
 [source,console]
 ----
@@ -86,7 +86,7 @@ GET /my-index-000001/_search
 
 See <<inner-hits, inner hits>> for the complete list of supported options and the format of the response.
 
-It is also possible to request multiple `inner_hits` for each collapsed hit. This can be useful when you want to get
+It is also possible to request multiple <<inner-hits, `inner hits`>> for each collapsed hit. This can be useful when you want to get
 multiple representations of the collapsed hits.
 
 [source,console]
@@ -145,8 +145,7 @@ The `max_concurrent_group_searches` request parameter can be used to control
 the maximum number of concurrent searches allowed in this phase.
 The default is based on the number of data nodes and the default search thread pool size.
 
-WARNING: `collapse` cannot be used in conjunction with <<scroll-search-results, scroll>> or
-<<rescore, rescore>>.
+WARNING: `collapse` cannot be used in conjunction with <<scroll-search-results, scroll>>.
 
 [discrete]
 [[collapsing-with-search-after]]
@@ -175,6 +174,65 @@ GET /my-index-000001/_search
 ----
 // TEST[setup:my_index]
 
+[discrete]
+[[rescore-collapse-results]]
+==== Rescore collapse results
+
+You can use field collapsing alongside the <<rescore, `rescore`>> search parameter.
+Rescorers run on every shard for the top-ranked document per collapsed field.
+To maintain a reliable order, it is recommended to cluster documents sharing the same collapse
+field value on one shard.
+This is achieved by assigning the collapse field value as the <<search-routing, routing key>>
+during indexing:
+
+[source,console]
+----
+POST /my-index-000001/_doc?routing=xyz <1>
+{
+  "@timestamp": "2099-11-15T13:12:00",
+  "message": "You know for search!",
+  "user.id": "xyz"
+}
+----
+// TEST[setup:my_index]
+<1> Assign routing with the collapse field value (`user.id`).
+
+By doing this, you guarantee that only one top document per
+collapse key gets rescored globally.
+
+The following request utilizes field collapsing on the `user.id`
+field and then rescores the top groups with a <<query-rescorer, query rescorer>>:
+
+[source,console]
+----
+GET /my-index-000001/_search
+{
+  "query": {
+    "match": {
+      "message": "you know for search"
+    }
+  },
+  "collapse": {
+    "field": "user.id"
+  },
+  "rescore" : {
+    "window_size" : 50,
+    "query" : {
+      "rescore_query" : {
+        "match_phrase": {
+          "message": "you know for search"
+        }
+      },
+      "query_weight" : 0.3,
+      "rescore_query_weight" : 1.4
+    }
+  }
+}
+----
+// TEST[setup:my_index]
+
+WARNING: Rescorers are not applied to <<inner-hits, `inner hits`>>.
+
 [discrete]
 [[second-level-of-collapsing]]
 ==== Second level of collapsing
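
Reviewer note: the routing advice in the new "Rescore collapse results" section can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Elasticsearch code: the sample documents, the two-shard layout, and both shard-assignment functions (`shard_by_routing`, `shard_by_doc_id`) are hypothetical, and `crc32` is a stand-in for the real routing hash. It counts how many documents per collapse key reach the rescorer when each shard rescores its own top document per group.

```python
from collections import defaultdict
import zlib

# Hypothetical sample documents: (doc_id, collapse_key, first_pass_score).
DOCS = [
    (1, "0", 2.0),
    (2, "10", 5.0),
    (3, "10", 10.0),
    (4, "0", 1.0),
    (5, "129", 100.0),
]
NUM_SHARDS = 2


def shard_by_routing(doc):
    """Route on the collapse key, so a group never spans shards.

    crc32 is a stand-in hash chosen for determinism; it is not the
    hash Elasticsearch uses for routing.
    """
    _, key, _ = doc
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def shard_by_doc_id(doc):
    """Stand-in for placement that ignores the collapse key."""
    doc_id, _, _ = doc
    return doc_id % NUM_SHARDS


def rescored_docs_per_key(docs, shard_of):
    """Return {collapse_key: [doc_ids handed to the rescorer on any shard]}."""
    # Step 1: each shard keeps its own top-scoring document per group.
    top_per_shard = {}
    for doc in docs:
        _, key, score = doc
        slot = (shard_of(doc), key)
        if slot not in top_per_shard or score > top_per_shard[slot][2]:
            top_per_shard[slot] = doc
    # Step 2: every per-shard group head is rescored.
    rescored = defaultdict(list)
    for (_, key), (doc_id, _, _) in sorted(top_per_shard.items()):
        rescored[key].append(doc_id)
    return dict(rescored)


with_routing = rescored_docs_per_key(DOCS, shard_by_routing)
without_routing = rescored_docs_per_key(DOCS, shard_by_doc_id)
print("routed on collapse key:", with_routing)
print("placement ignores key: ", without_routing)
```

With routing on the collapse key, every key maps to exactly one rescored document. When placement ignores the key, groups "0" and "10" each contribute two rescored candidates, one per shard, which is the situation the documentation recommends avoiding.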

docs/reference/search/search-your-data/learning-to-rank-search-usage.asciidoc

-6
@@ -64,12 +64,6 @@ When exposing pagination to users, `window_size` should remain constant as each
 
 Depending on how your model is trained, it’s possible that the model will return negative scores for documents. While negative scores are not allowed from first-stage retrieval and ranking, it is possible to use them in the LTR rescorer.
 
-[discrete]
-[[learning-to-rank-rescorer-limitations-field-collapsing]]
-====== Compatibility with field collapsing
-
-LTR rescorers are not compatible with the <<collapse-search-results, collapse feature>>.
-
 [discrete]
 [[learning-to-rank-rescorer-limitations-term-statistics]]
 ====== Term statistics as features

rest-api-spec/build.gradle

+1
@@ -83,6 +83,7 @@ tasks.named("yamlRestTestV7CompatTransform").configure { task ->
   task.skipTest("search/370_profile/fetch source", "profile output has changed")
   task.skipTest("search/370_profile/fetch nested source", "profile output has changed")
   task.skipTest("search/240_date_nanos/doc value fields are working as expected across date and date_nanos fields", "Fetching docvalues field multiple times is no longer allowed")
+  task.skipTest("search/110_field_collapsing/field collapsing and rescore", "#107779 Field collapsing is compatible with rescore in 8.15")
 
   task.replaceValueInMatch("_type", "_doc")
   task.addAllowedWarningRegex("\\[types removal\\].*")

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/110_field_collapsing.yml

-18
@@ -281,24 +281,6 @@ setup:
   - match: { hits.hits.1.fields.numeric_group: [1] }
   - match: { hits.hits.1.sort: [1] }
 
----
-"field collapsing and rescore":
-
-  - do:
-      catch: /cannot use \`collapse\` in conjunction with \`rescore\`/
-      search:
-        rest_total_hits_as_int: true
-        index: test
-        body:
-          collapse: { field: numeric_group }
-          rescore:
-            window_size: 20
-            query:
-              rescore_query:
-                match_all: {}
-              query_weight: 1
-              rescore_query_weight: 2
-
 ---
 "no hits and inner_hits":
 
@@ -0,0 +1,107 @@
+setup:
+  - skip:
+      version: " - 8.14.99"
+      reason: Collapse with rescore added in 8.15.0
+  - do:
+      indices.create:
+        index: products
+        body:
+          mappings:
+            properties:
+              product_id: { type: keyword }
+              description: { type: text }
+              popularity: { type: integer }
+
+  - do:
+      bulk:
+        index: products
+        refresh: true
+        body:
+          - '{"index": {"_id": "1", "routing": "0"}}'
+          - '{"product_id": "0", "description": "flat tv 4K HDR", "score": 2, "popularity": 30}'
+          - '{"index": {"_id": "2", "routing": "10"}}'
+          - '{"product_id": "10", "description": "LED Smart TV 32", "score": 5, "popularity": 100}'
+          - '{"index": {"_id": "3", "routing": "10"}}'
+          - '{"product_id": "10", "description": "LED Smart TV 65", "score": 10, "popularity": 50}'
+          - '{"index": {"_id": "4", "routing": "0"}}'
+          - '{"product_id": "0", "description": "flat tv", "score": 1, "popularity": 10}'
+          - '{"index": {"_id": "5", "routing": "129"}}'
+          - '{"product_id": "129", "description": "just a tv", "score": 100, "popularity": 3}'
+
+---
+"field collapsing and rescore":
+  - do:
+      search:
+        index: products
+        body:
+          query:
+            bool:
+              filter:
+                match:
+                  description: "tv"
+              should:
+                script_score:
+                  query: { match_all: { } }
+                  script:
+                    source: "doc['score'].value"
+          collapse:
+            field: product_id
+          rescore:
+            query:
+              rescore_query:
+                script_score:
+                  query: { match_all: { } }
+                  script:
+                    source: "doc['popularity'].value"
+              query_weight: 0
+              rescore_query_weight: 1
+
+
+  - match: {hits.total.value: 5 }
+  - length: {hits.hits: 3 }
+  - match: {hits.hits.0._id: "3"}
+  - match: {hits.hits.0._score: 50}
+  - match: {hits.hits.0.fields.product_id: ["10"]}
+  - match: { hits.hits.1._id: "1" }
+  - match: { hits.hits.1._score: 30 }
+  - match: { hits.hits.1.fields.product_id: ["0"] }
+  - match: { hits.hits.2._id: "5" }
+  - match: { hits.hits.2._score: 3 }
+  - match: { hits.hits.2.fields.product_id: ["129"] }
+
+---
+"field collapsing and rescore with window_size":
+  - do:
+      search:
+        index: products
+        body:
+          query:
+            bool:
+              filter:
+                match:
+                  description: "tv"
+              should:
+                script_score:
+                  query: { match_all: { } }
+                  script:
+                    source: "doc['score'].value"
+          collapse:
+            field: product_id
+          rescore:
+            window_size: 2
+            query:
+              rescore_query:
+                script_score:
+                  query: { match_all: { } }
+                  script:
+                    source: "doc['popularity'].value"
+              query_weight: 0
+              rescore_query_weight: 1
+          size: 1
+
+
+  - match: {hits.total.value: 5 }
+  - length: {hits.hits: 1 }
+  - match: {hits.hits.0._id: "3"}
+  - match: {hits.hits.0._score: 50}
+  - match: {hits.hits.0.fields.product_id: ["10"]}
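
Reviewer note: the expected hits of the "field collapsing and rescore" test can be checked by hand with a small model of collapse followed by rescore. This is a rough sketch, not the server implementation. It assumes a single shard, a rescore window that covers every group, and a weighted-sum combination of the two scores; the function name `collapse_then_rescore` is hypothetical. The document values are copied from the bulk request in the test setup.

```python
# Documents from the bulk request in the test setup: _id -> fields.
DOCS = {
    "1": {"product_id": "0", "score": 2, "popularity": 30},
    "2": {"product_id": "10", "score": 5, "popularity": 100},
    "3": {"product_id": "10", "score": 10, "popularity": 50},
    "4": {"product_id": "0", "score": 1, "popularity": 10},
    "5": {"product_id": "129", "score": 100, "popularity": 3},
}


def collapse_then_rescore(docs, query_weight, rescore_query_weight):
    """Single-shard model: collapse on product_id, then rescore group heads.

    Assumes the rescore window covers every group and that the original
    and rescore scores are combined as a weighted sum.
    """
    # First pass: the script_score clause makes _score equal doc['score'],
    # so the head of each group is its document with the highest 'score'.
    heads = {}
    for doc_id, fields in docs.items():
        key = fields["product_id"]
        if key not in heads or fields["score"] > docs[heads[key]]["score"]:
            heads[key] = doc_id
    # Second pass: only the group heads are rescored.
    hits = []
    for key, doc_id in heads.items():
        fields = docs[doc_id]
        final = (query_weight * fields["score"]
                 + rescore_query_weight * fields["popularity"])
        hits.append({"_id": doc_id, "_score": final, "product_id": key})
    return sorted(hits, key=lambda h: h["_score"], reverse=True)


hits = collapse_then_rescore(DOCS, query_weight=0, rescore_query_weight=1)
for hit in hits:
    print(hit)
```

Under these assumptions the model yields documents 3, 1, 5 with scores 50, 30, 3, matching the assertions in the test. Document 2 has the highest popularity (100) but never appears, because it is not the head of its group after the first pass and so is never rescored.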
