Allow rescorer with field collapsing #107779

Merged
merged 12 commits into from
Apr 29, 2024
6 changes: 6 additions & 0 deletions docs/changelog/107779.yaml
@@ -0,0 +1,6 @@
pr: 107779
summary: Allow rescorer with field collapsing
area: Search
type: enhancement
issues:
- 27243
@@ -47,7 +47,7 @@ NOTE: Collapsing is applied to the top hits only and does not affect aggregation
[[expand-collapse-results]]
==== Expand collapse results

It is also possible to expand each collapsed top hits with the `inner_hits` option.
It is also possible to expand each collapsed top hits with the <<inner-hits, `inner hits`>> option.

[source,console]
----
@@ -86,7 +86,7 @@ GET /my-index-000001/_search

See <<inner-hits, inner hits>> for the complete list of supported options and the format of the response.

It is also possible to request multiple `inner_hits` for each collapsed hit. This can be useful when you want to get
It is also possible to request multiple <<inner-hits, `inner hits`>> for each collapsed hit. This can be useful when you want to get
multiple representations of the collapsed hits.

[source,console]
@@ -145,8 +145,7 @@ The `max_concurrent_group_searches` request parameter can be used to control
the maximum number of concurrent searches allowed in this phase.
The default is based on the number of data nodes and the default search thread pool size.

WARNING: `collapse` cannot be used in conjunction with <<scroll-search-results, scroll>> or
<<rescore, rescore>>.
WARNING: `collapse` cannot be used in conjunction with <<scroll-search-results, scroll>>.

[discrete]
[[collapsing-with-search-after]]
@@ -175,6 +174,65 @@ GET /my-index-000001/_search
----
// TEST[setup:my_index]

[discrete]
[[rescore-collapse-results]]
==== Rescore collapse results

You can use field collapsing alongside the <<rescore, `rescore`>> search parameter.
Rescorers run on every shard for the top-ranked document per collapsed field.
To maintain a reliable order, it is recommended to cluster documents sharing the same collapse
field value on one shard.
This is achieved by assigning the collapse field value as the <<search-routing, routing key>>
during indexing:

[source,console]
----
POST /my-index-000001/_doc?routing=xyz <1>
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "You know for search!",
  "user.id": "xyz"
}
----
// TEST[setup:my_index]
<1> Assign routing with the collapse field value (`user.id`).

By doing this, you guarantee that only one top document per
collapse key gets rescored globally.
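
The routing idea above can be illustrated with a minimal Python sketch. It is not Elasticsearch code: `shard_for` is a hypothetical helper, and CRC32 is only a stand-in for the hash Elasticsearch applies to the routing value. The sketch assumes a simplified "hash modulo number of primary shards" placement and shows only that documents routed by the same collapse field value end up on the same shard.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard for a routing value (hypothetical helper).

    Assumption: CRC32 stands in for the real routing hash, and the
    placement rule is simplified to hash % number_of_primary_shards.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Illustrative documents; only `user.id` matters for routing here.
docs = [
    {"_id": "1", "user.id": "xyz"},
    {"_id": "2", "user.id": "xyz"},
    {"_id": "3", "user.id": "abc"},
]

NUM_SHARDS = 3  # arbitrary value chosen for the illustration

# Route every document by its collapse field value.
placement = {doc["_id"]: shard_for(doc["user.id"], NUM_SHARDS) for doc in docs}

# Documents "1" and "2" share a `user.id`, so they share a shard.
print(placement["1"] == placement["2"])
```

Because the shard is a pure function of the routing value, every document of a collapse group competes on one shard, so at most one candidate per group reaches the rescore phase.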

The following request utilizes field collapsing on the `user.id`
field and then rescores the top groups with a <<query-rescorer, query rescorer>>:

[source,console]
----
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "message": "you know for search"
    }
  },
  "collapse": {
    "field": "user.id"
  },
  "rescore" : {
    "window_size" : 50,
    "query" : {
      "rescore_query" : {
        "match_phrase": {
          "message": "you know for search"
        }
      },
      "query_weight" : 0.3,
      "rescore_query_weight" : 1.4
    }
  }
}
----
// TEST[setup:my_index]

WARNING: Rescorers are not applied to <<inner-hits, `inner hits`>>.

[discrete]
[[second-level-of-collapsing]]
==== Second level of collapsing
@@ -64,12 +64,6 @@ When exposing pagination to users, `window_size` should remain constant as each

Depending on how your model is trained, it’s possible that the model will return negative scores for documents. While negative scores are not allowed from first-stage retrieval and ranking, it is possible to use them in the LTR rescorer.

[discrete]
[[learning-to-rank-rescorer-limitations-field-collapsing]]
====== Compatibility with field collapsing

LTR rescorers are not compatible with the <<collapse-search-results, collapse feature>>.

[discrete]
[[learning-to-rank-rescorer-limitations-term-statistics]]
====== Term statistics as features
1 change: 1 addition & 0 deletions rest-api-spec/build.gradle
@@ -83,6 +83,7 @@ tasks.named("yamlRestTestV7CompatTransform").configure { task ->
task.skipTest("search/370_profile/fetch source", "profile output has changed")
task.skipTest("search/370_profile/fetch nested source", "profile output has changed")
task.skipTest("search/240_date_nanos/doc value fields are working as expected across date and date_nanos fields", "Fetching docvalues field multiple times is no longer allowed")
task.skipTest("search/110_field_collapsing/field collapsing and rescore", "#107779 Field collapsing is compatible with rescore in 8.15")

task.replaceValueInMatch("_type", "_doc")
task.addAllowedWarningRegex("\\[types removal\\].*")
@@ -281,24 +281,6 @@ setup:
- match: { hits.hits.1.fields.numeric_group: [1] }
- match: { hits.hits.1.sort: [1] }

---
"field collapsing and rescore":

  - do:
      catch: /cannot use \`collapse\` in conjunction with \`rescore\`/
      search:
        rest_total_hits_as_int: true
        index: test
        body:
          collapse: { field: numeric_group }
          rescore:
            window_size: 20
            query:
              rescore_query:
                match_all: {}
              query_weight: 1
              rescore_query_weight: 2

---
"no hits and inner_hits":

@@ -0,0 +1,107 @@
setup:
  - skip:
      version: " - 8.14.99"
      reason: Collapse with rescore added in 8.15.0
  - do:
      indices.create:
        index: products
        body:
          mappings:
            properties:
              product_id: { type: keyword }
              description: { type: text }
              popularity: { type: integer }

  - do:
      bulk:
        index: products
        refresh: true
        body:
          - '{"index": {"_id": "1", "routing": "0"}}'
          - '{"product_id": "0", "description": "flat tv 4K HDR", "score": 2, "popularity": 30}'
          - '{"index": {"_id": "2", "routing": "10"}}'
          - '{"product_id": "10", "description": "LED Smart TV 32", "score": 5, "popularity": 100}'
          - '{"index": {"_id": "3", "routing": "10"}}'
          - '{"product_id": "10", "description": "LED Smart TV 65", "score": 10, "popularity": 50}'
          - '{"index": {"_id": "4", "routing": "0"}}'
          - '{"product_id": "0", "description": "flat tv", "score": 1, "popularity": 10}'
          - '{"index": {"_id": "5", "routing": "129"}}'
          - '{"product_id": "129", "description": "just a tv", "score": 100, "popularity": 3}'

---
"field collapsing and rescore":
  - do:
      search:
        index: products
        body:
          query:
            bool:
              filter:
                match:
                  description: "tv"
              should:
                script_score:
                  query: { match_all: { } }
                  script:
                    source: "doc['score'].value"
          collapse:
            field: product_id
          rescore:
            query:
              rescore_query:
                script_score:
                  query: { match_all: { } }
                  script:
                    source: "doc['popularity'].value"
              query_weight: 0
              rescore_query_weight: 1


  - match: {hits.total.value: 5 }
  - length: {hits.hits: 3 }
  - match: {hits.hits.0._id: "3"}
  - match: {hits.hits.0._score: 50}
  - match: {hits.hits.0.fields.product_id: ["10"]}
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.1._score: 30 }
  - match: { hits.hits.1.fields.product_id: ["0"] }
  - match: { hits.hits.2._id: "5" }
  - match: { hits.hits.2._score: 3 }
  - match: { hits.hits.2.fields.product_id: ["129"] }

---
"field collapsing and rescore with window_size":
  - do:
      search:
        index: products
        body:
          query:
            bool:
              filter:
                match:
                  description: "tv"
              should:
                script_score:
                  query: { match_all: { } }
                  script:
                    source: "doc['score'].value"
          collapse:
            field: product_id
          rescore:
            window_size: 2
            query:
              rescore_query:
                script_score:
                  query: { match_all: { } }
                  script:
                    source: "doc['popularity'].value"
              query_weight: 0
              rescore_query_weight: 1
          size: 1


  - match: {hits.total.value: 5 }
  - length: {hits.hits: 1 }
  - match: {hits.hits.0._id: "3"}
  - match: {hits.hits.0._score: 50}
  - match: {hits.hits.0.fields.product_id: ["10"]}
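
To make the expected hits in the two tests above easier to follow, here is a rough Python model of "collapse, then rescore" on a single shard. It is a sketch under stated assumptions, not the Elasticsearch implementation: `collapse_then_rescore` is a hypothetical function, the first-pass score is taken directly from the `score` field, the rescore score from `popularity`, the two are combined as a weighted sum, and hits outside the rescore window are simply dropped from the returned list.

```python
# Documents from the test setup (description omitted, it only feeds the filter).
DOCS = [
    {"_id": "1", "product_id": "0", "score": 2, "popularity": 30},
    {"_id": "2", "product_id": "10", "score": 5, "popularity": 100},
    {"_id": "3", "product_id": "10", "score": 10, "popularity": 50},
    {"_id": "4", "product_id": "0", "score": 1, "popularity": 10},
    {"_id": "5", "product_id": "129", "score": 100, "popularity": 3},
]


def collapse_then_rescore(docs, window_size=10, size=10,
                          query_weight=0.0, rescore_query_weight=1.0):
    """Hypothetical single-shard model of collapse followed by rescore."""
    # First pass: rank all matching documents by their first-pass score.
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)

    # Collapse: keep only the best-ranked document per product_id.
    seen = set()
    groups = []
    for doc in ranked:
        if doc["product_id"] not in seen:
            seen.add(doc["product_id"])
            groups.append(doc)

    # Rescore: only the top `window_size` collapsed hits are rescored.
    # Assumption: hits outside the window are left out of this model.
    window = groups[:window_size]
    rescored = [
        (doc["_id"],
         query_weight * doc["score"] + rescore_query_weight * doc["popularity"])
        for doc in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored[:size]


first_test = collapse_then_rescore(DOCS)
second_test = collapse_then_rescore(DOCS, window_size=2, size=1)
print(first_test)
print(second_test)
```

In this model document `2` never appears, even though it has the highest `popularity`: document `3` wins the `product_id: "10"` group on the first-pass score, and only that group representative is rescored.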