Ensure vector similarity correctly limits inner_hits returned for nested kNN #111363


benwtrent
Member

For nested kNN, we support not only similarity thresholds but also multi-passage search, where more than one nearest passage is retrieved per document.

However, the inner_hits retrieved for the kNN search ignored the similarity restriction. As a result, the inner hits returned all passages, not just the ones within the similarity threshold, which is confusing.

closes: #111093
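
The behavior described above can be illustrated with a small, self-contained sketch. This is not the Elasticsearch implementation: the `cosine` and `inner_hits` helpers, the `min_similarity` parameter, and the sample passages are all hypothetical names chosen for illustration, and cosine similarity is assumed as the metric. It only shows the difference between ranking every nested passage and ranking only the passages that meet the threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def inner_hits(passages, query, size, min_similarity=None):
    """Rank one document's nested passages against the query vector.

    With min_similarity=None every passage is eligible (the confusing
    behavior described in this PR). With a threshold, passages below it
    are dropped before the top `size` are returned.
    """
    scored = [(cosine(p["vector"], query), p["text"]) for p in passages]
    if min_similarity is not None:
        scored = [s for s in scored if s[0] >= min_similarity]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:size]]


# Hypothetical nested passages belonging to a single parent document.
doc = [
    {"text": "close", "vector": [1.0, 0.0]},
    {"text": "near", "vector": [0.9, 0.1]},
    {"text": "far", "vector": [0.0, 1.0]},
]
query = [1.0, 0.0]

before = inner_hits(doc, query, size=3)                      # threshold ignored
after = inner_hits(doc, query, size=3, min_similarity=0.9)   # threshold applied
print(before)
print(after)
```

In this sketch the parent document matches in both cases; only the set of passages reported as inner hits changes once the threshold is applied.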

@elasticsearchmachine added the Team:Search Relevance label Jul 26, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

Contributor

@john-wagster john-wagster left a comment

LGTM

@benwtrent added the auto-backport label Jul 29, 2024
@benwtrent added the auto-merge-without-approval label Jul 29, 2024
@benwtrent
Member Author

@elasticmachine update branch

@elasticsearchmachine
Collaborator

Hi @benwtrent, I've updated the changelog YAML for you.

@benwtrent
Member Author

run elasticsearch-ci/packaging-tests-windows-sample

@elasticsearchmachine merged commit 69c9697 into elastic:main Jul 29, 2024
17 checks passed
@benwtrent deleted the bugfix/fix-similarity-inner-hit-handling branch July 29, 2024 20:02
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.15 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 111363

@benwtrent
Member Author

💚 All backports created successfully

Status Branch Result
8.15

Questions ?

Please refer to the Backport tool documentation

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jul 29, 2024
…ted kNN (elastic#111363)

For nested kNN we support not only similarity thresholds, but also
multi-passage search while retrieving more than one nearest passage.

However, the inner_hits retrieved for the kNN search would ignore the
restricted similarity. Meaning, the inner hits would return all
passages, not just the ones within the limited similarity and this is
confusing.

closes: elastic#111093
(cherry picked from commit 69c9697)
benwtrent added a commit that referenced this pull request Jul 29, 2024
…for nested kNN (#111363) (#111426)

* Ensure vector similarity correctly limits inner_hits returned for nested kNN (#111363)

For nested kNN we support not only similarity thresholds, but also
multi-passage search while retrieving more than one nearest passage.

However, the inner_hits retrieved for the kNN search would ignore the
restricted similarity. Meaning, the inner hits would return all
passages, not just the ones within the limited similarity and this is
confusing.

closes: #111093
(cherry picked from commit 69c9697)

* fixing for backport

* adj for backport

* fix compilation for tests
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 30, 2024
* upstream/main: (105 commits)
  Removing the use of watcher stats from WatchAcTests (elastic#111435)
  Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=UPGRADED} elastic#111434
  Make `EnrichPolicyRunner` more properly async (elastic#111321)
  Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=OLD} elastic#111430
  Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode KEYWORDs>} elastic#111428
  Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode TEXTs>} elastic#111429
  Mute org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT elastic#111307
  Update semantic_text field to support indexing numeric and boolean data types (elastic#111284)
  Mute org.elasticsearch.repositories.blobstore.testkit.AzureSnapshotRepoTestKitIT testRepositoryAnalysis elastic#111280
  Ensure vector similarity correctly limits inner_hits returned for nested kNN (elastic#111363)
  Fix LogsIndexModeFullClusterRestartIT (elastic#111362)
  Remove 4096 bool query max limit from docs (elastic#111421)
  Fix score count validation in reranker response (elastic#111212)
  Integrate data generator in LogsDB mode challenge test (elastic#111303)
  ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (elastic#111409)
  [Service Account] Add AutoOps account (elastic#111316)
  [ML] Fix failing test DetectionRulesTests.testEqualsAndHashcode (elastic#111351)
  [ML] Create and inject APM Inference Metrics (elastic#111293)
  [DOCS] Additional reranking docs updates (elastic#111350)
  Mute org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT elastic#111345
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
Labels
auto-backport (Automatically create backport pull requests when merged)
auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
>bug
:Search Relevance/Vectors (Vector search)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v8.15.1
v8.16.0
Development

Successfully merging this pull request may close these issues.

We ignore similarity when gathering inner_hits for vectors
5 participants