ES|QL Reranker command #123074

Merged
merged 102 commits into from
Apr 4, 2025

Contributor

@afoucret afoucret commented Feb 20, 2025

Implementation of the RERANK command

Command syntax

| RERANK "query text" ON title,description WITH `reranker-inference-id`

where

  • query text is a constant string (a parameter such as ?queryText can be used instead)
  • the ON clause contains one or several fields separated by commas
    • it is also possible to rename a field here (e.g. ON title, description=overview)
    • it is also possible to create computed fields here (e.g. ON title, short_description=SUBSTRING(description, 0, 100))
  • WITH gives the name of the inference endpoint to be used for reranking. The inference endpoint's task type has to be rerank
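
To make the syntax above concrete, here is a minimal Java sketch that assembles a RERANK pipe from its three parts (query text, ON entries, inference endpoint id). `RerankCommandSketch` and its `build` method are hypothetical helpers written for illustration only; they are not part of this PR, and the string escaping is a simplifying assumption.

```java
import java.util.List;

// Hypothetical helper, not part of this PR: assembles a RERANK pipe string
// following the command syntax described above.
final class RerankCommandSketch {

    // queryText:   the constant query string
    // onFields:    entries of the ON clause (plain names, renames, or computed fields)
    // inferenceId: name of the rerank inference endpoint
    static String build(String queryText, List<String> onFields, String inferenceId) {
        if (onFields.isEmpty()) {
            throw new IllegalArgumentException("ON clause needs at least one field");
        }
        // Simplified escaping (assumption): only backslashes and double quotes are handled.
        String escaped = queryText.replace("\\", "\\\\").replace("\"", "\\\"");
        return "| RERANK \"" + escaped + "\" ON " + String.join(", ", onFields)
            + " WITH `" + inferenceId + "`";
    }

    public static void main(String[] args) {
        System.out.println(build(
            "query text",
            List.of("title", "short_description=SUBSTRING(description, 0, 100)"),
            "reranker-inference-id"
        ));
    }
}
```

In real usage the query text would more likely be passed as a parameter (?queryText) than concatenated into the query string.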

Done:

  • Grammar and parsing of the command
  • Logical plan
  • Pre-analysis and Analysis
    • Detect missing inference endpoint
    • Ensure all expressions are resolved
  • Logical plan optimization
    • Default limits and sort order for the rerank command
  • Physical plan and planning
  • Reranker async operator

Follow-up:

@afoucret afoucret force-pushed the esql-reranker-boostrap branch from 9d2471e to fae22df Compare February 21, 2025 13:42
@afoucret afoucret mentioned this pull request Mar 7, 2025
21 tasks
Contributor

@bpintea bpintea left a comment

Left some more comments.

).<PreAnalysisResult>andThen((l, enrichResolution) -> resolveFieldNames(parsed, enrichResolution, l));
)
.<PreAnalysisResult>andThen((l, enrichResolution) -> resolveFieldNames(parsed, enrichResolution, l))
.<PreAnalysisResult>andThen((l, preAnalysisResult) -> resolveInferences(preAnalysis.inferencePlans, preAnalysisResult, l));
Contributor

Would be great if we parallelized these resolutions, as they're not interdependent. But not necessarily a current concern.

Contributor Author

@afoucret afoucret Apr 2, 2025

Added a tech debt item to the meta-issue for later.
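
As a rough illustration of the parallelization suggested in this thread, the sketch below starts two independent resolution steps at once and combines their results when both have finished. It assumes `CompletableFuture` purely for brevity; the PR's code chains listener-style callbacks instead, and every type and method name here (`EnrichResolution`, `InferenceResolution`, `preAnalyze`, and so on) is a hypothetical stand-in rather than the actual ES|QL class.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only: hypothetical stand-ins for the pre-analysis steps.
final class ParallelResolutionSketch {

    record EnrichResolution(List<String> policies) {}
    record InferenceResolution(List<String> endpoints) {}
    record PreAnalysisResult(EnrichResolution enrich, InferenceResolution inference) {}

    // Stand-in for an asynchronous enrich policy lookup.
    static CompletableFuture<EnrichResolution> resolveEnrich(List<String> policies) {
        return CompletableFuture.supplyAsync(() -> new EnrichResolution(policies));
    }

    // Stand-in for an asynchronous inference endpoint lookup.
    static CompletableFuture<InferenceResolution> resolveInferences(List<String> ids) {
        return CompletableFuture.supplyAsync(() -> new InferenceResolution(ids));
    }

    // Both lookups are started before either is awaited; thenCombine merges
    // the two results once both futures have completed.
    static CompletableFuture<PreAnalysisResult> preAnalyze(List<String> policies, List<String> ids) {
        CompletableFuture<EnrichResolution> enrich = resolveEnrich(policies);
        CompletableFuture<InferenceResolution> inference = resolveInferences(ids);
        return enrich.thenCombine(inference, PreAnalysisResult::new);
    }

    public static void main(String[] args) {
        PreAnalysisResult result =
            preAnalyze(List.of("my-policy"), List.of("reranker-inference-id")).join();
        System.out.println(result);
    }
}
```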

@@ -81,6 +82,7 @@ public class TransportEsqlQueryAction extends HandledTransportAction<EsqlQueryRe
private final AsyncTaskManagementService<EsqlQueryRequest, EsqlQueryResponse, EsqlQueryTask> asyncTaskManagementService;
private final RemoteClusterService remoteClusterService;
private final UsageService usageService;
private final InferenceService inferenceService;
Contributor

This seems to only be used in c'tor, can it be local?

Contributor Author

Done! Good catch.

this.computeService = new ComputeService(
searchService,
transportService,
exchangeService,
enrichLookupService,
lookupFromIndexService,
inferenceService,
Contributor

Instead of adding a new param here, it might be worth checking whether the ComputeService c'tor should take a TransportActionServices as a param, as they share a few args.

Contributor Author

I updated the ComputeService constructor as described in your comment.
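
The refactoring discussed in this thread is the usual parameter-object pattern: instead of growing the constructor by one argument per service, the shared collaborators travel in a single holder. The sketch below is a hypothetical illustration of that shape. The service interfaces and the fields of the holder record are placeholders chosen for the example, not the actual Elasticsearch classes or the real contents of `TransportActionServices`.

```java
// Hypothetical sketch: placeholder types, not the actual Elasticsearch classes.
final class ServicesHolderSketch {

    interface SearchService {}
    interface ExchangeService {}
    interface InferenceService {}

    // One record groups the collaborators shared by several components.
    record TransportActionServices(
        SearchService search,
        ExchangeService exchange,
        InferenceService inference
    ) {}

    static final class ComputeService {
        private final SearchService search;
        private final ExchangeService exchange;
        private final InferenceService inference;

        // Adding another service later changes the record, not this signature.
        ComputeService(TransportActionServices services) {
            this.search = services.search();
            this.exchange = services.exchange();
            this.inference = services.inference();
        }

        InferenceService inference() {
            return inference;
        }
    }

    public static void main(String[] args) {
        InferenceService inference = new InferenceService() {};
        TransportActionServices services =
            new TransportActionServices(new SearchService() {}, new ExchangeService() {}, inference);
        System.out.println(new ComputeService(services).inference() == inference);
    }
}
```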

// Ensure the score attribute is present in the output.
if (rerank.scoreAttribute() instanceof UnresolvedAttribute ua) {
Attribute resolved = resolveAttribute(ua, childrenOutput);
if (resolved.resolved() == false || resolved.dataType() != DOUBLE) {
Contributor

@bpintea bpintea Apr 1, 2025

> add the _score if it is missing so the user does not have to worry about it

What happens if there's a search function following a RERANK?
Can we add some tests with this too?

> The RERANK command overwrites the _score column and sorts the results by _score DESC, so having the _score attribute is mandatory.

I guess it might be mandatory for execution, but we could still choose to keep it projected only if the user chooses to. Just like adding a SORT as part of surrogating RERANK, we could choose to finally DROP the _score if not enabled explicitly.

This is somewhat similar to how search functions/operators do influence the _score, but the user can choose if they want to have it in the output.
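
The behaviour under discussion can be modelled in a few lines: reranking overwrites each row's score, the result is sorted by score descending, and the score is projected only if the user asked for it. The Java sketch below is a hypothetical in-memory model of that idea. `Row`, `rerank`, and `project` are illustrative names, and the PR's operator works asynchronously against an inference endpoint rather than on a plain list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of the semantics discussed above.
final class RerankScoreSketch {

    record Row(String title, double score) {}

    // Overwrites each row's score with the reranker's score,
    // then sorts the rows by score in descending order.
    static List<Row> rerank(List<Row> rows, List<Double> rerankedScores) {
        if (rows.size() != rerankedScores.size()) {
            throw new IllegalArgumentException("one score per row is required");
        }
        List<Row> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            out.add(new Row(rows.get(i).title(), rerankedScores.get(i)));
        }
        out.sort(Comparator.comparingDouble(Row::score).reversed());
        return out;
    }

    // The reviewer's suggestion: keep the score in the output only when
    // the user explicitly requested it; otherwise drop it at the end.
    static List<String> project(List<Row> rows, boolean userRequestedScore) {
        return rows.stream()
            .map(r -> userRequestedScore ? r.title() + " (" + r.score() + ")" : r.title())
            .toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a", 0.9), new Row("b", 0.1));
        List<Row> reranked = rerank(rows, List.of(0.2, 0.7));
        System.out.println(project(reranked, false));
        System.out.println(project(reranked, true));
    }
}
```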

@@ -397,6 +397,47 @@ public static boolean clusterHasInferenceEndpoint(RestClient client) throws IOEx
return true;
}

public static void createRerankInferenceEndpoint(RestClient client) throws IOException {
Contributor

It'd be nice if we could set up this service in the main() above too, so that one can have a testing cluster pre-loaded with it. If possible, this can be a follow-up.

Contributor Author

Added as a follow-up in the meta-issue.

@afoucret afoucret force-pushed the esql-reranker-boostrap branch from 4a5a19e to 132825d Compare April 2, 2025 05:38
@afoucret afoucret requested a review from jimczi April 2, 2025 06:16
@afoucret afoucret removed auto-backport Automatically create backport pull requests when merged v8.19.0 labels Apr 2, 2025
Contributor

@jimczi jimczi left a comment

LGTM

@afoucret afoucret enabled auto-merge (squash) April 3, 2025 13:39
@afoucret afoucret removed the test-release Trigger CI checks against release build label Apr 4, 2025
@afoucret afoucret disabled auto-merge April 4, 2025 09:14
@afoucret afoucret enabled auto-merge (squash) April 4, 2025 09:54
@afoucret afoucret merged commit a4a2714 into elastic:main Apr 4, 2025
17 checks passed
dnhatn added a commit that referenced this pull request Apr 7, 2025
If the clusters don't support inference test services, skip tests that require inference services. 
Hence, we should check for rerank tests.

Relates #123074
Labels
>feature :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0
8 participants