Skip to content

ESQL: Add a pre-mapping logical plan processing step #120368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Jan 17, 2025

WIP

This adds a processing step that runs rules which can modify an optimized LogicalPlan by executing async calls to the services avaialble to ESQL's transport query action. The rules need to implement a new interface (MappingPreProcessor). The resulting plan won't be further optimized logically, but mapped and further processed as a physical plan.

This adds a processing step that runs rules which can modify an
optimized LogicalPlan by executing async calls to the services avaialble
to ESQL's transport query action. The rules need to implement a new
interface (`MappingPreProcessor`). The resulting plan won't be further
optimized logically, but mapped and further processed as a physical plan.
@bpintea bpintea added >enhancement auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Jan 17, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@bpintea bpintea marked this pull request as ready for review January 20, 2025 14:34
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 20, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a first round of comments.

}

return queryAsObject;
return BytesRefs.toString(queryAsObject);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

Comment on lines 67 to 74
ResolvedIndices resolvedIndices = ResolvedIndices.resolveWithIndexNamesAndOptions(
indexNames.toArray(String[]::new),
IndexResolver.FIELD_CAPS_INDICES_OPTIONS,
services.clusterService().state(),
services.indexNameExpressionResolver(),
services.transportService().getRemoteClusterService(),
System.currentTimeMillis()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indices have already been resolved through field-caps; double check if this information can be reused instead of going back to string format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResolvedIndices splits the indices by local and remote and uses the string format for that. Not sure if/how we can simplify this.

* {@link FullTextFunctionMapperPreprocessor#preprocess(LogicalPlan, TransportActionServices, ActionListener)} will rewrite the plan by
* replacing {@link FullTextFunction} expression with new ones that hold rewritten {@link QueryBuilder}s.
*/
public class FullTextFunctionMapperPreprocessor implements MappingPreProcessor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses a pull approach where a built-in pre-processor searches for FullTextFunctions before rewriting the query.
A push approach would be better, similar to the Aware interfaces: can the TranslationAware be used instead or potentially an extension to it.
Specifically this stage is important only for queries that depend on QueryRewriteContext (not its subclasses) - creating this context is not trivial so checking whether this is needed is valuable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked this part, to have the FullTextFunctions return the preprocessor as a singleton. This way the preprocessors are "discovered" in the plan.

);
}

public LogicalPlan updateQueryBuilders(LogicalPlan plan, Map<FullTextFunction, QueryBuilder> newQueryBuilders) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't the rewrite applied on the spot and instead an associated map is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked this bit. But my guess is that the intention was to avoid multiple tree passes.

* Note that the LogicalPlan following the rules' changes will not undergo another logical optimization round. The changes these rules
* should apply are only those that require access to services that need to be performed asynchronously.
*/
public interface MappingPreProcessor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be multiple preprocessors?
Why not have a dedicated "stage" such as preMapping() (similar to preAnalyze) and encapsulate the logic there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be multiple preprocessors?

That was my assumption, yes.

Why not have a dedicated "stage" such as preMapping() (similar to preAnalyze) and encapsulate the logic there?

There are some particularities to how queries are repeatedly rewritten (and stopping that is done by identity checking of a return object) -- these should remain particular to FullTextFunction IMO and this logic stay within a dedicated class.

But the execution of the preprocessor is done at a dedicated stage.

Not sure if the current format meets your expectation, but happy to shape it further if it can be done better.

Copy link
Contributor Author

@bpintea bpintea Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are needed to make the FullTextFunctions instances distinguishable by their SemanticQueryBuilder field having or not an inferenceResultsSupplier; which is needed to update the query tree in place.

@bpintea
Copy link
Contributor Author

bpintea commented Jan 30, 2025

Deprecated by a simpler #121260.

@bpintea bpintea closed this Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants