Please make split_on_whitespace configurable #30552

Closed
odmby opened this issue May 13, 2018 · 9 comments
Labels
:Search/Search Search-related issues that do not fall into other categories

Comments

@odmby

odmby commented May 13, 2018

In Elasticsearch 6 the default for split_on_whitespace was changed from "true" to "false", with no option left to change it.
We are holding back the system upgrade because of this issue. We have dozens of end users working in Kibana who copy-paste large lists of values (hundreds of items) into the search bar and expect to get results relevant to all of them (similar to the "IN" operator in SQL). We also have hundreds of visualizations and many dashboards and automatic processes that rely on this convention.
The default for this option can still be "false" of course, but we want the option to set it back to "true".

@jimczi
Contributor

jimczi commented May 14, 2018

split_on_whitespace is an option for the query_string query only. It doesn't work with a match query, and it caused all sorts of confusion when the field's analyzer needs to handle multi-word input. Switching to a query_string that does not split on whitespace solved this issue. There are still valid cases where splitting on whitespace prior to analysis is needed, though. For instance, if the field is a keyword field, you may want to split on whitespace first and apply the field's normalizer after the split. For this purpose we plan to add support for splitting on whitespace at query time for keyword fields, not only for query_string but for all the analyzed queries (match, multi_match, query_string, simple_query_string). This option could be enabled or disabled directly in the mapping. Would this solution solve your issue with upgrading?
#30393
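
To make the "split first, normalize after" idea concrete, here is a minimal toy model in Python. It is not Elasticsearch code: the `normalize` function is a hypothetical stand-in for a keyword normalizer (trim plus lowercase), and `keyword_query_terms` is an illustrative helper, not a real API.

```python
from typing import List


def normalize(term: str) -> str:
    """Hypothetical stand-in for a keyword normalizer: trim and lowercase."""
    return term.strip().lower()


def keyword_query_terms(query_text: str, split_on_whitespace: bool) -> List[str]:
    """Return the terms a keyword field would be searched for.

    Toy model of the behaviour discussed in this thread only;
    it does not reproduce Elasticsearch's actual implementation.
    """
    if split_on_whitespace:
        # Split first, then normalize each piece independently.
        return [normalize(piece) for piece in query_text.split()]
    # No split: the whole input is normalized as one single term.
    return [normalize(query_text)]


pasted = "ID-1 ID-2 ID-3"
print(keyword_query_terms(pasted, split_on_whitespace=False))  # ['id-1 id-2 id-3']
print(keyword_query_terms(pasted, split_on_whitespace=True))   # ['id-1', 'id-2', 'id-3']
```

Without the split, a pasted list of IDs becomes one long term that matches no indexed value, which is the upgrade problem described in the opening comment.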

@odmby
Author

odmby commented May 14, 2018

Thanks for the quick feedback jimczi.
As most of our fields are indeed keywords ("not analyzed"), this could solve the issue, but it must work directly from the Kibana search bar, and not only as an optional flag when working directly with Elasticsearch.
When is this feature planned to be released? And just so I understand - is this something that needs to be added to each field in each index in the mapping, or is it a general configuration?

@odmby
Author

odmby commented May 14, 2018

And will we have to change the mapping of all the existing indexes for this to work?

@jimczi
Contributor

jimczi commented May 14, 2018

And just so I understand - is this something that needs to be added to each field in each index in the mapping, or is it a general configuration?
And will we have to change the mapping of all the existing indexes for this to work?

Probably. We still need to discuss internally, but the option considered at the moment is to add a split_queries_on_whitespace option, defaulting to false, to the mapping of the keyword field.
That said, if you rely on whitespace splitting at query time, I wonder how you ensure that you don't index terms containing whitespace in your keyword field? Could these fields be defined as text fields with a keyword index analyzer and a whitespace query analyzer instead? This would require reindexing but would be more consistent with the real intent. There is no good way to split a keyword field at query time especially when the query is derived from a search bar filled with free text.
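
The two mapping alternatives mentioned above could look roughly like the following sketch, written as Python dicts and printed as JSON. The field name `item_id` is hypothetical, the option name `split_queries_on_whitespace` is taken from the comment above (where it was still under discussion), and only the `properties` fragment is shown because the surrounding index and type wrapper depends on the Elasticsearch version.

```python
import json

# Alternative 1: keep the keyword field and opt in to splitting query input
# on whitespace (option name as proposed in the comment above).
keyword_with_split = {
    "properties": {
        "item_id": {  # hypothetical field name
            "type": "keyword",
            "split_queries_on_whitespace": True,
        }
    }
}

# Alternative 2: a text field that indexes each value as a single token
# (keyword analyzer) but splits the query input on whitespace
# (whitespace search analyzer). Existing data would need reindexing.
text_with_whitespace_search = {
    "properties": {
        "item_id": {  # hypothetical field name
            "type": "text",
            "analyzer": "keyword",
            "search_analyzer": "whitespace",
        }
    }
}

print(json.dumps(keyword_with_split, indent=2))
print(json.dumps(text_with_whitespace_search, indent=2))
```

Either fragment would have to be applied per field, which matches the "Probably" answer to the question about changing the mapping of each index.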

@odmby
Author

odmby commented May 14, 2018

We do have values containing whitespace, but when querying for them users put them in quotation marks.
However, most of the usage is for lists of values that we know don't contain whitespace (e.g. lists of IDs), which users copy-paste into the search bar. We don't intend to split the keyword field - our queries in Kibana are often closer to a Terms Set Query than to a full-text search.
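
For illustration, the "list of IDs" usage described above maps naturally onto an Elasticsearch `terms` query. The sketch below builds such a request body client-side in Python; the field name `item_id` and the helper name are hypothetical, and this only shows the intended IN-like semantics, not something that can be typed into the Kibana search bar.

```python
import json
from typing import Any, Dict, List


def terms_query_from_pasted_list(field: str, pasted: str) -> Dict[str, Any]:
    """Build a `terms` query body from a whitespace-separated list of values.

    Assumes none of the values contain whitespace (e.g. plain IDs).
    """
    # str.split() with no argument splits on any run of spaces, tabs, or newlines.
    values: List[str] = pasted.split()
    return {"query": {"terms": {field: values}}}


body = terms_query_from_pasted_list("item_id", "1001 1002\n1003\t1004")
print(json.dumps(body))
```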

colings86 added the :Search/Search label May 14, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@akotlar

akotlar commented May 20, 2018

@jimczi In my case the solution is simply to trim before indexing; trimming whitespace is probably the most common text-parsing operation I do. I'm not sure I quite understand what you mean by "There is no good way to split a keyword field at query time especially when the query is derived from a search bar filled with free text." Isn't the stupidest/easiest option the only one? Just split the tokens and trim them, unless they're quoted.
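
A minimal sketch of that "split and trim unless quoted" rule, in plain Python. The helper name and the exact quoting rules (double quotes only, no escaping) are assumptions for illustration and say nothing about how Elasticsearch or Kibana parse queries.

```python
import re
from typing import List

# Either a double-quoted run (captured without the quotes)
# or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_unless_quoted(text: str) -> List[str]:
    """Split on whitespace, keeping double-quoted phrases together."""
    tokens = []
    for quoted, bare in _TOKEN.findall(text):
        # Exactly one of the two groups is non-empty for a useful match.
        token = (quoted if quoted else bare).strip()
        if token:  # drop empty quoted strings such as ""
            tokens.append(token)
    return tokens


print(split_unless_quoted('  abc "new york"   def '))  # ['abc', 'new york', 'def']
```

The sketch also hints at why the rule is less trivial than it looks: unbalanced quotes or quotes in the middle of a token have no obvious interpretation and are simply treated as ordinary characters here.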

I like your idea of a split_on_whitespace option for keyword mappings that defaults to false; an alternative would be exposing search_mapping on keyword fields for a more flexible transformation.

@javanna
Member

javanna commented Jul 3, 2018

ping @elastic/es-search-aggs this may have fallen through the cracks as it was labelled feedback_needed, but I think we got the info we asked for.

@jimczi
Contributor

jimczi commented Jul 3, 2018

Thanks for noticing, @javanna. Support for the option described in #30393 has been added in #30691. The new option will be available in 6.4, so I am going to close this issue.

jimczi closed this as completed Jul 3, 2018
6 participants