Add the ability for Elasticsearch to calculate and index the length of a string field #65636
Comments
Pinging @elastic/es-search (Team:Search)
We have an official plugin that does exactly that:
My understanding of @webmat's proposal is to calculate the length of the individual fields versus the size of the
Oh, I misread the issue, thanks.
Wouldn't this be made possible by #68984? Once an indexed field supports defining a script that is executed at index time, you can calculate the length of the field and index it straight away. Am I missing something?
This issue suggests creating a new field type, or automatically indexing the length of a field when requested. As mentioned above, this could be achieved with a runtime field, or by specifying a script for a long field that calculates the length of a given indexed field loaded from doc_values. With that, I am closing this issue. Feel free to reopen or add comments if I missed something.
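To make the runtime-field suggestion above concrete, here is a minimal Python sketch that builds a mapping request body of that shape. The helper name length_runtime_mapping and the target field name dns.question.name_length are assumptions for illustration, not something defined in this thread. The sketch also assumes the source field is a keyword field with doc_values enabled, and that the body is sent with an update-mapping request on an index of your choosing.

```python
import json


def length_runtime_mapping(source: str, target: str) -> dict:
    """Build a mapping body that defines a runtime `long` field holding
    the string length of `source` (hypothetical helper, for illustration)."""
    # Painless script: skip documents where the source field has no value,
    # otherwise emit the length of the first doc value.
    script = (
        f"if (doc['{source}'].size() != 0) "
        f"{{ emit(doc['{source}'].value.length()); }}"
    )
    return {"runtime": {target: {"type": "long", "script": {"source": script}}}}


# Example: length of dns.question.name exposed as a sibling runtime field.
body = length_runtime_mapping("dns.question.name", "dns.question.name_length")
print(json.dumps(body, indent=2))
```

Note that a runtime field is evaluated at query time, so this covers display and ad hoc filtering but does not by itself index the length.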
I would love to have the option of having Elasticsearch calculate the length of a string field, via a mapping setting.
There are many situations where having the string length is useful. The one I’m most interested in is the first:
Calculating the field length via a runtime field can satisfy the need to display the field length of a document pulled up another way. However, each of the situations above needs the length indexed explicitly if we want to avoid prohibitively expensive queries.
It’s currently possible to do this by adding a sister field (e.g. dns.question.name => dns.question.name_length), then calculating the length upon ingestion with an ingest node processor or other method. However, this approach potentially leads to boilerplate code that needs to be repeated in many pipelines.
I’m thinking of two ways this could potentially be implemented. I’d be happy with either:
dns.question.name => dns.question.name.length
A recent ECS discussion on DNS question/answer length (ecs#992) was the inspiration for this. If we had such a capability, we would potentially add such a “length” field in a few more places in ECS: DNS, URLs, user agents, process.command_line, etc.
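For reference, the ingest-time workaround described above (a sister field filled in by an ingest processor) could look roughly like the following Python sketch, which generates a script processor definition for any source/target pair. The helper name length_processor and the pipeline description are hypothetical. The sketch assumes documents arrive as nested objects (not flat dotted keys) and that the source value is a string.

```python
import json


def length_processor(source: str, target: str) -> dict:
    """Build an ingest script processor that stores the string length of
    `source` in `target` (hypothetical helper, for illustration)."""
    # Null-safe condition, e.g. ctx.dns?.question?.name != null,
    # so documents without the source field pass through untouched.
    condition = "ctx." + "?.".join(source.split(".")) + " != null"
    return {
        "script": {
            "lang": "painless",
            "if": condition,
            "source": f"ctx.{target} = ctx.{source}.length()",
        }
    }


# Pipeline body that could be sent when creating an ingest pipeline.
pipeline = {
    "description": "Store the length of dns.question.name in a sister field",
    "processors": [
        length_processor("dns.question.name", "dns.question.name_length")
    ],
}
print(json.dumps(pipeline, indent=2))
```

Generating the processor from a helper like this reduces the repetition somewhat, but each pipeline still has to include it, which is the boilerplate concern raised above.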