Query documents before rollup #38837
Comments
Pinging @elastic/es-analytics-geo
@TheBronx thanks for opening this issue. From what I understand so far, what you are trying to do can already be achieved using filtered aliases. You would define a different alias for each subset of documents and then point the rollup jobs at those aliases. I haven't tried this in practice, though; maybe @polyfractal has ideas about this or knows alternative approaches?
Okay, it actually works!
Great to hear. Maybe there are even simpler ways that @polyfractal knows about, so let's wait a bit for his thoughts, but I think we can close this after that.
Filtered aliases would be the best (and I think only) way to do it right now. We made a decision to not allow filtering on the rollup job itself, to prevent a "mismatch" between the input data and the output rollup data. E.g. it might be confusing for a user consuming rollup data to see data missing, if they aren't aware that the job itself was filtered. We may loosen that restriction in the future. But until then, a filtered alias would be the best way to do it.
That's correct, the alias itself is essentially free, so the only extra cost is adding the filter itself :)
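For readers following along, here is a minimal sketch of the filtered-alias workaround discussed above. It only builds the JSON bodies for `POST /_aliases` and for the rollup job endpoint (`PUT /_rollup/job/<job_id>` in 7.x); it does not talk to a cluster. The index name, alias name, field names (`status`, `@timestamp`, `response_time`), cron schedule, and job layout are all illustrative assumptions, not taken from this issue. Older versions use `interval` instead of `fixed_interval` in the date histogram group.

```python
import json


def filtered_alias_request(index, alias, query):
    """Body for POST /_aliases: add `alias` on `index`, restricted by `query`."""
    return {"actions": [{"add": {"index": index, "alias": alias, "filter": query}}]}


def rollup_job_request(source, rollup_index):
    """Body for the rollup job endpoint; `source` is the filtered alias here."""
    return {
        "index_pattern": source,          # point the job at the alias, not the raw index
        "rollup_index": rollup_index,
        "cron": "0 0 * * * ?",            # hypothetical schedule: once per hour
        "page_size": 1000,
        "groups": {
            # assumed timestamp field; use "interval" on older versions
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}
        },
        # assumed numeric field, for illustration only
        "metrics": [{"field": "response_time", "metrics": ["avg", "max"]}],
    }


# Hypothetical names: one alias that only exposes HTTP 5xx documents.
alias_body = filtered_alias_request(
    index="logstash-2019.02.13",
    alias="http-errors",
    query={"range": {"status": {"gte": 500}}},
)
job_body = rollup_job_request(source="http-errors", rollup_index="rollup-http-errors")

print(json.dumps(alias_body, indent=2))
print(json.dumps(job_body, indent=2))
```

One alias plus one job per subset gives one rollup index per filter, which matches the approach described in the comments above.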
It's me again, I just found a problem with this approach 😢
And that is exactly what I did, because I am using Logstash: I would have to recreate the alias every day for this approach to work, right? Maybe this is not the best way to do it 😆 It was too good to be true. Any other ideas?
I agree with @TheBronx, this is a missing feature that would be very useful. BTW, in Data Transforms we can define a query, so it would be consistent to also have this option in rollup jobs.
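For comparison, this is roughly what the query option looks like on the transform side: a sketch of a transform body whose `source` carries a `query`, so only matching documents are pivoted. The index names, field names, and the aggregation are illustrative assumptions; the snippet only builds the JSON body and does not call a cluster.

```python
import json


def transform_request(source_index, dest_index, query):
    """Body for a transform whose source is restricted by `query`."""
    return {
        "source": {"index": source_index, "query": query},
        "dest": {"index": dest_index},
        "pivot": {
            "group_by": {
                # assumed timestamp field
                "hour": {"date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}}
            },
            "aggregations": {
                # assumed numeric field
                "avg_response_time": {"avg": {"field": "response_time"}}
            },
        },
    }


# Hypothetical names: summarize only HTTP 5xx documents.
transform_body = transform_request(
    source_index="logstash-*",
    dest_index="http-errors-hourly",
    query={"range": {"status": {"gte": 500}}},
)
print(json.dumps(transform_body, indent=2))
```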
I'm going to re-open this ticket as a placeholder. We're working on a big refactor of Rollup (changing how search works, integrating with ILM, etc.), so this request is something we can reconsider in light of the new framework. It's been a fairly common request over the lifetime of Rollup v1. That said, I think a lot of the difficulties remain; it could be trappy for the "consumer" of the rollup data if they don't know it has been filtered, and I'm not sure how it would work/look under the new setup. But now's the time to think through those things, hence the re-open :)
Thanks for reopening!
Hi everyone, +1 for this feature as well.
Describe the feature:

When rolling up data, it would be nice to filter documents with a query. That is, instead of rolling up all documents on an index (or index pattern), aggregate only those that match the query.
The reason behind this is that once you roll up data, you cannot query it, and it would probably be too complex to store aggregated data in a way that supports certain queries. But filtering during the rollup job should be "easier" (I hope!), and that would be really useful.
For example, if we are storing HTTP requests on an index, we could create a few rollup jobs:
(Each of these rollup jobs would go to a different rollup index of course)
This would be so powerful! What do you think?
Thank you!
I have found another issue that sounds similar to me, but I am not sure, so please feel free to close this one if the idea behind it is the same: #34921
I also posted this on your discourse: https://discuss.elastic.co/t/filtering-documents-for-rollup/167417