Query Frontend scalability #1150
There are, like, existing things that do queuing?
Yeah, I briefly considered that in the design doc (https://docs.google.com/document/d/1lsvSkv0tiAMPQv-V8vI2LZ8f4i9JuTRsuPI_i-XcAqY/edit). Generally I couldn't see how I could use one to achieve the QoS algorithm I wanted, i.e. picking a random per-tenant queue to balance queries between customers. Suggestions welcome!
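For illustration, here is a minimal sketch of that kind of QoS scheduling: one FIFO per tenant, with each dequeue picking a tenant uniformly at random so a single heavy tenant can't starve the others. The names and types are hypothetical, not the actual Cortex frontend code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// request is a placeholder for a queued sub-query (illustrative only).
type request struct {
	tenantID string
	query    string
}

// fairQueue keeps one FIFO per tenant and dequeues from a tenant
// chosen uniformly at random.
type fairQueue struct {
	queues  map[string][]request // per-tenant FIFOs
	tenants []string             // tenants that currently have pending work
}

func newFairQueue() *fairQueue {
	return &fairQueue{queues: map[string][]request{}}
}

// enqueue appends a request to its tenant's queue, registering the
// tenant if it had no pending work.
func (q *fairQueue) enqueue(r request) {
	if _, ok := q.queues[r.tenantID]; !ok {
		q.tenants = append(q.tenants, r.tenantID)
	}
	q.queues[r.tenantID] = append(q.queues[r.tenantID], r)
}

// dequeue picks a random tenant and pops the head of its queue.
// It returns false when nothing is queued.
func (q *fairQueue) dequeue() (request, bool) {
	if len(q.tenants) == 0 {
		return request{}, false
	}
	i := rand.Intn(len(q.tenants))
	tenant := q.tenants[i]
	r := q.queues[tenant][0]
	q.queues[tenant] = q.queues[tenant][1:]
	if len(q.queues[tenant]) == 0 {
		// Drop the tenant until it enqueues again.
		delete(q.queues, tenant)
		q.tenants = append(q.tenants[:i], q.tenants[i+1:]...)
	}
	return r, true
}

func main() {
	q := newFairQueue()
	q.enqueue(request{tenantID: "tenant-a", query: "rate(http_requests_total[5m])"})
	q.enqueue(request{tenantID: "tenant-b", query: "up"})
	for r, ok := q.dequeue(); ok; r, ok = q.dequeue() {
		fmt.Printf("dispatching %q for %s\n", r.query, r.tenantID)
	}
}
```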
/cc @cyriltovena @owen-d
We have an accepted proposal here: #2528.
Here's some more discussion of how we might do this: https://docs.google.com/document/d/1Io-KztXxVxvA764_HSmRV3ibKp7fh-LE0NUIaiOwCn4/edit?usp=sharing
We're starting to have enough sustained query load that the limit of having two query frontends is hurting us. The query frontends currently have to unmarshal the JSON from the queriers (and the proto from the cache), combine the results, and then remarshal and send them.
The limit of two query frontends exists so that queueing stays fair. The more replicas you add, the more queues you end up with, and the more it degrades to random load balancing, which is exactly the problem the frontend was designed to solve.
One idea is to put the queue in a separate service. This service would be responsible for queueing, scheduling, retries etc. The frontend would then only be responsible for accepting queries, consulting the cache, working out what to enqueue, accepting the response, and writing back to the cache and the user. This would allow the frontend to be stateless and scalable. The queriers would communicate directly with the frontend service for the bulk of the response; the new queueing service would be control-path only.
WDYT?
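To make the proposed split a bit more concrete, here is a rough sketch of how the responsibilities might divide between a control-path-only scheduler and a stateless data-path frontend. The interface names and signatures below are assumptions made for illustration, not part of the accepted proposal.

```go
// Package sketch is a hypothetical illustration of the proposed split;
// nothing here is taken from the accepted design.
package sketch

import "context"

// Scheduler is the control-path-only service. It owns the per-tenant
// queues, fairness, scheduling and retries, but never carries response
// bodies, so it stays small and cheap to run.
type Scheduler interface {
	// Enqueue registers a sub-query for a tenant and records which
	// frontend the querier should return the result to.
	Enqueue(ctx context.Context, tenantID, queryID, frontendAddr string) error
	// Cancel drops a queued query the frontend no longer needs
	// (e.g. because the client disconnected).
	Cancel(ctx context.Context, queryID string) error
}

// Frontend is the stateless data-path service. It splits the query,
// consults the cache, enqueues the cache misses via the Scheduler, and
// receives the bulk results directly from queriers, so it can be scaled
// horizontally without multiplying queues.
type Frontend interface {
	// HandleQuery serves a user query end to end.
	HandleQuery(ctx context.Context, tenantID, query string) ([]byte, error)
	// ReceiveResult is called by a querier with the (potentially large)
	// response, bypassing the scheduler entirely.
	ReceiveResult(ctx context.Context, queryID string, result []byte) error
}
```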