
An example of a production cluster configuration. #401


Merged · 4 commits merged into main on Nov 4, 2020

Conversation

@mkuratczyk (Collaborator)

This example provides a starting point for production deployments. I'm not sure whether the name "production" is a good choice - feel free to suggest something better, given that this is not suitable for all production use cases and we may want to provide other examples in the future (e.g. production-XXL).

Closes #398

@mkuratczyk mkuratczyk requested a review from gerhard October 20, 2020 15:24
@Zerpet (Member) left a comment

Looks good to me. TKG in AWS provisions nodes with label topology.kubernetes.io/zone, in case you want to "flavour" the example 😉
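
For illustration, a minimal sketch of what spreading replicas across that label could look like in a pod spec (the label selector value below is hypothetical and not taken from the example in this PR):

```yaml
# Hypothetical sketch: spread RabbitMQ pods evenly across availability zones
# using the topology.kubernetes.io/zone node label mentioned above.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: my-rabbitmq   # hypothetical label value
```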

Should this PR close #398? I think @ChunyiLyu was working on this issue as well.

Edit: I think this should close #397 instead?

@gerhard (Contributor) commented Oct 21, 2020

This is a welcome step in the right direction, thank you @mkuratczyk for taking it.

I feel that some changes are required, and more of the reasoning behind them needs to be shared, before this can be merged.

This is what I'm thinking for my next steps:

  1. Add more detail to the private thread that kicked this off. Those who have access to it might want to read it for full context.
  2. Commit the latest changes to the StatefulSet that we are using as the sample for the production-ready deployment.
  3. Make specific suggestions to the example proposed in this commit, so that we can merge it. I imagine discussions arising at the previous points, and until we share specific learnings and reach consensus, we cannot complete this step.

Let me know if you think that there is a different approach that would make more sense.

@Zerpet Zerpet linked an issue Oct 22, 2020 that may be closed by this pull request
@ChunyiLyu (Contributor) left a comment

Please see comments

@mkuratczyk mkuratczyk requested review from gerhard and Zerpet November 3, 2020 13:36
@gerhard (Contributor) left a comment

Looks great!

@gerhard gerhard merged commit 3c93b46 into main Nov 4, 2020
@gerhard gerhard deleted the production-example branch November 4, 2020 16:17
@mboutet commented Nov 30, 2020

I have four questions regarding the production-ready example:

  1. Why is there no PDB? It seems a little odd not to include one, since it's paramount to ensure that a majority of the nodes remains available in the event of node draining, eviction, upgrade, etc.
  2. What's the reasoning behind cluster_partition_handling = ignore? My understanding after reading the documentation is that pause_minority would be more appropriate. According to Which Mode to Pick?, pause_minority is ideal "when clustering across racks or availability zones in a single region", which is what the production-ready example does.
  3. What's the reasoning behind vm_memory_high_watermark_paging_ratio = 0.99? It seems to me that it won't give RabbitMQ much of a chance to page to disk before producers are blocked by the memory alarm.
  4. What's the reasoning behind disk_free_limit.relative = 1.0 when the production checklist recommends disk_free_limit.relative = 1.5?

@mkuratczyk (Collaborator, Author)

For the PodDisruptionBudget - no good reason, I've just added it: #510
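
For reference, a minimal sketch of what such a PodDisruptionBudget could look like for a three-node cluster (names and labels here are hypothetical, not necessarily what #510 adds):

```yaml
# Hypothetical sketch: allow at most one voluntary disruption at a time,
# so a majority of a 3-node RabbitMQ cluster stays available during
# node drains and upgrades.
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: my-rabbitmq          # hypothetical name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: my-rabbitmq   # hypothetical label value
```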

@lukebakken (Contributor)

What's the reasoning behind cluster_partition_handling = ignore?

The RabbitMQ core eng team had a long discussion about this the other day. From what I can remember, in the context of k8s ignore is the "least bad" option. @gerhard , @kjnilsson or @michaelklishin will have a better memory of that discussion.

What's the reasoning behind vm_memory_high_watermark_paging_ratio = 0.99?

I don't know off the top of my head, but my guess is that it is a performance setting (@gerhard?).

What's the reasoning behind disk_free_limit.relative = 1.0?

1.0 is the minimum recommended value from the checklist.

@michaelklishin (Contributor)

@mboutet

The very first sentence of this PR goes like this: «This example provides a starting point for production deployments…»

There is no way our team can know the realities and needs of your specific production deployment, so just like the Production Checklist guide, these are basic guidelines to ensure a reasonable degree of safety and optimal disk I/O for at least some workloads.

  1. There is no One True Default there. The actual solution would be to switch to a Raft-based schema data store (which we have in prototype) and do away with all partition recovery strategies (the system will recover much like any Raft-based system would).
  2. The paging ratio is compared to UsedProcessMemory/Watermark. A value of 0.99 delays paging for as long as possible. Again, no One True Default here. The default in RabbitMQ itself is 0.5.
  3. I'd say it should be 1.5 but that would potentially greatly overprovision disks for nodes with more memory available.
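
Putting the discussed settings together, a rough rabbitmq.conf sketch with the reasoning above as comments (values are those quoted in this thread; the actual example in the repository may differ):

```ini
# "Least bad" option on Kubernetes until a Raft-based schema data store
# makes partition recovery strategies unnecessary.
cluster_partition_handling = ignore

# Paging to disk starts when memory use exceeds paging_ratio * watermark.
# The RabbitMQ default of 0.5 starts paging at half the watermark;
# 0.99 delays paging until just below the memory alarm.
vm_memory_high_watermark_paging_ratio = 0.99

# 1.0 is the minimum recommended by the Production Checklist; 1.5 is safer
# but can greatly over-provision disks on nodes with a lot of memory.
disk_free_limit.relative = 1.0
```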


Successfully merging this pull request may close these issues.

Set topologySpreadConstraints by default
Document a good starting point for a production deployment
8 participants