🌱 Add automated machine management section to docs tasks #6421


Merged

Conversation


@enxebre enxebre commented Apr 18, 2022

What this PR does / why we need it:
Add automated worker machine management section to docs tasks

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 18, 2022
@enxebre enxebre force-pushed the automated-machine-management-docs branch 4 times, most recently from 57b275b to acfbabe Compare April 18, 2022 15:48

@fabriziopandini fabriziopandini left a comment


thanks, @enxebre for improving our docs!

- [Automated worker Machine management](./tasks/automated-worker-machine-management/index.md)
- [Scaling](./tasks/automated-worker-machine-management/scaling.md)
- [Auto scaling](./tasks/automated-worker-machine-management/auto-scaling.md)
- [Health checking](./tasks/automated-worker-machine-management/healthchecking.md)
Member

Health checking is both for control plane and worker nodes...
What about calling this new section "Automated machine management" and explicitly calling out when a paragraph applies only to a subset of machines


@chrischdi chrischdi left a comment


Only small formatting nits 👍


Machines can be owned by scalable resources i.e. MachineSet and MachineDeployments.

You can scale MachineSets and MachineDeployments in or out by expressing intent via .spec.replicas or updating the scale subresource e.g `kubectl scale machinedeployment foo --replicas=5`.

Suggested change
You can scale MachineSets and MachineDeployments in or out by expressing intent via .spec.replicas or updating the scale subresource e.g `kubectl scale machinedeployment foo --replicas=5`.
You can scale MachineSets and MachineDeployments in or out by expressing intent via `.spec.replicas` or updating the scale subresource e.g `kubectl scale machinedeployment foo --replicas=5`.

Formatting to follow the other book entries/markdowns
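The scaling flow in the quoted doc line can be sketched as a manifest. This is a hedged, minimal MachineDeployment example assuming the Cluster API v1beta1 schema — the names (`foo`, `my-cluster`, the template refs) and the Kubernetes version are illustrative, not taken from the PR:

```yaml
# Minimal MachineDeployment sketch (Cluster API v1beta1 schema assumed).
# Changing .spec.replicas expresses intent to scale worker Machines in or out.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: foo                # hypothetical name
spec:
  clusterName: my-cluster  # hypothetical Cluster
  replicas: 5              # desired number of worker Machines
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: my-cluster
      version: v1.24.0     # illustrative Kubernetes version
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: foo        # hypothetical template name
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: foo          # hypothetical template name
```

The same intent can be expressed through the scale subresource, e.g. `kubectl scale machinedeployment foo --replicas=5`.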

You can scale MachineSets and MachineDeployments in or out by expressing intent via .spec.replicas or updating the scale subresource e.g `kubectl scale machinedeployment foo --replicas=5`.

When you delete a Machine directly or by scaling down, the same process takes place in the same order:
- The Node backed by that Machine will try to be drained indefinitely and will wait for any volume to be detached from the Node unless you specify a .spec.nodeDrainTimeout.

Suggested change
- The Node backed by that Machine will try to be drained indefinitely and will wait for any volume to be detached from the Node unless you specify a .spec.nodeDrainTimeout.
- The Node backed by that Machine will try to be drained indefinitely and will wait for any volume to be detached from the Node unless you specify a `.spec.nodeDrainTimeout.`

Formatting to follow the other book entries/markdowns


When you delete a Machine directly or by scaling down, the same process takes place in the same order:
- The Node backed by that Machine will try to be drained indefinitely and will wait for any volume to be detached from the Node unless you specify a .spec.nodeDrainTimeout.
- CAPI uses default [kubectl draining implementation](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) with –ignore-daemonsets=true. If you needed to ensure DaemonSets eviction you'd need to do so manually by also adding proper taints to avoid rescheduling.

Suggested change
- CAPI uses default [kubectl draining implementation](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) with –ignore-daemonsets=true. If you needed to ensure DaemonSets eviction you'd need to do so manually by also adding proper taints to avoid rescheduling.
- CAPI uses default [kubectl draining implementation](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) with `--ignore-daemonsets=true`. If you needed to ensure DaemonSets eviction you'd need to do so manually by also adding proper taints to avoid rescheduling.

Formatting to follow the other book entries/markdowns

- The Node backed by that Machine will try to be drained indefinitely and will wait for any volume to be detached from the Node unless you specify a .spec.nodeDrainTimeout.
- CAPI uses default [kubectl draining implementation](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) with –ignore-daemonsets=true. If you needed to ensure DaemonSets eviction you'd need to do so manually by also adding proper taints to avoid rescheduling.
- The infrastructure backing that Node will try to be deleted indefinitely.
- Only when the infrastructure is gone, the Node will try to be deleted indefinitely unless you specify spec.nodeDeletionTimeout.

Suggested change
- Only when the infrastructure is gone, the Node will try to be deleted indefinitely unless you specify spec.nodeDeletionTimeout.
- Only when the infrastructure is gone, the Node will try to be deleted indefinitely unless you specify `.spec.nodeDeletionTimeout`.
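The drain and Node-deletion timeouts discussed in these suggestions map to two fields on the Machine spec. A minimal hedged sketch assuming the Cluster API v1beta1 `Machine` schema — all names and durations here are illustrative:

```yaml
# Sketch: bounding the deletion sequence for a single Machine (v1beta1 schema assumed).
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: worker-0           # hypothetical name
spec:
  clusterName: my-cluster  # hypothetical Cluster
  version: v1.24.0         # illustrative Kubernetes version
  nodeDrainTimeout: 5m     # give up on draining after 5 minutes instead of waiting indefinitely
  nodeDeletionTimeout: 10s # give up waiting for Node object deletion after 10 seconds
  bootstrap:
    dataSecretName: worker-0-bootstrap  # hypothetical pre-created bootstrap data Secret
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerMachine
    name: worker-0         # hypothetical infra Machine
```

When set on a MachineDeployment instead, the same fields go under `.spec.template.spec`.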


@furkatgofurov7 furkatgofurov7 left a comment


Thanks, this looks good, only a small nit that could be fixed:

@@ -0,0 +1,13 @@
# Scaling Nodes

You can add or remove compute capacity for your cluster workloads by creating or removing Machines. A Machine express intent to have a Node with a defined form factor.

Suggested change
You can add or remove compute capacity for your cluster workloads by creating or removing Machines. A Machine express intent to have a Node with a defined form factor.
You can add or remove compute capacity for your cluster workloads by creating or removing Machines. A Machine expresses intent to have a Node with a defined form factor.

@enxebre enxebre force-pushed the automated-machine-management-docs branch from acfbabe to be84108 Compare April 21, 2022 20:15

enxebre commented Apr 21, 2022

Addressed all comments PTAL @fabriziopandini @furkatgofurov7 @chrischdi.

@enxebre enxebre force-pushed the automated-machine-management-docs branch from be84108 to f35d524 Compare April 21, 2022 20:23
@enxebre enxebre changed the title 🌱 Add automated worker machine management section to docs tasks 🌱 Add automated machine management section to docs tasks Apr 21, 2022
@furkatgofurov7

/retest


Machines can be owned by scalable resources i.e. MachineSet and MachineDeployments.

You can scale MachineSets and MachineDeployments in or out by expressing intent via `.spec.replicas or updating the scale subresource e.g `kubectl scale machinedeployment foo --replicas=5`.

Suggested change
You can scale MachineSets and MachineDeployments in or out by expressing intent via `.spec.replicas or updating the scale subresource e.g `kubectl scale machinedeployment foo --replicas=5`.
You can scale MachineSets and MachineDeployments in or out by expressing intent via `.spec.replicas` or updating the scale subresource e.g `kubectl scale machinedeployment foo --replicas=5`.

@enxebre enxebre force-pushed the automated-machine-management-docs branch from f35d524 to b253ea8 Compare April 22, 2022 18:43

@furkatgofurov7 furkatgofurov7 left a comment


Thanks @enxebre, LGTM! lint GHA seems flaky though

@furkatgofurov7

/retest

@enxebre enxebre force-pushed the automated-machine-management-docs branch from b253ea8 to f067caa Compare April 25, 2022 14:10

enxebre commented Apr 25, 2022

lint GHA seems flaky though

rebased to pick #6436

@sbueringer

Nice improvement, thx!!
/lgtm

1 similar comment from @sbueringer

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 27, 2022

@killianmuldoon killianmuldoon left a comment


Looks good - just a couple of nits really. Definitely a useful subsection to have here.

- The Node backed by that Machine will try to be drained indefinitely and will wait for any volume to be detached from the Node unless you specify a `.spec.nodeDrainTimeout`.
- CAPI uses default [kubectl draining implementation](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) with `--ignore-daemonsets=true`. If you needed to ensure DaemonSets eviction you'd need to do so manually by also adding proper taints to avoid rescheduling.
- The infrastructure backing that Node will try to be deleted indefinitely.
- Only when the infrastructure is gone, the Node will try to be deleted indefinitely unless you specify `.spec.nodeDeletionTimeout`.
Contributor

Maybe add a note saying this won't work with a Topology managed Cluster with a ref to the ClusterClass doc?

Member

We are slowly adding all the flags to topology managed clusters, let's aim for feature parity

@sbueringer sbueringer May 24, 2022

I think it's fine if we don't have a note here if it's supported with ClusterClass or not. (especially given that Nabarun will implement #6450 (talked to him last week about it))

@enxebre enxebre force-pushed the automated-machine-management-docs branch from f067caa to 1f8321f Compare May 26, 2022 09:10
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 26, 2022
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 26, 2022
@enxebre enxebre force-pushed the automated-machine-management-docs branch from 1f8321f to f9546b8 Compare May 26, 2022 09:11

enxebre commented May 26, 2022

PTAL @sbueringer @killianmuldoon

@killianmuldoon

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 26, 2022
@sbueringer

Thank you!

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2022
@k8s-ci-robot k8s-ci-robot merged commit f7e1205 into kubernetes-sigs:main May 27, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone May 27, 2022