
Define new selector for building image job #311


Merged: 1 commit merged on Apr 18, 2023


@erusso7 (Contributor) commented Feb 23, 2023

@k8s-ci-robot k8s-ci-robot requested a review from ybettan February 23, 2023 16:19
netlify bot commented Feb 23, 2023

Deploy Preview for kubernetes-sigs-kmm ready!

🔨 Latest commit: b6fb69c
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kmm/deploys/643d4453477aaf000847cca3
😎 Deploy Preview: https://deploy-preview-311--kubernetes-sigs-kmm.netlify.app/

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 23, 2023
@k8s-ci-robot (Contributor)

Hi @erusso7. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 23, 2023
codecov-commenter commented Feb 23, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.03 ⚠️

Comparison is base (d8ed348) 82.27% compared to head (b6fb69c) 82.25%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #311      +/-   ##
==========================================
- Coverage   82.27%   82.25%   -0.03%     
==========================================
  Files          31       31              
  Lines        3075     3082       +7     
==========================================
+ Hits         2530     2535       +5     
- Misses        448      450       +2     
  Partials       97       97              
Impacted Files                 Coverage Δ
api/v1beta1/module_types.go    100.00% <ø> (ø)
internal/build/job/maker.go    90.49% <100.00%> (+0.17%) ⬆️

... and 2 files with indirect coverage changes


@@ -129,6 +129,11 @@ func (m *maker) specTemplate(
kanikoImage += ":" + buildConfig.KanikoParams.Tag
}

nodeSelector := mld.Selector

Contributor

Why do we need to set the nodeSelector at all for builds? Why not let Kubernetes decide where to run the build Job?

Contributor

This is a good point. Removing it completely will also solve #140 (comment)

Contributor

Why not let kubernetes decide where to run the Build job?

This comes from that issue. You may want to reserve nodes equipped with expensive hardware for some workloads only, for example.

Contributor

I don't see any mention of nodes with expensive hardware in that issue. They just say that it would be nice if build/sign could also run on other nodes, and not only on those with specific hardware. Also, allowing Kubernetes to pick the running node makes use of the scheduler, which probably has much more data to decide which nodes are less "overloaded".

Contributor

I have understood the issue the same way as @yevgeny-shnaidman did.

Contributor

Also, allowing Kubernetes to pick the running node makes use of the scheduler, which probably has much more data to decide which nodes are less "overloaded".

This change lets a user do exactly that, by setting selector: {} in the build section.
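The point here hinges on the difference between a build selector that is absent and one that is explicitly empty. A minimal sketch, assuming the build section is decoded with Go's `encoding/json` into a plain `map[string]string` field (the `buildSpec` type and `decode` helper are hypothetical, not KMM's actual API types): an absent `selector` stays `nil`, while `selector: {}` decodes to an empty, non-nil map, which as a Pod `nodeSelector` adds no scheduling constraint at all.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSpec is a hypothetical, trimmed-down stand-in for the build section
// of the Module CRD; it only models the selector field.
type buildSpec struct {
	Selector map[string]string `json:"selector,omitempty"`
}

// decode unmarshals a JSON build section and reports whether the selector
// was left unset (nil) and how many label constraints it carries.
func decode(raw string) (bool, int, error) {
	var b buildSpec
	if err := json.Unmarshal([]byte(raw), &b); err != nil {
		return false, 0, err
	}
	return b.Selector == nil, len(b.Selector), nil
}

func main() {
	inputs := []string{
		`{}`,               // selector omitted: stays nil
		`{"selector": {}}`, // explicitly empty: non-nil, zero constraints
		`{"selector": {"kubernetes.io/arch": "amd64"}}`,
	}
	for _, raw := range inputs {
		unset, n, err := decode(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> unset=%t, constraints=%d\n", raw, unset, n)
	}
}
```

An operator can only tell "not set" apart from "set to nothing" if the field's Go type preserves that difference, which a nil-able map does.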

Contributor

I agree, but why give the user an option at all?

@ybettan (Contributor) commented Feb 27, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 27, 2023
@ybettan (Contributor) commented Feb 27, 2023

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 28, 2023
@qbarrand qbarrand linked an issue Mar 1, 2023 that may be closed by this pull request
@qbarrand (Contributor) commented Mar 1, 2023

/hold
This change may break flows for users building for a specific architecture in multi-arch clusters.
Let us consider a cluster with amd64 and arm64 nodes.
Before this change, if you created a Module targeting amd64 nodes, the builds would be automatically scheduled on amd64 nodes.
With this change, the build may run on an arm64 node, which by default will produce a module for arm64, not amd64.

cc @hershpa @yevgeny-shnaidman @ybettan
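To make the concern concrete, here is a minimal sketch, assuming nodes carry the well-known `kubernetes.io/arch` label and ignoring every scheduling constraint other than `nodeSelector` (taints, affinity and resource requests are left out). The node names and both helpers are hypothetical. With the Module selector reused for the build, only the amd64 node is eligible; with no build selector, the arm64 node becomes a candidate as well.

```go
package main

import (
	"fmt"
	"sort"
)

// matchesNodeSelector applies Kubernetes nodeSelector semantics: every
// selector pair must be present on the node with an identical value.
// A nil or empty selector therefore matches any node.
func matchesNodeSelector(nodeLabels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := nodeLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// eligibleNodes returns, sorted by name, the nodes a build Pod with the
// given nodeSelector could land on (nodeSelector filtering only).
func eligibleNodes(nodes map[string]map[string]string, selector map[string]string) []string {
	var names []string
	for name, labels := range nodes {
		if matchesNodeSelector(labels, selector) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	nodes := map[string]map[string]string{
		"worker-amd64": {"kubernetes.io/arch": "amd64"},
		"worker-arm64": {"kubernetes.io/arch": "arm64"},
	}
	moduleSelector := map[string]string{"kubernetes.io/arch": "amd64"}

	// Build reuses the Module selector (behavior before this PR).
	fmt.Println(eligibleNodes(nodes, moduleSelector)) // [worker-amd64]
	// Build has no selector at all.
	fmt.Println(eligibleNodes(nodes, nil)) // [worker-amd64 worker-arm64]
}
```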

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2023
@yevgeny-shnaidman (Contributor)


@qbarrand maybe all we need to add to the Build/Sign job node selector is the architecture label. I think all nodes are labeled with it anyway, and it can be derived from the KernelMapping.

@ybettan (Contributor) commented Mar 1, 2023


@qbarrand @yevgeny-shnaidman I would instead just ask kaniko to build for a specific arch (regardless of the host) using https://github.com/GoogleContainerTools/kaniko#flag---customplatform.

This would also allow building for ARM in an x86 cluster, or any other combination. This is especially handy in the hub-spoke topology.

This would require extending the build section in our CRD, though.
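A minimal sketch of what passing the target platform to kaniko could look like, based on the `--customPlatform` flag documented in the kaniko README linked above (values have the `os/arch` form, e.g. `linux/arm64`). The `kanikoArgs` helper, the Dockerfile path and the destination image are hypothetical; this is not how KMM assembles its kaniko arguments. Selecting the target platform does not by itself make the Dockerfile's own build steps cross-compile.

```go
package main

import (
	"fmt"
	"strings"
)

// kanikoArgs assembles arguments for a kaniko executor container.
// When platform is non-empty, --customPlatform asks kaniko to build for
// that platform (e.g. "linux/arm64") instead of the host's platform.
// Hypothetical helper, for illustration only.
func kanikoArgs(dockerfile, destination, platform string) []string {
	args := []string{
		"--dockerfile=" + dockerfile,
		"--destination=" + destination,
	}
	if platform != "" {
		args = append(args, "--customPlatform="+platform)
	}
	return args
}

func main() {
	args := kanikoArgs("/workspace/Dockerfile", "registry.example.com/kmod:latest", "linux/arm64")
	fmt.Println(strings.Join(args, " "))
}
```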

@yevgeny-shnaidman (Contributor)


@ybettan does kaniko support something like that? The build Dockerfile will probably include a Makefile that needs to know how to cross-compile; I'm not sure customers will want to support that.

@ybettan (Contributor) commented Mar 2, 2023

This PR is blocked by #325 unless we modify the CRD to contain a nodeSelector for build/sign, which is not ideal, as we prefer not to extend the API unless there is a requirement for it.

@qbarrand (Contributor) commented Mar 8, 2023

we wish not to extend the API unless there is a requirement for it

Sounds like it's the only way to properly address #140 though.

@ybettan (Contributor) commented Mar 13, 2023

@qbarrand
We have the kubernetes.io/arch label (e.g. kubernetes.io/arch: amd64) on the nodes. How about we just set the build's selector to match the same architecture as the nodes targeted by the Module's selector?
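A minimal sketch of that suggestion, assuming the Module's selector pins the architecture through the well-known `kubernetes.io/arch` node label (which the kubelet sets on every node). `archOnlySelector` and the `example.com/gpu` label are hypothetical. The build keeps only the architecture constraint, so it lands on a node of the right architecture but is otherwise free to run anywhere; if the Module selector does not mention the architecture, the helper returns `nil` and placement is left entirely to the scheduler.

```go
package main

import "fmt"

// archLabel is the well-known node label holding the CPU architecture.
const archLabel = "kubernetes.io/arch"

// archOnlySelector derives a build nodeSelector that keeps only the
// architecture constraint from the Module's selector. It returns nil when
// the Module selector does not pin an architecture.
// Hypothetical helper, for illustration only.
func archOnlySelector(moduleSelector map[string]string) map[string]string {
	arch, ok := moduleSelector[archLabel]
	if !ok {
		return nil
	}
	return map[string]string{archLabel: arch}
}

func main() {
	moduleSelector := map[string]string{
		archLabel:         "amd64",
		"example.com/gpu": "true", // hypothetical hardware label
	}
	fmt.Println(archOnlySelector(moduleSelector))                                      // map[kubernetes.io/arch:amd64]
	fmt.Println(archOnlySelector(map[string]string{"example.com/gpu": "true"}) == nil) // true
}
```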

@ybettan (Contributor) commented Mar 22, 2023

As agreed in the community meeting, we will proceed as follows:

We add a new selector for the build. For its default value, we differentiate between v1 and v2.

v1:
If the build selector is not set, we use the Module's selector, to keep the same behavior we have had until now.

v2:
If the build selector is not set, we do not set any selector on the build Job and let Kubernetes handle the scheduling.
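The agreed defaulting can be sketched as a single function. This is a minimal illustration under stated assumptions, not the merged code: the `buildNodeSelector` name and the `"v1"`/`"v2"` strings are hypothetical, and it assumes "not set" means a `nil` map, so an explicitly empty selector (`selector: {}`) counts as set and is honored as "no constraint" in both versions.

```go
package main

import "fmt"

// buildNodeSelector returns the nodeSelector for the build Job's Pod under
// the defaulting rules agreed for the two API versions. A nil buildSelector
// means "not set"; an empty, non-nil map is treated as explicitly set.
// Hypothetical function, for illustration only.
func buildNodeSelector(apiVersion string, buildSelector, moduleSelector map[string]string) map[string]string {
	if buildSelector != nil {
		return buildSelector // explicitly set: always honored
	}
	if apiVersion == "v1" {
		return moduleSelector // v1: fall back to the Module selector
	}
	return nil // v2: no selector, the scheduler decides
}

func main() {
	module := map[string]string{"kubernetes.io/arch": "amd64"}

	s := buildNodeSelector("v1", nil, module)
	fmt.Printf("v1, unset:    %v (nil=%t)\n", s, s == nil) // map[kubernetes.io/arch:amd64] (nil=false)

	s = buildNodeSelector("v2", nil, module)
	fmt.Printf("v2, unset:    %v (nil=%t)\n", s, s == nil) // map[] (nil=true)

	s = buildNodeSelector("v1", map[string]string{}, module)
	fmt.Printf("v1, empty {}: %v (nil=%t)\n", s, s == nil) // map[] (nil=false)
}
```

Keeping the fallback only in v1 preserves existing behavior for current users, while v2 users get scheduler-driven placement unless they opt into a selector.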

@erusso7 erusso7 force-pushed the separate-selectors branch from 9514d5a to f4c8942 Compare March 31, 2023 13:58
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 31, 2023
@k8s-ci-robot (Contributor) commented Mar 31, 2023

@erusso7: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                    Commit   Required  Rerun command
pull-kernel-module-management-test-deploy    9514d5a  true      /test pull-kernel-module-management-test-deploy


@erusso7 erusso7 marked this pull request as draft March 31, 2023 14:15
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2023
@erusso7 erusso7 force-pushed the separate-selectors branch 2 times, most recently from cc8dd8c to 13cb6dc Compare April 5, 2023 13:18
@erusso7 erusso7 marked this pull request as ready for review April 5, 2023 18:12
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2023
@k8s-ci-robot k8s-ci-robot requested review from qbarrand and ybettan April 5, 2023 18:12
@erusso7 erusso7 force-pushed the separate-selectors branch from 13cb6dc to b6fb69c Compare April 17, 2023 13:06
@ybettan (Contributor) commented Apr 17, 2023

/unhold
/lgtm
/assign @qbarrand @yevgeny-shnaidman

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 17, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 17, 2023
@yevgeny-shnaidman (Contributor)

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: erusso7, yevgeny-shnaidman


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2023
@k8s-ci-robot k8s-ci-robot merged commit bb7b4e4 into kubernetes-sigs:main Apr 18, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Separate node selector for build process
6 participants