
DNM: Kueue Operator API Review Part 2 #2250

Open
wants to merge 30 commits into base: master

Conversation

kannon92

@kannon92 kannon92 commented Mar 26, 2025

We have got more requirements for Operator now.

  • ManagedJobWithoutQueueName
  • WaitForPodsReady
  • FairSharing.

Opening this up so we can discuss.

Please ignore the other API fields as these have already been approved by @JoelSpeed.

kannon92 added 19 commits March 19, 2025 09:41

Contributor

openshift-ci bot commented Mar 26, 2025

Hello @kannon92! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@kannon92
Author

/hold

@kannon92 kannon92 changed the title Kueue Operator API Review Part 2 DNM: Kueue Operator API Review Part 2 Mar 26, 2025
@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 26, 2025
@openshift-ci openshift-ci bot requested review from deads2k and everettraven March 26, 2025 21:17
Contributor

openshift-ci bot commented Mar 26, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kannon92
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

KueueGangSchedulingPolicy *KueueGangSchedulingPolicyOptions `json:"kueueGangSchedulingPolicy,omitempty"`
// fairSharing TODO not done yet
// +optional
FairSharing *FairSharing `json:"fairSharing,omitempty"`
Member

I think this should be under pre-emption policies / algorithm. Classic as mentioned upstream being the default. Fair sharing / hierarchical being the other options. We need to check if options for the latter two overlap or should be distinct.

Author

yea, the TODO means I don't really have this done yet. Wanted to open this up so I can get Joel's thoughts on the other two before Kubecon.

I still need to research fair sharing.

Author

Docs seem to say that they are compatible in v0.11. https://kueue.sigs.k8s.io/docs/concepts/preemption/#fair-sharing

Author

The question is do we allow all kinds of permutations of strategies?

// +kubebuilder:validation:Enum=LessThanOrEqualToFinalShare;LessThanInitialShare
type PreemptionStrategy string

const (
	LessThanOrEqualToFinalShare PreemptionStrategy = "LessThanOrEqualToFinalShare"
	LessThanInitialShare        PreemptionStrategy = "LessThanInitialShare"
)

type FairSharing struct {
	// preemptionStrategies indicates which constraints should a preemption satisfy.
	// The preemption algorithm will only use the next strategy in the list if the
	// incoming workload (preemptor) doesn't fit after using the previous strategies.
	// Possible values are:
	// - LessThanOrEqualToFinalShare: Only preempt a workload if the share of the preemptor CQ
	//   with the preemptor workload is less than or equal to the share of the preemptee CQ
	//   without the workload to be preempted.
	//   This strategy might favor preemption of smaller workloads in the preemptee CQ,
	//   regardless of priority or start time, in an effort to keep the share of the CQ
	//   as high as possible.
	// - LessThanInitialShare: Only preempt a workload if the share of the preemptor CQ
	//   with the incoming workload is strictly less than the share of the preemptee CQ.
	//   This strategy doesn't depend on the share usage of the workload being preempted.
	//   As a result, the strategy chooses to preempt workloads with the lowest priority and
	//   newest start time first.
	// The default strategy is ["LessThanOrEqualToFinalShare", "LessThanInitialShare"].
	// +kubebuilder:validation:MaxItems=2
	// +optional
	PreemptionStrategies []PreemptionStrategy `json:"preemptionStrategies,omitempty"`
}

options:

option 1: preemptionStrategies: [LessThanOrEqualToFinalShare]
option 2: preemptionStrategies: [LessThanInitialShare]
option 3: preemptionStrategies: [LessThanOrEqualToFinalShare, LessThanInitialShare]
option 4: preemptionStrategies: [LessThanInitialShare, LessThanOrEqualToFinalShare]

Reading the code and API, these are a list and Kueue applies each strategy in a for loop, so the order of these matters for behavior.

IBM uses option 3 and RHOAI disables this for now.
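For context, these permutations map onto the fairSharing block of the upstream Kueue Configuration. A sketch of what option 3 looks like in the upstream v1beta1 config (shape per upstream docs; treat as illustrative):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
fairSharing:
  enable: true
  # Strategies are tried in order; the next one is used only if the
  # incoming workload still doesn't fit after the previous one.
  preemptionStrategies:
  - LessThanOrEqualToFinalShare
  - LessThanInitialShare
```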

Author

So classical preemption would be no fair sharing specified.

It looks like FairSharing can be enabled and then there are permutations of these strategies. What I am trying to understand is if this is too complicated to expose to a user.

The docs mention that configuring these strategies requires careful consideration.

This doc explains it okay: https://kueue.sigs.k8s.io/docs/concepts/preemption/#fair-sharing

Cohorts seem to respect fair sharing, but it is an alpha feature. It does not seem that any configuration changes are required for fair sharing.

There are new KEPs going on to add more functionality to the fair sharing configuration.

Author

@kannon92 kannon92 Mar 28, 2025

WDYT of something like:

type Preemption struct {
  // PreemptionPolicy is one of: Classical | FairSharing
  PreemptionPolicy PreemptionPolicy
  // We could have a ClassicalPreemption struct but there is nothing needed in the configuration.
}

And we just default the Kueue options. Fair sharing is a form of preemption in Kueue and I think I can at least explain what FairSharing is.

I can't really make sense of the strategies without writing a lot more info in the go docs.

If we get a request for more detailed policies, we can add them to the enum, but right now we can focus on enabling the IBM use case and not expose all these details.

Author

I pushed up a change. PTAL when you can @JoelSpeed?

Our hope is to agree on shape soon and we can iterate on CEL but we'd like to make sure the API shape is looking good.

// EvictNotReadyWorkloads will not block admission but will evict Workloads that are not ready
// within a defaulted time limit.
// +optional
KueueGangSchedulingPolicy *KueueGangSchedulingPolicyOptions `json:"kueueGangSchedulingPolicy,omitempty"`
Contributor

I think we should make this a discriminated union, so that in the future we can expand the configuration with details like the eviction timeouts where appropriate. For now though, the union would have no members.

So I'm thinking along the lines of

kueueGangScheduling:
  policy: Disabled | BlockAdmission | Workloads
  blockAdmission: # Valid only when policy is BlockAdmission
    ...
  workloads: # Valid only when policy is Workloads
    ...

Now, I've been reading a bit more about the blockAdmission feature

The blockAdmission (waitForPodsReady.blockAdmission) is an optional parameter. When enabled, then the workloads are admitted sequentially to prevent deadlock situations as demonstrated in the example below.

I wonder if we actually want something like

gangScheduling:
  policy: Disabled | ByWorkload
  byWorkload: # Valid only when policy is ByWorkload
    admission: Sequential | Parallel
    ...

Where we would then describe that to an end user as something like:

When admission is set to Sequential, only pods from the currently processing workload will be admitted. Once all pods from the current workload are admitted, and ready, Kueue will process the next workload.
Sequential processing may slow down admission when the cluster has sufficient capacity for multiple workloads, but provides a higher guarantee of workloads scheduling all pods together successfully.
When set to Parallel, pods from any workload will be admitted at any time. This may lead to a deadlock where workloads are in contention for cluster capacity and pods from another workload having successfully scheduled prevent pods from the current workload scheduling.
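A hypothetical CR fragment following that shape (field names are sketches from this discussion, not final API):

```yaml
gangScheduling:
  policy: ByWorkload        # one of: Disabled | ByWorkload
  byWorkload:               # valid only when policy is ByWorkload
    admission: Sequential   # one of: Sequential | Parallel
```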

Author

yea I think that is a good shape.

Author

RHOAI doesn't really have a use case for Sequential though.

So I am wondering if we just disable it and only have admission: Parallel (via blockAdmission: false), and do not support/allow blockAdmission: true.

Author

Updated. I know the godocs need some work but I think the shape is good.

Author

WDYT @JoelSpeed?

@kannon92 kannon92 requested review from mrunalp and JoelSpeed March 31, 2025 15:34
// spec holds user settable values for configuration
// +required
Spec KueueOperandSpec `json:"spec"`
// status holds observed values from the cluster. They may not be overridden.
Member

Is KueueSpec an option? It has an operator object in it, and also we use KueueStatus

@haircommander
Member

Sorry if my comments miss context, just a fly by review

// pods from another workload having successfully scheduled prevent pods from the current workload scheduling.
// +kubebuilder:validation:XValidation:rule="self.policy==ByWorkload",message="byWorkload is only valid if policy equals ByWorkload"
// +optional
ByWorkload KueueGangSchedulingAdmissionOptions `json:"byWorkload"`
Contributor

Will we only ever have a static admission option of Sequential or Parallel here? With the Kueue configuration being in beta would it be worth having a proper struct here with a nested admissionPolicy field (or something like that)?

Author

It only depends on Policy of ByWorkload.

So if Policy is set to ByWorkload, then we can either block admission or allow admission in parallel.

Technically if gang scheduling is disabled, all admission is parallel but there are no evictions of workloads.

Contributor

I had imagined that the workload eviction things that are part of the "waitForPodsReady" section in the upstream config would be configured somewhere in here, so I definitely expected this to be a struct. In the future it would probably look like

byWorkload:
  admission: Parallel | Sequential
  timeoutSeconds: <int32>
  requeueStrategy: ...
  recoveryTimeoutSeconds: <int32>

Author

So this is the balance. I have not heard anyone asking for this. In fact, we are talking about pushing these kind of configuration to the workload level rather than a global configuration.

cc @KPostOffice

Contributor

Just because someone hasn't asked for this yet doesn't mean we should design the API in a way that prohibits us from doing this in the future. I may need to read up a bit on more recent conversations, but what does moving this to the workload level mean?

Author

So for the workload level, it would mean that a user creating a batch job would be able to specify overrides for these values. It would no longer be necessary to put this at the config level.

And a user could set these up on the workload that they are submitting rather than relying on a global config.

Author

Why couldn't we just add new fields, require that they are set for ByWorkload, and add these to the struct we have?

Why define them all now in case there is a need for them someday? We don't have any explicit customer ask for any of these values, so I think leaving it alone for now is sufficient. We can always add a new field to an API. If you want a struct, I can give you an empty struct that we can add to in the future, but I think it's distracting to try to define these APIs where we have no use case for them yet.

If we need to add new fields, they are in a struct so we can add new features. I don't expect the WaitForPodsReady Kueue API to be stable anytime soon, so I don't want to expose every field in the Kueue config API in case there is a need for it.

Contributor

@everettraven everettraven Apr 4, 2025

You can't add new required fields in the future (unless they are under a new optional field). That is a breaking change.

The point of using a struct is that you have a valid way to expand the options in the future without introducing potentially breaking changes or patterns that don't align with ones that are typically used in OpenShift's API.

It is a bit of an anti-pattern to have a discriminated union that results in you needing to set multiple fields related to the "type".

Author

All these fields are optional. We would not change the optional tags, but we would enforce via validation that these values can only be set if ByWorkload is set.

So are you saying something like:

GangScheduling:
  policy: ByWorkload
  byWorkload:
    admission: Parallel

We can always add the extra fields for this if there is a request for it but we at least have the struct structure?

Author

Bryce and I met offline. I pushed a change to add a byWorkload nested field but I only included admission.

// The borrowable resources are the unused nominal quota
// of all the ClusterQueues in the cohort.
// +optional
PreemptionStrategy PreemptionStrategy `json:"preemptionStrategy"`
Contributor

Looks like the upstream configuration allows configuring multiple preemption strategies that run in a chain. Are we not planning to allow that and forcing the choice of one or the other?

Author

TL;DR: it is really hard to explain those policies, and the customer ask right now is to enable the Kueue defaults for FairSharing, not to provide all possible options.

Contributor

We should make sure that we have a structure that allows us in the future to add those options in a non-breaking way.

If you have to expose those options in the future, how would you add that into this? Would this become a discriminated union?

Author

I was thinking I can add a new enum and set the values I need without having to explain the kueue APIs as is.

Author

We have 1 ask for enabling FairSharing from IBM and they are happy with the defaults. If someone really wants to tune it and figure out a better strategy I think we should push them to engage with upstream. And we can always default to a setting in our controller.

I really want to avoid having to explain and document a very complicated and quickly changing part of the API.

Contributor

I don't think we are saying you have to support it now. We are looking to get an idea of how, if a request for this comes in, you plan to enable that functionality in a non-breaking way.

Contributor

If someone really wants to tune it and figure out a better strategy I think we should push them to engage with upstream

Am I correct in understanding this as essentially boiling down to "if you don't like our opinions, use the upstream version"?

Author

@kannon92 kannon92 Apr 4, 2025

Not quite. So upstream has this to say about the fair sharing policies.

Different preemptionStrategies can lead to less or more preemptions under specific scenarios. These are the factors you should consider when configuring preemptionStrategies:

  • Tolerance to disruptions, in particular when single Workloads use a significant amount of the borrowable resources.
  • Speed of convergence, in other words, how important is it to reach a steady fair state as soon as possible.
  • Overall utilization, because certain strategies might reduce the utilization of the cluster in the pursuit of fairness.

I was thinking that if a user really needs to configure this, they can figure out how they want it configured via upstream, bring up an RFE, and we can support their request. If we expose the enums from Kueue, we are hoping that these fair sharing enums are not going to be deprecated. I wanted a high-level API over this so we can default this over releases rather than having a direct API dependency.

I can replicate https://github.com/kubernetes-sigs/kueue/blob/main/apis/config/v1beta1/configuration_types.go#L445 exactly in this API if you think that would be best.

We can add a fair sharing struct here and copy these enums from upstream. And leave it to the cluster admin to figure out the best way to configure this?

Author

The other complication is https://github.com/kubernetes-sigs/kueue/pull/4252/files.

Looking at this, it actually seems that maybe our Preemption Policy may not be the best name, as this would be fair sharing without any preemption.

Comment on lines 247 to 248
// Defaults to QueueNameRequired; therefore, those jobs are not managed and if they are created
// unsuspended, they will start immediately.
Contributor

That sounds like what I'd expect of optional?

Author

No. QueueNameOptional would mean that Kueue will mutate all workloads in the namespace and add a queue-name label.

QueueNameRequired would mean that kueue would only manage a workload that has a certain label. Workloads that do not have this label will be skipped by Kueue.
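For reference, the label in question looks like this on a workload (a sketch; the label key is the upstream kueue.x-k8s.io/queue-name, the queue and job names are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  labels:
    # With QueueNameRequired semantics, Kueue manages this Job only
    # because it carries the queue-name label.
    kueue.x-k8s.io/queue-name: user-queue
spec:
  template:
    spec:
      containers:
      - name: main
        image: busybox
      restartPolicy: Never
```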

Contributor

We may want to revisit the naming here as this is a little confusing to me.

I get that QueueNameRequired means it will only process jobs that have the queue-name label, but maybe Optional is not really saying "manage everything whether it has a label or not".

Do you see what I mean?

Comment on lines +265 to +268
// not fit within the unused quota, is eligible to issue preemptions
// when the requests of the workload are below the
// resource flavor's nominal quota or borrowWithinCohort is enabled
// on the Cluster Queue.
Contributor

Is this one sentence or is there punctuation missing?

Author

I copied this from Kueue upstream as is. It is one sentence.

// The borrowable resources are the unused nominal quota
// of all the ClusterQueues in the cohort.
// +optional
PreemptionStrategy PreemptionStrategy `json:"preemptionStrategy"`
Contributor

Without omitempty you will need "" as a valid enum option

Author

What option do you recommend?

Contributor

Use omitempty

kannon92 added 3 commits April 2, 2025 15:02
kannon92 added 2 commits April 4, 2025 14:01
Contributor

@everettraven everettraven left a comment

A couple more comments

)

// +kubebuilder:validation:Enum=Parallel;Sequential
type GangSchedulingAdmissionOptions string
Contributor

Maybe something like:

Suggested change
type GangSchedulingAdmissionOptions string
type GangSchedulingWorkloadAdmission string

or just:

Suggested change
type GangSchedulingAdmissionOptions string
type WorkloadAdmission string

?

Author

I picked the first one.

I'm not really sure which one I prefer, TBH. I like the more detailed name.

// FairSharing is a more heavy weight algorithm.
// The default is Classical.
// +optional
PreemptionPolicy *PreemptionPolicy `json:"preemptionPolicy,omitempty"`
Contributor

Just Policy so we don't stutter?

Also, just to clarify what was discussed synchronously - the thinking is that we can add an optional struct in the future that can only be specified when the policy is set to FairSharing if we need support for additional FairSharing configuration in the future.

@JoelSpeed are there any concerns with this that I may have missed?

Contributor

Yeah I think policy would do just fine

Setting this up to be a DU in the future makes sense to me

Also, as this is a string enum, you don't need the pointer as "" is not a valid value for you

Contributor

openshift-ci bot commented Apr 7, 2025

@kannon92: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify 8741f7b link true /test verify
ci/prow/verify-crd-schema 8741f7b link true /test verify-crd-schema

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +51 to +52
// The default behavior of Kueue will manage workloads that have a queue-name label.
// Workloads that are missing these label will be ignored by Kueue.
Contributor

I think this may be easier to read as

Suggested change
// The default behavior of Kueue will manage workloads that have a queue-name label.
// Workloads that are missing these label will be ignored by Kueue.
// By default Kueue will only manage workloads that have a queue-name label.
// Workloads that are missing the queue-name label will be ignored by Kueue.

Generally good to avoid implicit statements that assume knowledge of the prior statement when writing technically in docs

Items []Kueue `json:"items"`
}

// +kubebuilder:validation:Enum=BatchJob;RayJob;RayCluster;JobSet;MPIJob;PaddleJob;PytorchJob;TFJob;XGBoostJob;AppWrapper;Pod;Deployment;StatefulSet;LeaderWorkerSet
Contributor

This currently doesn't match the consts, I think the consts are correct

Suggested change
// +kubebuilder:validation:Enum=BatchJob;RayJob;RayCluster;JobSet;MPIJob;PaddleJob;PytorchJob;TFJob;XGBoostJob;AppWrapper;Pod;Deployment;StatefulSet;LeaderWorkerSet
// +kubebuilder:validation:Enum=BatchJob;RayJob;RayCluster;JobSet;MPIJob;PaddleJob;PyTorchJob;TFJob;XGBoostJob;AppWrapper;Pod;Deployment;StatefulSet;LeaderWorkerSet

// Each period separated segment within the subdomain must start and end with an alphanumeric character.
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1123Label().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character."
Contributor

Pretty sure you won't need the size == 0 check, since you have min length 1

// These keys are persisted to the internal Kueue workload object.
// If not specified, only the Kueue labels will be copied.
// +kubebuilder:validation:MaxItems=64
// +listType=atomic
Contributor

Any reason not to make this a map? +listType=map, +listMapKey=key

// Kueue provides the ability to admit workloads all in one (gang admission)
// and evicts workloads if they are not ready within a specific time.
// +kubebuilder:validation:XValidation:rule="has(self.policy) && self.policy == 'ByWorkload' ? has(self.byWorkload) : !has(self.byWorkload)",message="byWorkload is required when policy is byWorkload, and forbidden otherwise"
type GangScheduling struct {
Contributor

Mark this up as a union please

Suggested change
type GangScheduling struct {
// +union
type GangScheduling struct {
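To illustrate what the +union marker plus the XValidation rule on this struct enforce, here are two hedged example fragments (field names taken from the snippet; values illustrative):

```yaml
# Accepted: byWorkload is present only when policy is ByWorkload.
gangScheduling:
  policy: ByWorkload
  byWorkload:
    admission: Parallel
---
# Rejected by the CEL rule: byWorkload is forbidden when policy is not ByWorkload.
gangScheduling:
  policy: Disabled
  byWorkload:
    admission: Parallel
```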

// This may lead to a deadlock where workloads are in contention for cluster capacity and
// pods from another workload having successfully scheduled prevent pods from the current workload scheduling.
// +optional
Admission *GangSchedulingWorkloadAdmission `json:"admission,omitempty"`
Contributor

This is a string with an enum, it doesn't need to be a pointer unless the empty string means something (which it doesn't)

// labelPolicy controls whether or not Kueue reconciles
// jobs that don't set the label kueue.x-k8s.io/queue-name.
// The allowed values are QueueName and None.
// None means that workloads will be suspended on
Contributor

We need to make this clear that this is ALL workloads in a namespace

// ignore this workload.
// Defaults to QueueName.
// +optional
LabelPolicy *LabelPolicy `json:"labelPolicy,omitempty"`
Contributor

This is a string enum, doesn't need to be a pointer


type Preemption struct {
// preemptionPolicy are the types of preemption kueue allows.
// Kueue has two types of preemption: classical and fair sharing.
Contributor

To match the enum values

Suggested change
// Kueue has two types of preemption: classical and fair sharing.
// Kueue has two types of preemption: Classical and FairSharing.
