Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized #79

Merged
merged 2 commits into from
Dec 10, 2024

Conversation

liu-cong
Copy link
Contributor

@liu-cong liu-cong commented Dec 9, 2024

  • Also make some fields unexported
  • Also improves logging.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 9, 2024
}

func (c *EndpointSliceReconciler) SetupWithManager(mgr ctrl.Manager) error {
llmServerPoolAvailable := func(object client.Object) bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Do not reconcile EndpointSlice until LLMServerPool is initialized

Pods *sync.Map
// poolMu is used to synchronize access to the llmServerPool.
poolMu sync.RWMutex
llmServerPool *v1alpha1.LLMServerPool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I agree with keeping these properties private. This assumes that access will only ever be from the backend package and I'm not sure that will be true. This package is already rather a catchall and could use some refactoring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that access will only ever be from the backend package and I'm not sure that will be true

Then you can make it public or expose a public method. That's what I am doing here. I have synchronized access to the fields so other packages won't mess up with it.

The bug is that in main.go, it sets a empty LLMServerPool which later in EndpointSlicerReconcilers uses the default port value 0. Making fields private prevents such bugs.

}

func (ds *K8sDatastore) SetLLMServerPool(pool *v1alpha1.LLMServerPool) {
ds.poolMu.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we experienced/do we expect a race condition with the serverPool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, happens pretty often actually. If you try restarting EPP many times, you can probably hit it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that the startup race condition is annoying, and this PR resolves that in your predicate func. But I think this is unrelated. I suppose its not a big deal, and can keep us safer. Getters/Setters can just add code bloat and can complicate simple logic. Tying this into the export comment thread as well based on the same idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's common practice to hide internal fields until it's necessary to be exported, it reduces chances of bugs. The llmServerPool field is particular important here because I need to synchronize the access to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think thats more Java/C# style, but I won't dig too hard here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So go doesn't encourage the "GetXX" naming, so in this case I should name it "LLMServerPool" instead of "GetLLMServerPool" But Getter/Setter pattern is common in Go as well, see the "Getter" section https://go.dev/doc/effective_go

@liu-cong
Copy link
Contributor Author

liu-cong commented Dec 9, 2024

/hold fixing a test

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 9, 2024
@liu-cong
Copy link
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 10, 2024
@ahg-g
Copy link
Contributor

ahg-g commented Dec 10, 2024

/lgtm

looks ok to me, I will leave the approve to Kellen

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 10, 2024
@ahg-g
Copy link
Contributor

ahg-g commented Dec 10, 2024

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Dec 10, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 10, 2024
@kfswain
Copy link
Collaborator

kfswain commented Dec 10, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 10, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, liu-cong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit bce8d6f into kubernetes-sigs:main Dec 10, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants