Change APIServer startup message #48
Conversation
We found that the APIServer was not fully up when it logged "Serving (in)securely on ...". We tried to find a different log line we can wait for to know when the APIServer is fully up. The autoregister controller came in with kubernetes/kubernetes#42732. It seems to be the last thing to come up before the server is ready to go.
/test pull-frameworks-test
I think it would be preferable to do a health check call to the server instead of attempting to parse arbitrary output. This would be consistent with how existing tests work in k8s.io/kubernetes. |
We now use the health check API of the APIServer to determine when the APIServer is ready. The implementation in the `processState` gives precedence to the health check API over the start message -- if you specify both, we only use the endpoint.
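Roughly, the intended precedence looks like the sketch below (not the exact diff; `waitForStartMessage` is a made-up name standing in for the existing stderr matching):

	// Prefer the health check endpoint; only fall back to the start message.
	if ps.HealthCheckEndpoint != "" {
		healthCheckURL := ps.URL
		healthCheckURL.Path = ps.HealthCheckEndpoint
		// Poll the endpoint until it answers 200 OK; StartMessage is ignored.
		go pollURLUntilOK(healthCheckURL, ready)
	} else if ps.StartMessage != "" {
		// Legacy behaviour: wait for the configured message on stderr.
		go waitForStartMessage(ps.StartMessage, ready)
	}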
@marun: PTAL :)
Thank you for your quick response, good progress! I have some suggestions on how it could be improved.
@@ -122,3 +121,27 @@ func isSomethingListeningOnPort(hostAndPort string) portChecker {
		return true
	}
}

func CheckAPIServerIsReady(kubeCtl *integration.KubeCtl) {
Why is this check required? It seems more like validation of the underlying kubeapi implementation than of the fixture that is managing a kube-apiserver binary.
We are porting tests in k/k and had an issue there. We took the code from this test and used it here. This should guard against the same issue re-appearing, even if we change the method by which we check that the APIServer is ready to go.
Consider documenting your rationale in a comment so that this knowledge is available to future maintainers.
integration/internal/process.go
Outdated
// assume the process is ready to operate. E.g. "/healthz". If this is set,
// we ignore StartMessage.
HealthCheckEndpoint string
// Message to wait for on stderr. If we receive this message, we assume the
Why is it desirable to still support waiting for a message on stderr? All kube api servers and controllers are likely to have health check endpoints, given that this is how pod health is generally determined.
- We didn't port all the things, only the APIServer, to use that new method
- We are not sure if etcd has a health check API (it probably has)
- We know that the virtual kubelet with the mock provider does not expose its health via an API (see Integration: Introduce notion of Controller Manager & Scheduler #41)
- I think it does not hurt or add too much complexity
- If we find that all the things we want to support have an API and all the processes are ported to use that new mechanism, we are happy to phase out the detection via waiting for a string on stderr.
Consider updating the comment for StartMessage, discouraging its use in favor of a health check. It's not the complexity I'm worried about, rather that anyone naively relying on a startup message appearing in stderr is likely to run into the same type of problem that prompted you to create this PR.
Mark as deprecated? (https://blog.golang.org/godoc-documenting-go-code)
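For example, following that godoc convention the field comment could end with a Deprecated: paragraph, roughly like this (the wording is only a suggestion):

	// StartMessage is the message to wait for on stderr. If we receive this
	// message, we assume the process is ready to operate. Ignored if
	// HealthCheckEndpoint is specified.
	//
	// Deprecated: use HealthCheckEndpoint instead; waiting for a log line can
	// report readiness before the process is actually able to serve.
	StartMessage string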
integration/internal/process_test.go
Outdated
})

It("hits the endpoint, and successfully starts", func() {
	processState := &ProcessState{}
DRY - Consider creating a function to initialize a ProcessState instance rather than duplicating those details in every test.
We try to keep the tests as easy to grasp as possible and to have them tell a story. That means that we sometimes/often deliberately end up with rather verbose tests and duplication in them.
Having said that, I am happy to revisit this. Especially, but not only, in this case it might be better to abstract all that into a func createSimpleProcessState() *ProcessState or such.
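A sketch of what such a helper could look like, using the fields that appear elsewhere in these tests (the concrete defaults are placeholders):

	// createSimpleProcessState returns a ProcessState pre-filled with the values
	// most tests need; each test then only overrides what it cares about.
	func createSimpleProcessState() *ProcessState {
		processState := &ProcessState{}
		processState.Path = "/nonexistent"
		processState.StartTimeout = 10 * time.Second
		return processState
	}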
integration/internal/process.go
Outdated
if ps.HealthCheckEndpoint != "" {
	healthCheckURL := ps.URL
	healthCheckURL.Path = ps.HealthCheckEndpoint
	go pollURLUntilOK(healthCheckURL, ready)
If the start timeout is reached, this goroutine should be terminated.
Right, thanks! 👍
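One possible shape for that, sketched against the names already used in this file (a stop channel for the poller; the reviewer suggests a very similar structure further down):

	stopPolling := make(chan struct{})
	go pollURLUntilOK(healthCheckURL, ready, stopPolling)

	select {
	case <-ready:
		return nil
	case <-timedOut:
		// Don't leak the polling goroutine once we give up.
		close(stopPolling)
		return fmt.Errorf("timeout waiting for process %s to start", path.Base(ps.Path))
	}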
integration/internal/process.go
Outdated
	ready <- true
	return
}
time.Sleep(time.Millisecond * 100)
Consider exposing the loop iteration interval as a configurable option alongside timeout to reduce the potential for negative interaction between interval and timeout (e.g. timeout < interval).
It might be worth cribbing from the apimachinery polling code:
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/wait/wait.go#L240
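With k8s.io/apimachinery/pkg/util/wait, the polling helper could collapse to something like this (a sketch; pollInterval, startTimeout and healthCheckURL are assumed to be plumbed through from the ProcessState):

	// wait.Poll handles the interval/timeout interaction for us.
	err := wait.Poll(pollInterval, startTimeout, func() (bool, error) {
		res, err := http.Get(healthCheckURL.String())
		if err != nil {
			// Not reachable yet -- keep polling instead of failing.
			return false, nil
		}
		defer res.Body.Close()
		return res.StatusCode == http.StatusOK, nil
	})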
👍
@marun, thanks for the great input! PTAL
I just recognized that the poll frequency is not configurable yet. I can address this tomorrow.
Most of the changes look good! I have some inline suggestions for simplifying the goroutine handling.
integration/internal/process_test.go
Outdated
processState.Path = "/nonexistent"
Context("when the healthcheck returns ok", func() {
	BeforeEach(func() {
		server.RouteToHandler("GET", "/healthz", ghttp.RespondWith(http.StatusOK, ""))
YAGNI - What is the value of putting this in BeforeEach given that there is only a single test? Same comment on some of the other contexts in this file.
This is a pattern we are just used to using over here. I can see that it might look strange and overly complicated / too exhaustive / ... sometimes. Just to give you a bit of context on why we structured the tests as they are:
- We have the different Contexts, which each tell a different (sub)story of what we want to test in this particular case
- In each Context we create the state as described in the title of the Context, mostly with {Just,}BeforeEach or so
- In the It we do the actual test, doing what we have described in the title of the It, with the assumption that the context is already prepared as described
So it is not so much about sharing setup code for multiple tests (at least in this example), for us it is more about the story telling (and arguably(?) nicely readable ginkgo output maybe).
If I'd pull the server.RouteToHandler("GET", "/healthz", ...) from the Context's BeforeEach into the It, I'd feel an urge to change the title of the It to something like "hits an endpoint which always returns OK, and successfully starts".
Does my long-winded explanation make any sense to you?
It's still not clear to me why some code belongs in BeforeEach and other code does not, and I'm concerned that the distinction is arbitrary. I think separating out code in arbitrary ways imposes costs to readability and maintainability, because anything that's arbitrary is difficult to communicate or have common standards for. I'm not going to force you to change, I'm just providing my perspective.
integration/internal/process.go
Outdated
for {
	select {
	case <-time.Tick(time.Millisecond * 100):
The following comment remains unresolved: #48 (comment)
I'm also not sure of the value of adding indirection to such a simple function. Consider simplifying:
for {
	res, err := http.Get(url.String())
	if err == nil && res.StatusCode == http.StatusOK {
		ready <- true
		return
	}
	select {
	case <-stopCh:
		return
	default:
		time.Sleep(pollInterval)
	}
}
integration/internal/process.go
Outdated
	return io.MultiWriter(safeWriters...)
}

func pollURLUntilOK(url url.URL, ready chan bool, timedOut <-chan time.Time) {
(No action required) The name timedOut presumes the reason this function should exit. Consider using a name that is more functional (e.g. stopCh is a common convention in k/k).
integration/internal/process.go
Outdated
		return fmt.Errorf("timeout waiting for process %s to start", path.Base(ps.Path))
	}
}

func teeTimer(src <-chan time.Time) (chan time.Time, chan time.Time) {
I think this approach is a bit heavyweight if the goal is providing a stop channel to the goroutine. Consider the following example:
var stopCh chan struct{}
...
stopCh = make(chan struct{})
go pollURLUntilOK(healthCheckURL, ready, stopCh)
...
select {
...
case <-timedOut:
	if stopCh != nil {
		close(stopCh)
	}
}
@@ -122,3 +121,40 @@ func isSomethingListeningOnPort(hostAndPort string) portChecker {
		return true
	}
}

// CheckAPIServerIsReady checks if the APIServer is really ready and not only
Thank you!
// message, we assume the process is ready to operate. Ignored if
// HealthCheckEndpoint is specified.
//
// The usage of StartMessage is discouraged, favour HealthCheckEndpoint
Great!
})
It("returns a timeout error and stops health API checker", func() {
	processState.HealthCheckEndpoint = "/healthz"
	processState.StartTimeout = 500 * time.Millisecond
(No action required) Consider using a variable or constant in place of a magic number where possible. Naming a value can document its purpose and simplify reuse across tests:
shortTimeout = 100 * time.Millisecond
longTimeout = 500 * time.Millisecond
foreverTimeout = 20 * time.Second
The decision was made that the defaulting is not done in the `DoDefaulting` function, as this does not apply to each process (as of now). In the future we might consider splitting the `DoDefaulting` function into multiple functions and applying only certain ones per process.
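Concretely, that could mean defaulting the poll interval right where the health check is wired up instead of in `DoDefaulting`, along these lines (the field name is illustrative, not necessarily the one from this PR):

	// Only processes that use the health check endpoint need a poll interval.
	if ps.HealthCheckPollInterval == 0 {
		ps.HealthCheckPollInterval = 100 * time.Millisecond
	}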
@marun, thanks for the great review, comments & tips! Let me know what you think about the PR now. /test pull-frameworks-test
/lgtm
@marun: changing LGTM is restricted to assignees, and only kubernetes-sigs org members may be assigned issues.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: marun, totherme
Context("when the polling interval is not configured", func() {
	It("uses the default interval for polling", func() {
		processState.HealthCheckEndpoint = "/helathz"
typo
We found that the APIServer was not fully up when it logged "Serving (in)securely on ...". We tried to find a different log line we can wait for to know when the APIServer is fully up.
The autoregister controller came in with kubernetes/kubernetes#42732. It seems to be the last thing to come up before the server is ready to go.
Is there a better thing to wait for, or is this a good indicator that the APIServer is ready to use?