Skip to content

"Kubectl client Simple pod should support inline execution and attach" failing at HEAD against a 1.1 cluster #19715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ikehz opened this issue Jan 15, 2016 · 37 comments
Assignees
Labels
area/test area/upgrade priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Milestone

Comments

@ikehz
Copy link
Contributor

ikehz commented Jan 15, 2016

Kubectl client Simple pod should support inline execution and attach

_/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:452 Expected <_errors.StatusError | 0xc208249700>: { ErrStatus: { TypeMeta: {Kind: "", APIVersion: ""}, ListMeta: {SelfLink: "", ResourceVersion: ""}, Status: "Failure", Message: "no kind "DeleteOptions" is registered for version "extensions/v1beta1"", Reason: "", Details: nil, Code: 500, }, } to be nil*

Googlers see: http://kubekins.dls.corp.google.com/job/kubernetes-upgrade-gke-1.1-master-step2-kubectl-e2e-new/

@ikehz ikehz added area/test priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. area/upgrade labels Jan 15, 2016
@ikehz
Copy link
Contributor Author

ikehz commented Feb 1, 2016

Closing in favor of #18581.

@ikehz ikehz closed this as completed Feb 1, 2016
@ikehz
Copy link
Contributor Author

ikehz commented Feb 1, 2016

Oops, didn't mean to do that.

@ikehz ikehz reopened this Feb 1, 2016
@ikehz
Copy link
Contributor Author

ikehz commented Feb 2, 2016

Okay, in the API logs I see:

apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "extensions/v1beta1"

But I'd expect that in the error that gets propagated back to the client, but instead I just get

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:459
Expected
    <*errors.StatusError | 0xc820732480>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {SelfLink: "", ResourceVersion: ""},
            Status: "Failure",
            Message: "an error on the server has prevented the request from succeeding (delete jobs.extensions run-test)",
            Reason: "InternalError",
            Details: {
                Name: "run-test",
                Group: "extensions",
                Kind: "jobs",
                Causes: [
                    {
                        Type: "UnexpectedServerResponse",
                        Message: "unknown",
                        Field: "",
                    },
                ],
                RetryAfterSeconds: 0,
            },
            Code: 500,
        },
    }
to be nil

@lavalamp this lack of error message propagation seems like a separate regression; filing an issue for you to triage.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 2, 2016

Okay, so in v1.1, we had no DeleteOptions in v1beta1, but now we've introduced such an object, and this kubectl test relies on it. Based on that, I'm thinking this is a real regression/version incompatibility, but this is a beta object, so we probably don't care? My proposal then is to skip this test for pre-v1.2, similar to #19608.

Reassigning to @caesarxuchao for verification: it looks like you were the one to implement the DeleteOptions API object globally.

cc @kubernetes/kube-api

@bgrant0607
Copy link
Member

This looks more like a bug in v1.1 than a compatibility issue. Arguably, the v1beta1 APIs shouldn't have barfed.

Related: #19501, #19534, #5889

cc @smarterclayton

@smarterclayton
Copy link
Contributor

Yeah, I would have expected the v1beta1 api in 1.1 to ignore it (don't expect a body and don't try to decode it) or have DeleteOptions. I did not expect 1.1 to not accept DeleteOptions on any API endpoint.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 2, 2016

Great. Reassigning to @lavalamp for triage.

@ikehz ikehz assigned lavalamp and unassigned ikehz Feb 2, 2016
@ikehz ikehz added this to the v1.2 milestone Feb 2, 2016
@lavalamp
Copy link
Member

lavalamp commented Feb 3, 2016

This sounds like 1.1 types is configured incorrectly, that should be registered. @nikhiljindal what do you think?

@smarterclayton
Copy link
Contributor

I don't think we sorted out the unversioned stuff until the 1.2 branch
started, so it might be expecting "v1" "DeleteOptions" under the extensions
API group.

On Tue, Feb 2, 2016 at 8:35 PM, Daniel Smith [email protected]
wrote:

This sounds like 1.1 types is configured incorrectly, that should be
registered. @nikhiljindal https://github.com/nikhiljindal what do you
think?


Reply to this email directly or view it on GitHub
#19715 (comment)
.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 8, 2016

Ping on this; @lavalamp or @nikhiljindal what can I do to get this moving? This is likely a blocker for v1.2.

@lavalamp lavalamp assigned ikehz and unassigned lavalamp Feb 9, 2016
@lavalamp
Copy link
Member

lavalamp commented Feb 9, 2016

The failing test line is Expect(c.Extensions().Jobs(ns).Delete("run-test-2", api.NewDeleteOptions(0))).To(BeNil()), right? This is test code that is wrong, not kubectl code.

The dumb effective thing to do is just try again without the options if you get that error. I have no bandwidth to fix this personally, maybe you do?

@ikehz
Copy link
Contributor Author

ikehz commented Feb 9, 2016

This seems to have been resolved by #20459, which I thought was just an error propagation problem, but seems to have fixed the test completely.

@ikehz ikehz closed this as completed Feb 9, 2016
@ikehz ikehz reopened this Feb 17, 2016
@ikehz
Copy link
Contributor Author

ikehz commented Feb 17, 2016

sigh Seems to have reappeared.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 17, 2016

Or maybe I was just asleep the last time I went through these issues.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 20, 2016

@lavalamp That is the line that's failing, but I do think this is a problem with kubectl (or CSI) code:

$ ./cluster/kubectl.sh version
Client Version: version.Info{Major:"1", Minor:"2+", GitVersion:"v1.2.0-alpha.7.1144+6fef6bc97778f1-dirty", GitCommit:"6fef6bc97778f1af4ffd359af6b98cbdd43cda1d", GitTreeState:"dirty"}
Server Version: version.Info{Major:"1", Minor:"1+", GitVersion:"v1.1.8-beta.0.36+7518f0e0795fd4", GitCommit:"7518f0e0795fd40a2d0c4656676c850c186df7af", GitTreeState:"clean"}
$
$ ./cluster/kubectl.sh get jobs
JOB        SUCCESSFUL
run-test   0
$
$ ./cluster/kubectl.sh delete job run-test --grace-period=0
Error from server: no kind "DeleteOptions" is registered for version "extensions/v1beta1"

This seems like a legitimate API problem, not something in the test code that's wrong.

@smarterclayton
Copy link
Contributor

smarterclayton commented Feb 20, 2016 via email

@ikehz ikehz added team/ux and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Feb 22, 2016
@ikehz
Copy link
Contributor Author

ikehz commented Feb 22, 2016

This is something we need to add to the submission wrapper for older servers - strip version info from DeleteOptions on the way out.

We have users running production clusters in v1.1.7. If we need to patch this is v1.1, we'll have to release & roll out a v1.1.8 before distributing a v1.2 client.

Is there any way we can patch this in v1.2 rather than back-patching to v1.1 (and v1.0)?

@lavalamp
Copy link
Member

@caesarxuchao Do you have cycles for another round of "actually, don't encode the version/kind"?

@lavalamp lavalamp assigned caesarxuchao and unassigned ikehz Feb 22, 2016
@bgrant0607 bgrant0607 added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Feb 22, 2016
@caesarxuchao
Copy link
Member

If we want to patch this in 1.2, we can strip the APIVersion if it's "extension/v1beta1" and the kind is DeleteOptions, that should fix the bug. We'll need to keep it that way until we retire release-1.1. Maybe this solution has other disadvantages I haven't foreseen yet.

Or, if we can patch 1.1, we just need to register v1.DeleteOptions in pkg/apis/extensions/v1beta1/register.go. This is more safe IMO, but it seems @ihmccreery prefers not touching 1.1.

What do you guys think?

@smarterclayton
Copy link
Contributor

smarterclayton commented Feb 23, 2016 via email

@ikehz
Copy link
Contributor Author

ikehz commented Feb 23, 2016

I will note that extensions are v1beta1 for a reason...

If the correct answer here is "That's fine, we expect this test to break and we don't care," that's a reasonable answer, and we can fix the v1.1 test accordingly and call it a day. But if we consider this a bug that we want users to not experience, then we should fix it in v1.2, since we have to assume that there will be older clients floating around. cc @kubernetes/kube-api for executive decision?

@caesarxuchao
Copy link
Member

I would say it's a bug for v1.1 master to barf at DeleteOptions, we should patch v1.1. And if our users haven't applied the patch on their v1.1 master, I would expect the impact to be minimum, because DeleteOptions have no effect on extensions/v1beta1 objects anyway, users have to reason to send DeleteOptions to v1.1 extensions endpoints.

@lavalamp
Copy link
Member

@caesarxuchao Yes, it's a bug, but we can't fix version incompatibilities by patching the widely-distributed thing. :( We have to fix them by patching the new thing.

However... @ihmccreery, if you omit the "--grace-period=0" in the command $ ./cluster/kubectl.sh delete job run-test --grace-period=0, does it work? As long as plain kubectl delete works, I'm not sure this is a big deal, since it won't break existing kubectl commands.

In that case, the fix will be for the test to not specify the grace period if running against 1.1

@lavalamp
Copy link
Member

@smarterclayton Yes, eventual removal via slow attrition. I don't think it'll be gone by 1.3 though.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 23, 2016

if you omit the "--grace-period=0" in the command $ ./cluster/kubectl.sh delete job run-test --grace-period=0, does it work?

Yes, and removing the grace period logic from the test also fixes the test.

As long as plain kubectl delete works, I'm not sure this is a big deal, since it won't break existing kubectl commands.

Maybe, but ./cluster/kubectl.sh delete job run-test --grace-period=0 on a v1.1 client does in fact work, (I don't know if the grace period is respected, but it doesn't barf, at least).

@lavalamp
Copy link
Member

Oh. Well, in that case, I think we have to fix it.

On Tue, Feb 23, 2016 at 11:25 AM, Isaac Hollander McCreery <
[email protected]> wrote:

if you omit the "--grace-period=0" in the command $ ./cluster/kubectl.sh
delete job run-test --grace-period=0, does it work?

Yes, and removing the grace period logic from the test also fixes the test.

As long as plain kubectl delete works, I'm not sure this is a big deal,
since it won't break existing kubectl commands.

Maybe, but ./cluster/kubectl.sh delete job run-test --grace-period=0 on a
v1.1 client does in fact work, (I don't know if the grace period is
respected, but it doesn't barf, at least).


Reply to this email directly or view it on GitHub
#19715 (comment)
.

@caesarxuchao
Copy link
Member

Maybe, but ./cluster/kubectl.sh delete job run-test --grace-period=0 on a v1.1 client does in fact work, (I don't know if the grace period is respected, but it doesn't barf, at least).

that's because in 1.1 the client encode the DeleteOptions to v1: https://github.com/kubernetes/kubernetes/blob/release-1.1/pkg/client/unversioned/jobs.go#L92. I think this behavior is wrong, the correct way is registering DeleteOptions in extensions/v1beta1.

Btw, in 1.1 , there is discrepency in clients, e.g., in Deployment client, DeleteOptions is encoded to extensions/v1beta1: https://github.com/kubernetes/kubernetes/blob/release-1.1/pkg/client/unversioned/deployment.go#L75

@lavalamp
Copy link
Member

@ihmccreery did you copy-paste the wrong test name as the title of this issue? Is this about pods or jobs?

@lavalamp
Copy link
Member

Looks like this is about jobs. Do jobs actually support graceful deletion? If not, this is just cargo-cult code and we should remove all the code that sets it for jobs.

Options:

  • Make the job client stop sending delete options (if jobs don't respect them anyway)
  • Make the job reaper not send delete options (if jobs don't respect them anyway)
  • Otherwise, add a special codec to the job client that emits deletion options in v1.

@erictune
Copy link
Member

No code in our respository deletes jobs except namespace cleanup (pkg/controller//namespace/namespace_controller_utils.go), which uses DeleteCollection, which has no DeleteOptions.

@caesarxuchao
Copy link
Member

Only Pod has a DeleteStrategy, so as we expected, Pod is the only object that respect DeleteOptions.

So I'll just remove the use of non-nil DeleteOptions that causes this bug.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 23, 2016

@ihmccreery did you copy-paste the wrong test name as the title of this issue? Is this about pods or jobs?

It's about kubectl run, which used to create bare pods, and as of #17195, creates jobs.

https://github.com/kubernetes/kubernetes/pull/17195/files#diff-540d8a63e8b72b4e05dda54ae37f77d2R371 did just basically find-and-replaced pod for job in the test; but perhaps that was naive.

cc @janetkuo

@ikehz
Copy link
Contributor Author

ikehz commented Feb 23, 2016

I guess the name "Simple pod" also doesn't really reflect reality as of #17195, as well.

@ikehz
Copy link
Contributor Author

ikehz commented Feb 23, 2016

No code in our respository deletes jobs except namespace cleanup

Any reason we can't just nix the whole

Expect(c.Extensions().Jobs(ns).Delete("run-test", api.NewDeleteOptions(0))).To(BeNil())

line?

@lavalamp
Copy link
Member

You want to delete the job. Replacing the NewDeleteOptions with nil seems
to be the solution we're converging on.

On Tue, Feb 23, 2016 at 1:32 PM, Isaac Hollander McCreery <
[email protected]> wrote:

No code in our respository deletes jobs except namespace cleanup

Any reason we can't just nix the whole

Expect(c.Extensions().Jobs(ns).Delete("run-test", api.NewDeleteOptions(0))).To(BeNil())

line?


Reply to this email directly or view it on GitHub
#19715 (comment)
.

@caesarxuchao
Copy link
Member

Yes. Putting together a PR now.

@ikehz
Copy link
Contributor Author

ikehz commented Mar 10, 2016

cc @jlowdermilk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test area/upgrade priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests

6 participants