
protokube container started receiving Docker kill events with signal 23 after Flatcar upgrade to 2605.9.0 #10388

Closed
@marcinwfilewave

Description

1. What kops version are you running? The command kops version will display
this information.

1.18.1

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T21:51:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.9", GitCommit:"94f372e501c973a7fa9eb40ec9ebd2fe7ca69848", GitTreeState:"clean", BuildDate:"2020-09-16T13:47:43Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Run the simplest cluster (1 master, 1 node) with the Flatcar 2605.9.0 image.

The observed behaviour started on three separate clusters minutes after Flatcar updated itself to 2605.9.0.
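
A quick way to observe this on a master node (a sketch; it assumes the container is named protokube, as in the events below):

# stream only the kill events for the protokube container
docker events --filter "container=$(docker ps -q --filter name=protokube)" --filter event=kill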

5. What happened after the commands executed?
Docker events for the protokube container on the master started showing kill events like the ones below. They happen on an irregular schedule, at least once every few minutes.

core@ip-xx-xx-xx-xx ~ $ docker events -f container=6764846bed0d
2020-12-08T10:05:06.354703039Z container kill 6764846bed0dad9b2e8f42c5a3670e8d3d99b33f1f1e64a847b91ed379e7386f (image=protokube:1.18.1, name=protokube, signal=23)
2020-12-08T10:09:07.556562016Z container kill 6764846bed0dad9b2e8f42c5a3670e8d3d99b33f1f1e64a847b91ed379e7386f (image=protokube:1.18.1, name=protokube, signal=23)
2020-12-08T10:09:07.589605602Z container kill 6764846bed0dad9b2e8f42c5a3670e8d3d99b33f1f1e64a847b91ed379e7386f (image=protokube:1.18.1, name=protokube, signal=23)
2020-12-08T10:11:08.159373195Z container kill 6764846bed0dad9b2e8f42c5a3670e8d3d99b33f1f1e64a847b91ed379e7386f (image=protokube:1.18.1, name=protokube, signal=23)

Despite the kill signal being sent to this container, it neither stops nor restarts; the protokube container does not seem to react to these events.

6. What did you expect to happen?
The event is of type "kill", so I assumed protokube would get killed (and restarted). On the other hand, the signal is 23 (so probably SIGURG), and I'm not sure whether such a signal would cause a process to exit.
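
For what it's worth, signal 23 on linux/amd64 does map to SIGURG, and SIGURG's default disposition is to be ignored, so by itself it should not terminate a process. A quick shell sanity check:

$ kill -l 23
URG
$ sleep 60 &     # a process with default signal handling
$ kill -s URG $! # deliver SIGURG to it
$ jobs           # still running: SIGURG is ignored by default
[1]+  Running                 sleep 60 &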

I'm not sure about the expected outcome here. The point is that this was not happening before the Flatcar upgrade to 2605.9.0, and I don't know whether the current behaviour is a bug or expected.
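
One plausible explanation (an assumption, not verified here): Go 1.14 introduced asynchronous goroutine preemption, which the runtime implements by sending SIGURG to its own threads. If protokube is started by systemd as a foreground docker run, the docker CLI (whose --sig-proxy option defaults to true) forwards signals it receives to the container, so the Go runtime's own SIGURG noise could surface as docker kill events. If that is the cause, the signals are harmless. One way to check would be to trace signal activity on the docker client process (the pgrep pattern is a guess at the unit's ExecStart):

# trace signals seen by the docker CLI that keeps protokube in the foreground
sudo strace -f -e trace=signal -p "$(pgrep -of 'docker run.*protokube')"
# if the theory holds, the trace shows periodic SIGURG deliveries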

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: REDACTED
spec:
  api:
    loadBalancer:
      additionalSecurityGroups:
      - REDACTED
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: REDACTED
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    memoryRequest: 100Mi
    name: events
  fileAssets:
  - content: |
      ---
      REDACTED
    name: audit-policy
    path: /srv/kubernetes/audit.yaml
    roles:
    - Master
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    admissionControl:
    - OwnerReferencesPermissionEnforcement
    - AlwaysPullImages
    - PodSecurityPolicy
    - NodeRestriction
    auditLogMaxAge: 10
    auditLogMaxBackups: 20
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/audit.yaml
    authorizationMode: Node,RBAC
    disableBasicAuth: true
    oidcClientID: REDACTED
    oidcGroupsClaim: REDACTED
    oidcIssuerURL: REDACTED
    oidcUsernameClaim: REDACTED
    tlsCipherSuites:
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_RSA_WITH_AES_256_GCM_SHA384
    - TLS_RSA_WITH_AES_128_GCM_SHA256
    tlsMinVersion: VersionTLS12
  kubeControllerManager:
    featureGates:
      RotateKubeletServerCertificate: "true"
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    featureGates:
      RotateKubeletServerCertificate: "true"
    readOnlyPort: 0
    tlsCipherSuites:
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_RSA_WITH_AES_256_GCM_SHA384
    - TLS_RSA_WITH_AES_128_GCM_SHA256
    tlsMinVersion: VersionTLS12
  kubernetesApiAccess:
  - REDACTED
  kubernetesVersion: 1.18.9
  masterInternalName: REDACTED
  masterPublicName: REDACTED
  networkCIDR: 10.40.0.0/16
  networking:
    cilium:
      IPTablesRulesNoinstall: false
      agentPrometheusPort: 9090
      autoDirectNodeRoutes: false
      bpfCTGlobalAnyMax: 0
      bpfCTGlobalTCPMax: 0
      clusterName: ""
      cniBinPath: ""
      enableNodePort: false
      enablePrometheusMetrics: true
      enableRemoteNodeIdentity: false
      enableipv4: false
      enableipv6: false
      monitorAggregation: ""
      nodeInitBootstrapFile: ""
      preallocateBPFMaps: false
      reconfigureKubelet: false
      removeCbrBridge: false
      restartPods: false
      sidecarIstioProxyImage: ""
      toFqdnsEnablePoller: false
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - REDACTED
  subnets:
  - cidr: 10.40.32.0/19
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 10.40.0.0/22
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  topology:
    bastion:
      bastionPublicName: REDACTED
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-12-07T10:00:50Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: bastions
spec:
  image: ami-0ef157ace1e313660
  machineType: t2.micro
  maxSize: 0
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
    kops.k8s.io/instancetype: on-demand
  role: Bastion
  subnets:
  - utility-eu-west-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-12-07T10:00:51Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: master-eu-west-1b
spec:
  image: ami-0ef157ace1e313660
  machineType: m5.large
  maxPrice: "0.08"
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - m5.large
    - m5a.large
    - m5n.large
    - m4.large
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1b
    kops.k8s.io/instancetype: spot
  role: Master
  subnets:
  - eu-west-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-12-07T10:00:51Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: nodes
spec:
  image: ami-0ef157ace1e313660
  machineType: m5.2xlarge
  maxPrice: "0.3000"
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - m5.2xlarge
    - m5a.2xlarge
    - m5n.2xlarge
    - m4.2xlarge
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
    kops.k8s.io/instancetype: spot
  role: Node
  subnets:
  - eu-west-1b

8. Please run the commands with the most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or into a gist and provide the gist link here.

Nothing to add.

9. Anything else we need to know?
Nothing to add.

Metadata

Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)