[BUG] Failed to renew TLS certificates #3294
Comments
As a general question, is there some particular Linux platform where crc might work better? It seems really broken on Fedora 36. Maybe it's better to use CentOS or RHEL? |
@rwmjones It should work with F-36; we (most of the crc devs) are using F-36 day to day. It looks like the disk image (bundle) we ship with the release has an expired cert and regeneration of it is now failing. We didn't see this issue during our testing, but we will check again and get back to you. |
On F36 I did 3 starts in a row.
I also ran e2e test once, and it passed. |
Maybe a hardware speed thing? I've done about half a dozen restarts and they all failed in the same way. |
Can you share your hardware configuration? |
Sure. It's an Intel NUC RNUC11PAHi50000 which is a fairly standard 4 core / 8 thread 11th Gen Intel Core i5 mobile chipset. This has Fedora 36 installed on it, and all |
For what it's worth, since I ran into this issue with release 2.6.0, I tried the previous release 2.5.1, and ran into the same issue there:
|
@rwmjones Do you have any other hardware where you can run crc (even remotely), to see whether this is really something hardware related? During my testing (on GCP, on a nested-virt-enabled VM) it took more time than usual, but the cluster did come up. |
It is our biggest issue as this is not possible OOTB. |
The NUC and the other system for which an issue was also filed are SFF (small form factor) machines, so this might be related to CPU throttling due to thermals; however, Praveen has now also seen this on GCP. It looks like a timing/timeout issue, but we didn't see this before. We have automated tests in place to confirm this works, but none in resource-constrained environments? |
I don't really have other hardware to run this on. Re: "Can we run the certificate renewal manually with custom timeouts to work around this issue?" - how would I do that? |
There is no mechanism AFAIK to trigger this or define other timeouts. |
I'm hitting the same error with |
I would suggest waiting till next week so we have a new version with an updated bundle certificate. |
I'm trying |
BTW it always says:
However I'm not sure if this causes a problem. |
I still get 401 Unauthorized errors logging in as kubeadmin, either through the web interface or with
Some parts (eg. the web interface) are running if I log in as |
@rwmjones which means |
Did you want me to just run the crc start command, or to delete and rebuild the whole VM? Anyway the output with just |
Here is a copy of my ^ This is on a 64GB laptop with It seems to work correctly, but reports a failure and complains about not reaching its intended startup state within 10 mins... |
@ryanj Sometimes some of the operators aren't able to reconcile within 10 mins, but in your case the cluster is healthy and in a running state. The issue I see with @rwmjones is the hardware, which is a NUC and might take longer than usual, but I still don't understand why the kubeconfig file is not updated in the |
@rwmjones Can you please execute the following and let us know the output at debug level?
|
@rwmjones Thanks. As per the logs, it looks like the apiserver is not even able to become available during the allocated time, and that is why the kubeconfig file is not present in the respective directory. This really looks hardware related; even if we add more wait time, the overall performance of this cluster might not be suitable for workloads :(
|
Can we make the timeouts longer or configurable? The machine has 16G of RAM and is not swapping. |
@rwmjones I just created a custom Linux binary with an increased timeout; can you try it? Please delete the cluster before starting with this binary.
This binary uses the following patch (remove fast failure and increase the overall retry time)
|
I tried it twice and it failed both times. Gist from the second attempt: https://gist.github.com/rwmjones/084d4abd35e76a4c8b7eab7b7c42b53d I don't think the change to the timeout had any effect since it appeared to only wait 4 mins. Interestingly, I tried going inside the VM while it was starting. The VM is only using half the available RAM (8GB); I think it could easily be larger. It also has only half the available cores (4). However it's not swapping, although it is doing a very large amount of I/O, and I think you could give the VM something like total host system RAM - 4 GB, and total host pCPUs - 2, or something like that. |
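For illustration, here is a minimal sketch of that "host minus a reserve" heuristic, assuming 4 GiB of RAM and 2 pCPUs are held back for the host and crc's default 8 GiB / 4 vCPU guest is the floor; the function name is hypothetical and this is not how crc actually sizes the VM:

```go
// Hypothetical sizing heuristic sketch: give the guest everything except
// a fixed reserve for the host, never going below crc's defaults.
package main

import "fmt"

// suggestedGuestSize keeps back 4 GiB of RAM and 2 pCPUs for the host,
// but never returns less than the default 8 GiB / 4 vCPU guest.
func suggestedGuestSize(hostMemGiB, hostCPUs int) (memGiB, cpus int) {
	memGiB = hostMemGiB - 4
	if memGiB < 8 {
		memGiB = 8
	}
	cpus = hostCPUs - 2
	if cpus < 4 {
		cpus = 4
	}
	return memGiB, cpus
}

func main() {
	// The NUC discussed in this thread: 16 GiB RAM, 4 cores / 8 threads.
	mem, cpus := suggestedGuestSize(16, 8)
	fmt.Printf("guest: %d GiB RAM, %d vCPUs\n", mem, cpus) // guest: 12 GiB RAM, 6 vCPUs
}
```

On the 16 GiB / 8-thread NUC above this works out to a 12 GiB / 6 vCPU guest, which matches the sizing tried later in this thread.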
loadavg inside the VM, several minutes after crc start gave up: |
@rwmjones it does work from the node side but then fails when getting configmaps, which is again using
Yes, it is also going to use the memory once all the operators are up and running. Also, you can use the following to provide more RAM and CPU to the system, but make sure you delete the existing instance first
This time the patch is:
diff --git a/pkg/crc/cluster/cluster.go b/pkg/crc/cluster/cluster.go
index 0f5009c1..f8bc9674 100644
--- a/pkg/crc/cluster/cluster.go
+++ b/pkg/crc/cluster/cluster.go
@@ -413,7 +413,7 @@ func WaitForRequestHeaderClientCaFile(ctx context.Context, sshRunner *ssh.Runner
func WaitForAPIServer(ctx context.Context, ocConfig oc.Config) error {
logging.Info("Waiting for kube-apiserver availability... [takes around 2min]")
waitForAPIServer := func() error {
- stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "nodes")
+ stdout, stderr, err := ocConfig.RunOcCommand("get", "nodes")
if err != nil {
logging.Debug(stderr)
return &errors.RetriableError{Err: err}
@@ -421,7 +421,7 @@ func WaitForAPIServer(ctx context.Context, ocConfig oc.Config) error {
logging.Debug(stdout)
return nil
}
- return errors.Retry(ctx, 4*time.Minute, waitForAPIServer, time.Second)
+ return errors.Retry(ctx, 10*time.Minute, waitForAPIServer, time.Second)
}
func DeleteOpenshiftAPIServerPods(ctx context.Context, ocConfig oc.Config) error {
@@ -431,7 +431,7 @@ func DeleteOpenshiftAPIServerPods(ctx context.Context, ocConfig oc.Config) error
deleteOpenshiftAPIServerPods := func() error {
cmdArgs := []string{"delete", "pod", "--all", "--force", "-n", "openshift-apiserver"}
- _, stderr, err := ocConfig.WithFailFast().RunOcCommand(cmdArgs...)
+ _, stderr, err := ocConfig.RunOcCommand(cmdArgs...)
if err != nil {
return &errors.RetriableError{Err: fmt.Errorf("Failed to delete pod from openshift-apiserver namespace %v: %s", err, stderr)}
}
diff --git a/pkg/crc/cluster/csr.go b/pkg/crc/cluster/csr.go
index 9ed5e78a..181ef781 100644
--- a/pkg/crc/cluster/csr.go
+++ b/pkg/crc/cluster/csr.go
@@ -16,7 +16,7 @@ import (
func WaitForOpenshiftResource(ctx context.Context, ocConfig oc.Config, resource string) error {
logging.Debugf("Waiting for availability of resource type '%s'", resource)
waitForAPIServer := func() error {
- stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", resource)
+ stdout, stderr, err := ocConfig.RunOcCommand("get", resource)
if err != nil {
logging.Debug(stderr)
return &crcerrors.RetriableError{Err: err}
@@ -47,7 +47,7 @@ func getCSRList(ctx context.Context, ocConfig oc.Config, expectedSignerName stri
if err := WaitForOpenshiftResource(ctx, ocConfig, "csr"); err != nil {
return nil, err
}
- output, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "csr", "-ojson")
+ output, stderr, err := ocConfig.RunOcCommand("get", "csr", "-ojson")
if err != nil {
return nil, fmt.Errorf("Failed to get all certificate signing requests: %v %s", err, stderr)
} |
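For context, the timeout this patch raises feeds a standard poll-until-deadline retry loop. Below is a minimal, self-contained sketch of that pattern, assuming a RetriableError marker type and a fixed one-second polling interval; it is only an illustration of why a longer window helps slow hardware, not the actual pkg/crc/errors implementation.

```go
// Minimal sketch of a poll-until-deadline retry loop, assuming a
// RetriableError marker type; illustrative only, not the real
// pkg/crc/errors code.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// RetriableError marks a failure that is worth retrying.
type RetriableError struct{ Err error }

func (e *RetriableError) Error() string { return e.Err.Error() }

// Retry calls fn every interval until it succeeds, returns a
// non-retriable error, or the overall timeout elapses.
func Retry(ctx context.Context, timeout time.Duration, fn func() error, interval time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	for {
		err := fn()
		if err == nil {
			return nil
		}
		var retriable *RetriableError
		if !errors.As(err, &retriable) {
			return err // permanent failure: give up immediately
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out after %s: %w", timeout, err)
		case <-time.After(interval):
		}
	}
}

func main() {
	attempts := 0
	checkAPIServer := func() error {
		attempts++
		if attempts < 3 {
			return &RetriableError{Err: fmt.Errorf("kube-apiserver not ready yet")}
		}
		return nil
	}
	if err := Retry(context.Background(), 10*time.Minute, checkAPIServer, time.Second); err != nil {
		fmt.Println("gave up:", err)
		return
	}
	fmt.Printf("apiserver ready after %d attempts\n", attempts)
}
```

With a 4-minute window, a machine whose apiserver takes longer than that to come up never gets a successful attempt; raising the window to 10 minutes (and dropping the fail-fast behaviour on each oc call, per the patch description) simply allows more polls before crc gives up.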
I can confirm that this time the VM was created with 12GB RAM and 6 cores. |
@rwmjones It went one step further but then failed again due to slow IO/processing. Now I am out of ideas for how it can work :(
I have ordered more RAM. |
I have upgraded the machine to 64 GB of RAM, the maximum possible for this NUC hardware. Surprisingly, the default VM created is still 8 GB / 4 cores; I would have expected it to depend on the available host memory and cores in some way. My initial attempt to start crc failed as before, probably because of this. So I used:
It basically fails in the same way as far as I can tell: https://gist.github.com/rwmjones/0c48232408e7396b43a4cdbc64ded877 |
@praveenkumar this might be related to the auth not becoming available in time? |
@gbraad No, as per logs it is failing long before that. |
Hi. For the last two weeks I've been trying to use crc while suffering the same problems as @rwmjones. Renewal of certificates included. During the last tests, I found with horror that my whole home folder was being shared with the crc VM. Because of this, I have created a new user dedicated to run crc, and incidentally this seems to have solved the problems. Also, the new user's home is located in a smaller yet faster drive. Not sure if that's related. I'm using quite modest hardware: i5-4670 CPU @ 3.40GHz (4 core / 4 thread), 16 GB RAM, Fedora 35. |
@robertxgray With the latest version of CRC you shouldn't see the certificate renewal issue, and file share support was added in crc 2.7.1. Are you seeing this issue with 2.7.1 or with an older version? |
@praveenkumar Sorry for the misunderstanding. I mean I've had the same errors as rwmjones since the beginning of this thread, but certificate renewal issues were gone after updating to 2.7.1 as expected. |
@robertxgray But to make 2.7.1 work you had to create a separate user because the home folder is being shared with the CRC VM? I want to figure out whether this needs a different bug and we missed some corner case. |
@praveenkumar I created another user because I didn't want CRC to mess with all the junk stored in my main user's home folder. CRC being able to start with the new user was a nice and unexpected side effect. Before that, I was having the same errors shown in rwmjones' latest logs. I have performed some additional tests moving the new user's home folder to the slow hard drive and CRC still works. Sometimes I get: |
@robertxgray Thank you for confirming. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Thanks for the issue; if it still exists, please create a new one with the latest version of crc. /close |
@praveenkumar: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
General information
Did you run crc setup before starting it? YES
CRC version
CRC status
CRC config
- consent-telemetry : no
Host Operating System
Steps to reproduce
Expected
CRC should work, I guess?
Actual
This error happens every time I try to use crc.
Logs
https://gist.github.com/rwmjones/3a58df5c478e11e003455243cce0d8f9