
[BUG] Failed to renew TLS certificates #3294

Closed
rwmjones opened this issue Aug 8, 2022 · 47 comments
Labels
kind/bug Something isn't working

Comments

rwmjones commented Aug 8, 2022

General information

  • OS: Linux, Fedora 36
  • Hypervisor: crc is running on baremetal, KVM is available
  • Did you run crc setup before starting it YES
  • Running CRC on: Baremetal-Server

CRC version

CRC version: 2.6.0+d606e64
OpenShift version: 4.10.22
Podman version: 4.1.0

CRC status

DEBU CRC version: 2.6.0+d606e64                   
DEBU OpenShift version: 4.10.22                   
DEBU Podman version: 4.1.0                        
DEBU Running 'crc status'                         
DEBU Checking file: /home/rjones/.crc/machines/crc/.crc-exist 
DEBU Checking file: /home/rjones/.crc/machines/crc/.crc-exist 
DEBU Found binary path at /home/rjones/.crc/bin/crc-driver-libvirt 
DEBU Launching plugin server for driver libvirt   
DEBU Plugin server listening at address 127.0.0.1:42591 
DEBU () Calling .GetVersion                       
DEBU Using API Version 1                          
DEBU () Calling .SetConfigRaw                     
DEBU () Calling .GetMachineName                   
DEBU (crc) Calling .GetBundleName                 
DEBU (crc) Calling .GetState                      
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="Getting current state..." 
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="Fetching VM..." 
DEBU (crc) Calling .GetIP                         
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="GetIP called for crc" 
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="Getting current state..." 
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="IP address: 192.168.130.11" 
DEBU (crc) Calling .GetIP                         
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="GetIP called for crc" 
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="Getting current state..." 
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="IP address: 192.168.130.11" 
DEBU Running SSH command: df -B1 --output=size,used,target /sysroot | tail -1 
DEBU Using ssh private keys: [/home/rjones/.crc/machines/crc/id_ecdsa /home/rjones/.crc/cache/crc_libvirt_4.10.22_amd64/id_ecdsa_crc] 
DEBU SSH command results: err: <nil>, output: 32737570816 12522758144 /sysroot 
DEBU cannot get OpenShift status: stat /home/rjones/.crc/machines/crc/kubeconfig: no such file or directory 
DEBU Making call to close driver server           
DEBU (crc) Calling .Close                         
DEBU Successfully made call to close driver server 
DEBU Making call to close connection to plugin binary 
DEBU (crc) DBG | time="2022-08-08T14:03:18+01:00" level=debug msg="Closing plugin on server side" 
CRC VM:          Running
OpenShift:       Unreachable (v4.10.22)
Podman:          
Disk Usage:      12.52GB of 32.74GB (Inside the CRC VM)
Cache Usage:     17.09GB
Cache Directory: /home/rjones/.crc/cache

CRC config

- consent-telemetry                     : no

Host Operating System

# Put the output of `cat /etc/os-release` in case of Linux
NAME="Fedora Linux"
VERSION="36 (Server Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Server Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Server Edition"
VARIANT_ID=server

Steps to reproduce

  1. crc delete -f
  2. crc cleanup
  3. crc setup
  4. crc start --log-level debug

Expected

CRC should work, I guess?

Actual

Failed to renew TLS certificates: please check if a newer CRC release is available: Failed to get all certificate signing requests: ssh command error:
command : timeout 5s oc get csr -ojson --context admin --cluster crc --kubeconfig /opt/kubeconfig

This error happens every time I try to use crc.
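For reference, the failing query can be re-run by hand from the host, reusing the VM IP, SSH key and kubeconfig paths that appear in the debug output above. This is only a diagnostic sketch built from the values in this report, not an official workaround:

# Re-run the CSR query that crc is timing out on, without the 5s timeout
# (IP address, key path and kubeconfig path are taken from the logs above).
ssh -i ~/.crc/machines/crc/id_ecdsa core@192.168.130.11 \
  'oc get csr -ojson --context admin --cluster crc --kubeconfig /opt/kubeconfig'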

Logs

https://gist.github.com/rwmjones/3a58df5c478e11e003455243cce0d8f9

@rwmjones rwmjones added kind/bug Something isn't working status/need triage labels Aug 8, 2022

rwmjones commented Aug 8, 2022

As a general question, is there some particular Linux platform where crc might work better? It seems really broken on Fedora 36. Maybe it's better to use CentOS or RHEL?

praveenkumar commented:

@rwmjones It should work on F36; most of the crc developers use F36 day to day. It looks like the disk image (bundle) we ship with the release has an expired certificate and regenerating it is now failing. We didn't see this issue during our testing, but we will check again and get back to you.

jsliacan commented Aug 8, 2022

On F36 I did 3 starts in a row.

I also ran the e2e test once, and it passed.

rwmjones commented Aug 8, 2022

Maybe a hardware speed thing? I've done about half a dozen restarts and they all failed in the same way.

gbraad commented Aug 9, 2022

Can you share your hardware configuration?

rwmjones commented Aug 9, 2022

Sure. It's an Intel NUC RNUC11PAHi50000, which is a fairly standard 4-core / 8-thread 11th Gen Intel Core i5 mobile chipset. It has Fedora 36 installed, and all crc commands are run directly on bare metal (the crc-created VM therefore runs as an L1 guest). Edit: 16 GB RAM and 1 TB hard disk.

pkesseli commented Aug 9, 2022

For what it's worth, since I ran into this issue with release 2.6.0, I tried the previous release 2.5.1, and ran into the same issue there:

INFO Starting OpenShift kubelet service
DEBU Using root access: Executing systemctl daemon-reload command
DEBU Running SSH command: sudo systemctl daemon-reload
DEBU SSH command results: err: <nil>, output:
DEBU Using root access: Executing systemctl start kubelet
DEBU Running SSH command: sudo systemctl start kubelet
DEBU SSH command results: err: <nil>, output:
INFO Kubelet serving certificate has expired, waiting for automatic renewal... [will take up to 8 minutes]
DEBU retry loop: attempt 0
DEBU Running SSH command: date --date="$(sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate | cut -d= -f 2)" --iso-8601=seconds
DEBU SSH command results: err: <nil>, output: 2022-07-22T06:30:41+00:00
DEBU Certs have expired, they were valid till: 22 Jul 22 06:30 +0000
DEBU error: Temporary error: certificate /var/lib/kubelet/pki/kubelet-server-current.pem still expired - sleeping 5s
DEBU retry loop: attempt 1
...
DEBU retry loop: attempt 55
DEBU Running SSH command: date --date="$(sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate | cut -d= -f 2)" --iso-8601=seconds
DEBU SSH command results: err: <nil>, output: 2022-07-22T06:30:41+00:00
DEBU Certs have expired, they were valid till: 22 Jul 22 06:30 +0000
DEBU error: Temporary error: certificate /var/lib/kubelet/pki/kubelet-server-current.pem still expired - sleeping 5s
DEBU retry loop: attempt 56
DEBU Running SSH command: date --date="$(sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate | cut -d= -f 2)" --iso-8601=seconds
DEBU SSH command results: err: <nil>, output: 2022-07-22T06:30:41+00:00
DEBU Certs have expired, they were valid till: 22 Jul 22 06:30 +0000
DEBU error: Temporary error: certificate /var/lib/kubelet/pki/kubelet-server-current.pem still expired - sleeping 5s
DEBU RetryAfter timeout after 57 tries
DEBU Bundle has been generated 48 days ago
DEBU Making call to close driver server
DEBU (crc) Calling .Close
DEBU (crc) DBG | time="2022-08-09T12:41:17+02:00" level=debug msg="Closing plugin on server side"
DEBU Successfully made call to close driver server
DEBU Making call to close connection to plugin binary
Failed to renew TLS certificates: please check if a newer CRC release is available: Temporary error: certificate /var/lib/kubelet/pki/kubelet-server-current.pem still expired (x57)
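The expiry check itself can be reproduced by hand, which is a quick way to confirm whether the kubelet serving certificate ever gets rotated. A sketch, using the key path and VM IP shown earlier in this thread:

# Print the expiry date of the kubelet serving certificate inside the CRC VM,
# mirroring the check that crc runs in the log above.
ssh -i ~/.crc/machines/crc/id_ecdsa core@192.168.130.11 \
  'sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate'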

praveenkumar commented:

@rwmjones Do you have any other hardware (even a remote machine) where you can run crc, to see whether this is really something hardware related? During my testing (on GCP, on a nested-virt-enabled VM) it took more time than usual, but the cluster did come up.

pkesseli commented Aug 9, 2022

@rwmjones @gbraad Can we run the certificate renewal manually with custom timeouts to work around this issue?

gbraad commented Aug 9, 2022

It is our biggest issue that this is not possible OOTB.
Praveen has reserved time to look into this during this sprint.

gbraad commented Aug 9, 2022

The NUC and the other system for which an issue was also filed are both SFF (small form factor) machines, so this might be related to CPU throttling due to thermals; however, Praveen has now also seen this on GCP. It looks like a timing/timeout issue, but we didn't see this before. We have automated tests in place to confirm this works, but none in resource-constrained environments?

rwmjones commented Aug 9, 2022

I don't really have other hardware to run this on. Re: "Can we run the certificate renewal manually with custom timeouts to work around this issue?" - how would I do that?

gbraad commented Aug 9, 2022

how to do that?

There is no mechanism, AFAIK, to trigger this or to define other timeouts.
You could try giving the VM a little more memory instead?
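A minimal sketch of that suggestion, assuming the standard crc config keys (memory in MiB); the values are only examples:

# Give the CRC VM more memory and CPUs, then recreate the instance.
$ crc config set memory 12288
$ crc config set cpus 6
$ crc delete -f
$ crc start --log-level debug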

ryanj commented Aug 9, 2022

I'm hitting the same error with crc start using Fedora 36 on a ThinkPad with 64 GB of RAM.

praveenkumar commented:

I would suggest waiting until next week, when we will have a new version with an updated bundle certificate.

praveenkumar commented:

@rwmjones @ryanj We just released a new version of CRC; can you please try it and let us know if you hit any other issues?

rwmjones commented:

I'm trying crc-linux-2.7.1-amd64 now.

rwmjones commented:

BTW it always says:

WARN Cannot add pull secret to keyring: The name is not activatable 

However I'm not sure if this causes a problem.

rwmjones commented:

I still get 401 Unauthorized errors logging in as kubeadmin, either through the web interface or with oc login. That's what I was getting before too. crc status says:

CRC VM:          Running
OpenShift:       Unreachable (v4.11.0)
Podman:          
Disk Usage:      14.74GB of 32.74GB (Inside the CRC VM)
Cache Usage:     34.18GB
Cache Directory: /home/rjones/.crc/cache

Some parts (e.g. the web interface) are running if I log in as developer.
So ... unclear.
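One thing worth double-checking for the 401 errors (a sketch, not a confirmed fix) is that the kubeadmin password being used is the one crc generated for this cluster:

# Print the console URL and the generated kubeadmin/developer credentials,
# then log in with exactly those values (api.crc.testing:6443 is the usual CRC API URL).
$ crc console --credentials
$ oc login -u kubeadmin https://api.crc.testing:6443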

praveenkumar commented:

@rwmjones So the kubeconfig is missing: crc status --log-level debug reports stat /home/rjones/.crc/machines/crc/kubeconfig: no such file or directory. Can you share the output of crc start --log-level debug in a gist?
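A simple way to capture that output for a gist (just standard shell redirection) is:

# Save the full debug output of crc start to a file for sharing.
$ crc start --log-level debug 2>&1 | tee crc-start-debug.log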

rwmjones commented:

$ ls -l /home/rjones/.crc/machines/crc/kubeconfig
ls: cannot access '/home/rjones/.crc/machines/crc/kubeconfig': No such file or directory

Did you want me to just run the crc start command, or to delete and rebuild the whole VM? Anyway the output with just crc start is: https://paste.centos.org/view/287c11e2

ryanj commented Aug 18, 2022

Here is a copy of my crc start --log-level debug output from crc-2-7-1: https://gist.github.com/ryanj/f513967816196d36cf94b4164c2f1748

^ This is on a 64 GB laptop, with 32000 MiB of memory and 8 CPUs allocated in the crc config

It seems to work correctly, but reports a failure and complains about not reaching its intended startup state within 10 minutes...

praveenkumar commented:

Here is a copy of my crc start --log-level debug output from crc-2-7-1: https://gist.github.com/ryanj/f513967816196d36cf94b4164c2f1748

^ This is on a 64 GB laptop, with 32000 MiB of memory and 8 CPUs allocated in the crc config

It seems to work correctly, but reports a failure and complains about not reaching its intended startup state within 10 minutes...

@ryanj Sometimes some of the operators are not able to reconcile within 10 minutes, but in your case the cluster is healthy and in a running state.

The issue I see with @rwmjones is the hardware: it is a NUC and might take longer than usual, but I still don't understand why the kubeconfig file is never written to /home/rjones/.crc/machines/crc/kubeconfig.

praveenkumar commented:

$ ls -l /home/rjones/.crc/machines/crc/kubeconfig
ls: cannot access '/home/rjones/.crc/machines/crc/kubeconfig': No such file or directory

Did you want me to just run the crc start command, or to delete and rebuild the whole VM? Anyway the output with just crc start is: https://paste.centos.org/view/287c11e2

@rwmjones Can you please execute the following and share the debug-level output?

$ crc delete
$ crc cleanup
$ crc setup --log-level debug
$ crc start --log-level debug

praveenkumar commented:

@rwmjones Thanks. As per the logs it looks like the apiserver does not even become available within the allocated time, which is why the kubeconfig file is never written to the respective directory. This really looks hardware related; even if we add more wait time, the overall performance of this cluster might not be suitable for workloads :(

level=debug msg="retry loop: attempt 0"
level=debug msg="Running SSH command: timeout 5s oc get nodes --context admin --cluster crc --kubeconfig /opt/kubeconfig"
level=debug msg="SSH command results: err: Process exited with status 124, output: "

rwmjones commented:

Can we make the timeouts longer or configurable? The machine has 16G of RAM and is not swapping.

praveenkumar commented:

@rwmjones I just created a custom Linux binary with increased timeouts; can you try it? Please delete the cluster before starting with this binary.

$ curl -L -O https://github.com/praveenkumar/crc/releases/download/1.21.0/crc
$ chmod +x crc
$ ./crc delete
$ ./crc start --log-level debug

This binary uses the following patch (it removes the fail-fast behavior and increases the overall retry time):

diff --git a/pkg/crc/cluster/cluster.go b/pkg/crc/cluster/cluster.go
index 0f5009c1..f8bc9674 100644
--- a/pkg/crc/cluster/cluster.go
+++ b/pkg/crc/cluster/cluster.go
@@ -413,7 +413,7 @@ func WaitForRequestHeaderClientCaFile(ctx context.Context, sshRunner *ssh.Runner
 func WaitForAPIServer(ctx context.Context, ocConfig oc.Config) error {
        logging.Info("Waiting for kube-apiserver availability... [takes around 2min]")
        waitForAPIServer := func() error {
-               stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "nodes")
+               stdout, stderr, err := ocConfig.RunOcCommand("get", "nodes")
                if err != nil {
                        logging.Debug(stderr)
                        return &errors.RetriableError{Err: err}
@@ -421,7 +421,7 @@ func WaitForAPIServer(ctx context.Context, ocConfig oc.Config) error {
                logging.Debug(stdout)
                return nil
        }
-       return errors.Retry(ctx, 4*time.Minute, waitForAPIServer, time.Second)
+       return errors.Retry(ctx, 10*time.Minute, waitForAPIServer, time.Second)
 }
 
 func DeleteOpenshiftAPIServerPods(ctx context.Context, ocConfig oc.Config) error {
@@ -431,7 +431,7 @@ func DeleteOpenshiftAPIServerPods(ctx context.Context, ocConfig oc.Config) error
 
        deleteOpenshiftAPIServerPods := func() error {
                cmdArgs := []string{"delete", "pod", "--all", "--force", "-n", "openshift-apiserver"}
-               _, stderr, err := ocConfig.WithFailFast().RunOcCommand(cmdArgs...)
+               _, stderr, err := ocConfig.RunOcCommand(cmdArgs...)
                if err != nil {
                        return &errors.RetriableError{Err: fmt.Errorf("Failed to delete pod from openshift-apiserver namespace %v: %s", err, stderr)}
                }
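For anyone who prefers to build the patched binary themselves rather than download it, a rough sketch; the repository URL and the make-based build are assumptions, not details from this thread:

# Clone the crc sources, apply the patch above (saved to a file), and build locally.
$ git clone https://github.com/crc-org/crc.git   # repository URL assumed
$ cd crc
$ git apply /path/to/increase-timeouts.patch     # hypothetical file containing the diff above
$ make                                           # assumes the default make target produces the crc binary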

rwmjones commented:

I tried it twice and it failed both times. gist from the second attempt: https://gist.github.com/rwmjones/084d4abd35e76a4c8b7eab7b7c42b53d

I don't think the change to the timeout had any effect since it appeared to only wait 4 mins.

Interestingly I tried going inside the VM while it was starting. The VM is only using half available RAM (8GB). I think it could easily be larger. It also only has half available cores (4). However it's not swapping, although it is doing a very large amount of I/O and kube-apiserver is using lots of CPU (along with various other processes, crio, kubelet, etcd, systemd, python3, ...)

I think you could give the VM something like total host system RAM - 4 GB, and total host pCPUs - 2, or something like that.
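A rough shell sketch of that sizing suggestion (the 4 GB / 2 CPU headroom values are just the ones proposed above):

# Size the CRC VM from the host: all RAM minus 4 GiB, all CPUs minus 2.
mem_mib=$(( $(free -m | awk '/^Mem:/ {print $2}') - 4096 ))
cpus=$(( $(nproc) - 2 ))
crc start -m "$mem_mib" -c "$cpus" --log-level debug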

rwmjones commented:

loadavg inside the VM, several minutes after crc start gave up:
top - 11:28:22 up 7 min, 1 user, load average: 7.12, 5.46, 2.53

praveenkumar commented Aug 19, 2022

@rwmjones It does get past the node check, but it then fails when getting configmaps, which again uses the fail-fast path; that needed to be removed as well, so I have updated the binary again. Please remove the old one and fetch it again.

level=debug msg="Running SSH command: timeout 30s oc get nodes --context admin --cluster crc --kubeconfig /opt/kubeconfig"
level=debug msg="SSH command results: err: <nil>, output: NAME                 STATUS   ROLES           AGE   VERSION\ncrc-9jm8r-master-0   Ready    master,worker   11d   v1.24.0+9546431\n"
level=debug msg="NAME                 STATUS   ROLES           AGE   VERSION\ncrc-9jm8r-master-0   Ready    master,worker   11d   v1.24.0+9546431\n"
level=debug msg="Waiting for availability of resource type 'cm'"

Interestingly I tried going inside the VM while it was starting. The VM is only using half available RAM (8GB). I think it could easily be larger. It also only has half available cores (4). However it's not swapping, although it is doing a very large amount of I/O and kube-apiserver is using lots of CPU (along with various other processes, crio, kubelet, etcd, systemd, python3, ...)

Yes, it will also use that memory once all of the operators are up and running. You can also use the following to give the VM more RAM and CPUs, but make sure you delete the existing instance first:

$ crc start -m 12000 -c 6 --log-level debug

This time the patch is:

diff --git a/pkg/crc/cluster/cluster.go b/pkg/crc/cluster/cluster.go
index 0f5009c1..f8bc9674 100644
--- a/pkg/crc/cluster/cluster.go
+++ b/pkg/crc/cluster/cluster.go
@@ -413,7 +413,7 @@ func WaitForRequestHeaderClientCaFile(ctx context.Context, sshRunner *ssh.Runner
 func WaitForAPIServer(ctx context.Context, ocConfig oc.Config) error {
        logging.Info("Waiting for kube-apiserver availability... [takes around 2min]")
        waitForAPIServer := func() error {
-               stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "nodes")
+               stdout, stderr, err := ocConfig.RunOcCommand("get", "nodes")
                if err != nil {
                        logging.Debug(stderr)
                        return &errors.RetriableError{Err: err}
@@ -421,7 +421,7 @@ func WaitForAPIServer(ctx context.Context, ocConfig oc.Config) error {
                logging.Debug(stdout)
                return nil
        }
-       return errors.Retry(ctx, 4*time.Minute, waitForAPIServer, time.Second)
+       return errors.Retry(ctx, 10*time.Minute, waitForAPIServer, time.Second)
 }
 
 func DeleteOpenshiftAPIServerPods(ctx context.Context, ocConfig oc.Config) error {
@@ -431,7 +431,7 @@ func DeleteOpenshiftAPIServerPods(ctx context.Context, ocConfig oc.Config) error
 
        deleteOpenshiftAPIServerPods := func() error {
                cmdArgs := []string{"delete", "pod", "--all", "--force", "-n", "openshift-apiserver"}
-               _, stderr, err := ocConfig.WithFailFast().RunOcCommand(cmdArgs...)
+               _, stderr, err := ocConfig.RunOcCommand(cmdArgs...)
                if err != nil {
                        return &errors.RetriableError{Err: fmt.Errorf("Failed to delete pod from openshift-apiserver namespace %v: %s", err, stderr)}
                }
diff --git a/pkg/crc/cluster/csr.go b/pkg/crc/cluster/csr.go
index 9ed5e78a..181ef781 100644
--- a/pkg/crc/cluster/csr.go
+++ b/pkg/crc/cluster/csr.go
@@ -16,7 +16,7 @@ import (
 func WaitForOpenshiftResource(ctx context.Context, ocConfig oc.Config, resource string) error {
        logging.Debugf("Waiting for availability of resource type '%s'", resource)
        waitForAPIServer := func() error {
-               stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", resource)
+               stdout, stderr, err := ocConfig.RunOcCommand("get", resource)
                if err != nil {
                        logging.Debug(stderr)
                        return &crcerrors.RetriableError{Err: err}
@@ -47,7 +47,7 @@ func getCSRList(ctx context.Context, ocConfig oc.Config, expectedSignerName stri
        if err := WaitForOpenshiftResource(ctx, ocConfig, "csr"); err != nil {
                return nil, err
        }
-       output, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "csr", "-ojson")
+       output, stderr, err := ocConfig.RunOcCommand("get", "csr", "-ojson")
        if err != nil {
                return nil, fmt.Errorf("Failed to get all certificate signing requests: %v %s", err, stderr)
        }

rwmjones commented:

I can confirm that this time the VM was created with 12GB RAM and 6 cores.

praveenkumar commented:

level=debug msg="Running SSH command: timeout 30s oc get cm --context admin --cluster crc --kubeconfig /opt/kubeconfig"
level=debug msg="SSH command results: err: <nil>, output: NAME                       DATA   AGE\nkube-root-ca.crt           1      11d\nopenshift-service-ca.crt   1      11d\n"
level=debug msg="NAME                       DATA   AGE\nkube-root-ca.crt           1      11d\nopenshift-service-ca.crt   1      11d\n"
level=debug msg="Running SSH command: timeout 30s oc delete -n openshift-machine-config-operator cm machine-config-controller --context admin --cluster crc --kubeconfig /opt/kubeconfig"
level=debug msg="SSH command results: err: Process exited with status 124, output: "

@rwmjones It got one step further, but then failed again due to slow I/O / processing. Now I am out of ideas for how to make this work :(

rwmjones commented:

I have ordered more RAM.

rwmjones commented:

I have upgraded the machine to 64 GB of RAM, the maximum possible for this NUC hardware. Surprisingly, the default VM created is still 8 GB / 4 cores; I would have expected it to depend on the available host memory and cores in some way. My initial attempt to start crc failed as before, probably because of this.

So I used:

crc start -m 50000 -c 6 --log-level debug

It basically fails in the same way as far as I can tell: https://gist.github.com/rwmjones/0c48232408e7396b43a4cdbc64ded877

gbraad commented Aug 23, 2022

@praveenkumar this might be related to the auth not becoming available in time?

praveenkumar commented:

@praveenkumar this might be related to the auth not becoming available in time?

@gbraad No, as per logs it is failing long before that.

robertxgray commented:

Hi. For the last two weeks I've been trying to use crc while suffering the same problems as @rwmjones, certificate renewal included.

During the last tests, I found to my horror that my whole home folder was being shared with the crc VM. Because of this, I created a new user dedicated to running crc, and incidentally this seems to have solved the problems.

Also, the new user's home is located on a smaller but faster drive. Not sure if that's related.

I'm using quite modest hardware: i5-4670 CPU @ 3.40GHz (4 cores / 4 threads), 16 GB RAM, Fedora 35.
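For reference, the dedicated-user workaround described above could look roughly like this; the user name is hypothetical, the crc binary is assumed to be on the new user's PATH, and crc setup is expected to handle permissions:

# Create a dedicated user for crc and run the usual setup/start flow as that user.
$ sudo useradd -m crcuser        # hypothetical user name
$ su - crcuser
$ crc setup
$ crc start --log-level debug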

praveenkumar commented:

@robertxgray With the latest version of CRC you shouldn't see certificate renewal at all, and file-sharing support was added in crc 2.7.1. Are you seeing this issue with 2.7.1 or with an older version?

robertxgray commented:

@praveenkumar Sorry for the misunderstanding. I mean I've had the same errors as rwmjones since the beginning of this thread, but certificate renewal issues were gone after updating to 2.7.1 as expected.

praveenkumar commented:

@robertxgray But to make 2.7.1 work you had to create a separate user, because the home folder is shared with the CRC VM? I want to figure out whether this needs a separate bug and we missed some corner case.

robertxgray commented:

@praveenkumar I created another user because I didn't want CRC to mess with all the junk stored in my main user's home folder. CRC being able to start with the new user was a nice and unexpected side effect. Before that, I was having the same errors shown in rwmjones' latest logs.

I have performed some additional tests moving the new user's home folder to the slow hard drive and CRC still works. Sometimes I get: ERRO Cluster is not ready: cluster operators are still not stable after...
But crc status shows OpenShift as Starting and it changes to Running a few minutes later.
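To watch for that transition without guessing, a small polling loop over crc status works; the grep pattern matches the status layout shown earlier in this thread:

# Poll crc status until OpenShift reports Running.
until crc status | grep -q 'OpenShift: *Running'; do
  sleep 30
done
crc status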

praveenkumar commented:

@robertxgray Thank you for confirming.

stale bot commented Oct 29, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale Issue went stale; did not receive attention or no reply from the OP label Oct 29, 2022
@nichjones1 nichjones1 moved this to Backlog in Project planning: crc Aug 8, 2023
@nichjones1 nichjones1 removed status/stale Issue went stale; did not receive attention or no reply from the OP status/need triage labels Sep 6, 2023
praveenkumar commented:

Thanks for the issue. If it still exists, please create a new one with the latest version of crc.

/close

@openshift-ci openshift-ci bot closed this as completed Sep 6, 2023

openshift-ci bot commented Sep 6, 2023

@praveenkumar: Closing this issue.

In response to this:

Thanks for the issue. If it still exists, please create a new one with the latest version of crc.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-project-automation github-project-automation bot moved this from Backlog to Done in Project planning: crc Sep 6, 2023