No space left on device #3270
I just had the same thing happen. The database I spun up with the Crunchy Data operator, with 10G of attached storage, ran out of PVC space and I had to expand the PVC. I don't have much data in any of the tables. Has anyone else seen a fairly empty Crunchy PostgreSQL database take up over 10G of storage?
I am currently facing a similar issue. I have a very small database (~60 MB), but it has already filled a 1 GB volume. All of the data is WAL files. I don't know why there are so many of them.
Well, for what it's worth, I just got hit by this too.
I fixed the problem by changing the following parameters:

```yaml
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        max_wal_size: 128MB
        wal_buffers: 2MB
        wal_recycle: off
        wal_init_zero: off
```
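For context, in the v5 operator these Patroni settings go under `spec.patroni` of the `PostgresCluster` manifest. A minimal sketch of where the snippet above fits (the cluster name is a placeholder, and note that the operator's reconcile loop enforces some parameters, such as `wal_level`, as discussed below):

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo   # placeholder cluster name
spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          max_wal_size: 128MB   # limit checkpoint-triggered WAL growth
          wal_buffers: 2MB
          wal_recycle: off      # do not recycle old WAL segments
          wal_init_zero: off    # do not pre-zero new WAL segments
```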
Just my experience so far: we also saw the pg14_wal folder consume all the disk given to it, even though the database holds only ~300 MB. There are certain parameters we tried to change but could not, as the reconcile loop (or what looks like it) calls Patroni to change them back. For instance, we tried to change wal_level from logical to replica in two different ways:
After looking around, we found #3055 and #3002, but had to dig into the code to see that it is mandatory (postgres-operator/internal/postgres/parameters.go, lines 33 to 38 in 2e18aef).
We also tried to change wal_log_hints, but it looks like that is also a prerequisite for something (postgres-operator/internal/patroni/config.go, line 602 in 7241a02).
The same seems to hold for wal_keep_size. Edit: a few days later, it turned out our backups had issues and WAL files were not being consumed and deleted, which was the real cause. After the backup issue was fixed, five backup jobs failed with a timeout archiving WAL, but each run cleaned up a lot of the WAL files. The last one succeeded, and pg14_wal went down from 35 GB to 17 MB.
We also hit this issue during an import of a ~60 GB database. At its peak (looking at the Grafana dashboards), the WAL reached 80 GB in size. The WAL may have ended up more bloated than expected because we ran the import multiple times after crashing on "out of disk".
This is also a problem in PGO v5. Setting wal_level to replica is ignored, but it should be possible, as mentioned here: #3055 (comment)
I am not sure if I have understood your problem correctly, but you have to configure pgBackRest retention management AND schedule at least one backup to get automatic WAL archive retention working.
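As a sketch, retention plus a scheduled full backup can be configured roughly like this in a v5 `PostgresCluster` spec (the retention count, cron schedule, and storage size are illustrative, not recommendations):

```yaml
spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "2"        # keep the 2 most recent full backups
        repo1-retention-full-type: count # expire by backup count
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"            # weekly full backup (illustrative cron)
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi          # illustrative repo size
```

Once a full backup expires under this retention policy, pgBackRest can also expire the archived WAL that predates it, which is what keeps the archive (and, indirectly, pg_wal) from growing without bound.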
Peanut gallery here: couldn't some of the issues reported here be a side effect of not running a VACUUM pass often enough on the databases? Cheers. Ref: https://www.postgresql.org/docs/current/routine-vacuuming.html
Possibly related: #2531
Considering the initial issue is for an older version of Crunchy Postgres for Kubernetes (v4.7), and the root causes of the various issues described in this thread are related to pgBackRest and/or Postgres configuration and tuning (rather than anything in CPK itself), I am going to proceed with closing this. If anyone is still running into similar issues, please feel free to submit a new GitHub issue, or continue the conversation in the PGO project community Discord server. Thanks!
Maybe the action here, then, is improving the docs around backups, perhaps adding this as a warning somewhere in a FAQ or something like that?
Is it possible to cap the maximum size of the WAL folder so that old WAL logs exceeding the size are removed? I guess this would invalidate all full backups taken prior to the deleted logs, but it would at least prevent the disk from filling up. Is there any way to achieve something similar?
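As far as I know there is no hard cap: `max_wal_size` is only a soft limit that influences checkpoint timing, and WAL segments that have not yet been archived successfully are retained regardless, so a broken archiver will still fill the disk. A rough sketch of Patroni parameters that keep pg_wal small when archiving is healthy (values illustrative; note the thread above suggests `wal_keep_size` may be operator-managed):

```yaml
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        max_wal_size: 1GB   # soft target for checkpoint-triggered recycling, not a hard cap
        wal_keep_size: 0    # keep no extra WAL for replicas (may be overridden by the operator)
```

The only safe way to reclaim space is to fix archiving and let retention expire old WAL; deleting segments from pg_wal by hand invalidates backups and breaks replication.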
I am working on version 4.7.4 and am facing an issue trying to bring the cluster back up.
When I checked the error, I found "No space left on device". I tried to resize the PVC, but nothing changed and the pod is still not ready.
Environment:
- Platform: Kubernetes (OCI, Oracle Cloud)
- Operator version: 4.7.4
- OS: centos8
- PostgreSQL version: 13
Here are the full logs; any help would be appreciated. Thanks