Reduce WAL pressure #3055


Closed

benjamin-bergia opened this issue Feb 23, 2022 · 13 comments

@benjamin-bergia

I am currently struggling to run timescaledb on my instances. My storage has limited bandwidth, and it cannot keep up with the huge volume of WAL that my instances generate with wal_level = 'logical', despite using a separate PVC. My requirements are pretty lax: I do not need incremental backups, I can afford to lose the data between two full backups (one full backup per day), and I don't use any standby, replica, etc.

After trying to set wal_level in the cluster definition using patroni.dynamicConfiguration, I noticed that it is overwritten. Is it possible to have basic WAL archiving using the minimal wal_level and still have full backups?
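
The override I attempted looked roughly like this (a trimmed sketch; the wal_level value shown is illustrative, since the operator rewrites it either way):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: metrics-postgres
spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          # What I set; PGO overwrites this value.
          wal_level: minimal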

@tjmoore4
Contributor

Hello @benjamin-bergia. It appears you are using PGO v5, but other details would be helpful as well. Your exact PGO version, your PostgresCluster definition, and environment details (Kubernetes version, etc.) would give a better idea of what options may be available.

tjmoore4 added the v5 label Feb 25, 2022
@benjamin-bergia
Author

Sure,

I am using:

And here is my cluster definition:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: metrics-postgres
spec:
  image: myregistry/myproject/crunchy-postgres@sha256:94c1162fe12cde697049168a0c8474a7983bb28404c025c7d358cb021c451090
  postgresVersion: 13
  instances:
    - name: primary
      dataVolumeClaimSpec:
        storageClassName: 'io'
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: '2000Gi'
      walVolumeClaimSpec:
        storageClassName: 'io'
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: '300Gi'
      tolerations:
        - key: 'reservedFor'
          value: 'database'
          effect: 'NoSchedule'

  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      global:
        archive-async: 'y'
        spool-path: '/pgwal/pgbackrest/spool'
        archive-push-queue-max: 200GB
        process-max: '4'
        repo1-path: '/pgbackrest/metrics-db/metrics-db'
        repo1-retention-full: '3'
        repo1-retention-full-type: time
        repo1-retention-archive-type: 'full'
        repo1-retention-archive: '1'
      configuration:
        - secret:
            name: ***
      repos:
        - name: repo1
          s3:
            endpoint: '***'
            region: '***'
            bucket: '***'
          schedules:
            full: '0 0/12 * * *'

  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.4-0

  users:
    - name: promscale
      options: 'SUPERUSER'

  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          shared_preload_libraries: timescaledb,promscale
          timescaledb.license: timescale
          # Generated by timescaledb-tune
          shared_buffers: 14714MB
          effective_cache_size: 44143MB
          maintenance_work_mem: 2047MB
          work_mem: 4708kB
          timescaledb.max_background_workers: 8
          max_worker_processes: 43
          max_parallel_maintenance_workers: 12
          max_parallel_workers_per_gather: 16
          max_parallel_workers: 32
          wal_buffers: 16MB
          min_wal_size: 512MB
          max_wal_size: 4GB
          default_statistics_target: 500
          random_page_cost: 1.1
          checkpoint_completion_target: 0.9
          max_connections: 100
          max_locks_per_transaction: 512
          autovacuum_max_workers: 10
          autovacuum_naptime: 10
          effective_io_concurrency: 256
          log_checkpoints: t
          wal_level: replica

As explained here, I am running a custom image based on the Crunchy one, just to switch the license and add the promscale extension.

@cbandy
Member

cbandy commented Mar 9, 2022

Is it possible to have basic WAL archiving using the minimal wal_level and still have full backups?

Quoting pgBackRest:

No. wal_level > minimal is absolutely required for online backups.

@benjaminjb
Contributor

benjaminjb commented Jun 6, 2022

@benjamin-bergia I hope that ^ answered your question re: the requirement that wal_level > minimal in order to have a full backup.

I hope you have solved the issue you were experiencing -- perhaps wal_level = replica would help? -- and if this is still an issue and you would like to discuss some other possible solution, feel free to reopen this issue.

@ThommyH

ThommyH commented Oct 18, 2022

Setting wal_level = replica in spec.patroni.dynamicConfiguration is being ignored. Is wal_level = logical necessary for PGO to work properly?

@MSandro

MSandro commented Sep 13, 2024

I have the same issue; I have no idea how to reduce the WAL size.

@ThommyH

ThommyH commented Sep 13, 2024

Try these pgBackRest settings:

spool-path: /pgdata/pgbackrest-spool
archive-async: 'y'
archive-push-queue-max: 100GiB

edit: added spool path
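
In a PostgresCluster spec these go under spec.backups.pgbackrest.global, the same place they appear in the cluster definition posted above (a sketch; the path and queue size are examples to adapt):

spec:
  backups:
    pgbackrest:
      global:
        # Archive WAL segments asynchronously, in batches.
        archive-async: 'y'
        # Working directory for the async archiver; should sit on a persistent volume.
        spool-path: '/pgdata/pgbackrest-spool'
        # Drop queued WAL past this size instead of filling the volume;
        # this sacrifices archive continuity (and PITR) under pressure.
        archive-push-queue-max: '100GiB'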

@andrewlecuyer
Collaborator

Concur with @ThommyH's advice for async archiving. This will actually be a default in upcoming CPK releases: #3962.

@dberardo-com

@benjamin-bergia have you tried wal_compression? If so, any luck with it?

I am also facing issues with very high WAL generation, and I'm wondering whether it is possible to instruct pgBackRest to expire WAL between backups, as I don't care about PITR.

Also, did you see any benefit from async archiving, or was that just an attempt to improve the situation that led nowhere?
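
For what it's worth, wal_compression is an ordinary PostgreSQL parameter, so it can be set alongside the others in spec.patroni.dynamicConfiguration (a sketch; on PostgreSQL 13, as used in this thread, the value is simply on or off):

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          # Compress full-page images written to WAL; trades CPU for WAL volume.
          wal_compression: 'on'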

@benjamin-bergia
Author

@dberardo-com I haven't tried it. I'm not using this operator anymore, but at the time the storage available to me couldn't handle the amount of I/O from the WAL, so all these settings were just attempts to mitigate that.

@dberardo-com

Thanks for your comment. Hopefully the Crunchy team will come up with some official guidance on how to reduce this WAL burden.

@benjamin-bergia
Author

AFAIK it's a limitation of pgBackRest and, as such, pretty much out of Crunchy's hands.

@dberardo-com

Is there any way to change wal_level to replica without having to pause the cluster? Currently I have a cluster running with wal_level = replica and it seems to do just fine...
