Skip to content

[PGPRO-5691] move mmapped ptrack map into shared postgres memory #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 28, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@ OBJS = ptrack.o datapagemap.o engine.o $(WIN32RES)
PGFILEDESC = "ptrack - block-level incremental backup engine"

EXTENSION = ptrack
EXTVERSION = 2.2
DATA = ptrack--2.1.sql ptrack--2.0--2.1.sql ptrack--2.1--2.2.sql
EXTVERSION = 2.3
DATA = ptrack--2.1.sql ptrack--2.0--2.1.sql ptrack--2.1--2.2.sql ptrack--2.2--2.3.sql

TAP_TESTS = 1

Expand Down
23 changes: 14 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ postgres=# CREATE EXTENSION ptrack;

## Configuration

The only one configurable option is `ptrack.map_size` (in MB). Default is `-1`, which means `ptrack` is turned off. In order to reduce number of false positives it is recommended to set `ptrack.map_size` to `1 / 1000` of expected `PGDATA` size (i.e. `1000` for a 1 TB database).
The only one configurable option is `ptrack.map_size` (in MB). Default is `0`, which means `ptrack` is turned off. In order to reduce number of false positives it is recommended to set `ptrack.map_size` to `1 / 1000` of expected `PGDATA` size (i.e. `1000` for a 1 TB database).

To disable `ptrack` and clean up all remaining service files set `ptrack.map_size` to `0`.

Expand All @@ -74,7 +74,7 @@ Usage example:
postgres=# SELECT ptrack_version();
ptrack_version
----------------
2.2
2.3
(1 row)

postgres=# SELECT ptrack_init_lsn();
Expand Down Expand Up @@ -115,27 +115,34 @@ Usually, you have to only install new version of `ptrack` and do `ALTER EXTENSIO

Since version 2.2 we use a different algorithm for tracking changed pages. Thus, data recorded in the `ptrack.map` using pre 2.2 versions of `ptrack` is incompatible with newer versions. After extension upgrade and server restart old `ptrack.map` will be discarded with `WARNING` and initialized from the scratch.

#### Upgrading from 2.2.* to 2.3.*:

* Stop your server
* Update ptrack binaries
* Remove global/ptrack.map.mmap if it exist in server data directory
* Start server
* Do `ALTER EXTENSION 'ptrack' UPDATE;`.

## Limitations

1. You can only use `ptrack` safely with `wal_level >= 'replica'`. Otherwise, you can lose tracking of some changes if crash-recovery occurs, since [certain commands are designed not to write WAL at all if wal_level is minimal](https://www.postgresql.org/docs/12/populate.html#POPULATE-PITR), but we only durably flush `ptrack` map at checkpoint time.

2. The only one production-ready backup utility, that fully supports `ptrack` is [pg_probackup](https://github.com/postgrespro/pg_probackup).

3. Currently, you cannot resize `ptrack` map in runtime, only on postmaster start. Also, you will loose all tracked changes, so it is recommended to do so in the maintainance window and accompany this operation with full backup. See [TODO](#TODO) for details.
3. You cannot resize `ptrack` map in runtime, only on postmaster start. Also, you will loose all tracked changes, so it is recommended to do so in the maintainance window and accompany this operation with full backup.

4. You will need up to `ptrack.map_size * 3` of additional disk space, since `ptrack` uses two additional temporary files for durability purpose. See [Architecture section](#Architecture) for details.
4. You will need up to `ptrack.map_size * 2` of additional disk space, since `ptrack` uses additional temporary file for durability purpose. See [Architecture section](#Architecture) for details.

## Benchmarks

Briefly, an overhead of using `ptrack` on TPS usually does not exceed a couple of percent (~1-3%) for a database of dozens to hundreds of gigabytes in size, while the backup time scales down linearly with backup size with a coefficient ~1. It means that an incremental `ptrack` backup of a database with only 20% of changed pages will be 5 times faster than a full backup. More details [here](benchmarks).

## Architecture

We use a single shared hash table in `ptrack`, which is mapped in memory from the file on disk using `mmap`. Due to the fixed size of the map there may be false positives (when some block is marked as changed without being actually modified), but not false negative results. However, these false postives may be completely eliminated by setting a high enough `ptrack.map_size`.
We use a single shared hash table in `ptrack`. Due to the fixed size of the map there may be false positives (when some block is marked as changed without being actually modified), but not false negative results. However, these false postives may be completely eliminated by setting a high enough `ptrack.map_size`.

All reads/writes are made using atomic operations on `uint64` entries, so the map is completely lockless during the normal PostgreSQL operation. Because we do not use locks for read/write access and cannot control `mmap` eviction back to disk, `ptrack` keeps a map (`ptrack.map`) since the last checkpoint intact and uses up to 2 additional temporary files:
All reads/writes are made using atomic operations on `uint64` entries, so the map is completely lockless during the normal PostgreSQL operation. Because we do not use locks for read/write access, `ptrack` keeps a map (`ptrack.map`) since the last checkpoint intact and uses up to 1 additional temporary file:

* working copy `ptrack.map.mmap` for doing `mmap` on it (there is a [TODO](#TODO) item);
* temporary file `ptrack.map.tmp` to durably replace `ptrack.map` during checkpoint.

Map is written on disk at the end of checkpoint atomically block by block involving the CRC32 checksum calculation that is checked on the next whole map re-read after crash-recovery or restart.
Expand Down Expand Up @@ -165,8 +172,6 @@ Available test modes (`MODE`) are `basic` (default) and `paranoia` (per-block ch

### TODO

* Use POSIX `shm_open()` instead of `open()` to do not create an additional working copy of `ptrack` map file.
* Should we introduce `ptrack.map_path` to allow `ptrack` service files storage outside of `PGDATA`? Doing that we will avoid patching PostgreSQL binary utilities to ignore `ptrack.map.*` files.
* Can we resize `ptrack` map on restart but keep the previously tracked changes?
* Can we resize `ptrack` map dynamicaly?
* Can we write a formal proof, that we never loose any modified page with `ptrack`? With TLA+?
Loading