Skip to content

Use docker initialization scripts for DB init #9993

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

abitrolly
Copy link
Contributor

The behavior is explained here https://hub.docker.com/_/postgres

After the entrypoint calls initdb to create the default postgres user and database, it will run any *.sql files, run any executable *.sh scripts, and source any non-executable *.sh scripts found in that directory to do further initialization before starting the service.

Also fixes #5980.

@abitrolly
Copy link
Contributor Author

Added the named volume and rebased. PTAL.

@abitrolly
Copy link
Contributor Author

Interesting. The changes are approved, but workflow still doesn't run.

@ewjoachim
Copy link
Contributor

Yes, I'm a pypi moderator, but I'm not an admin of the repo, I don't have the right to unblock the workflow.

@FFY00
Copy link
Member

FFY00 commented Sep 23, 2021

You need commit access.

@ewjoachim ewjoachim requested review from di and ewdurbin September 23, 2021 16:51
@ewjoachim
Copy link
Contributor

ewjoachim commented Sep 23, 2021

I know :) But that's not something I have as of today :)
@di @ewdurbin if by any chance, you can launch the workflow :)

Copy link
Member

@ewdurbin ewdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. does this handle decompressing the example.sql.xz file?
  2. is there a mechanism to re-intialize, or does the Postgres container require being destroyed/recreated?

@abitrolly
Copy link
Contributor Author

does this handle decompressing the example.sql.xz file?

Yes, here.

https://github.com/docker-library/postgres/blob/740bb6d8d44a955df42d88fd5660c7043340db90/10/stretch/docker-entrypoint.sh#L168

is there a mechanism to re-intialize, or does the Postgres container require being destroyed/recreated?

DROP database then load the dump with .xz as before, but that's not how containers are supposed to work.

@abitrolly abitrolly requested a review from ewdurbin September 23, 2021 18:22
@abitrolly
Copy link
Contributor Author

abitrolly commented Sep 24, 2021

I probably need to clarify the effects of this PR.

  1. The DB initialization with creating warehouse DB and uploading initial data dump happens automattically when the db container starts without existing database, not waiting for the user to issue initdb. So when db becomes online, the only thing that needs to be run are migrations.
  2. Because initial dump is already imported, initdb now doesn't kill the database, allowing to run migrations multiple times safely. Actually, it can now be renamed to updatedb.
  3. resetdb command is added to do explicit resets to the state before migrations. This provides a better control for debugging migrations. The resetdb reuses the DB initialization mechanism from 1. to avoid maintenance of parallel SQL code that does the same thing. The DB initialization mechanism is triggered when the container starts without existing database on disk. We need to stop or kill the container to clear the volume without unexpected errors. It is possible to just stop the db container, but that signals that db container contents outside volumes is actively modified, which is not the case here.

@ewdurbin does that makes things look better? I can insert the SQL code back to resetdb if you need it, and then the separate PR will be needed to address #5980

@abitrolly
Copy link
Contributor Author

So where we go from here?

@di
Copy link
Member

di commented Dec 6, 2021

While this seems like a nice feature of the postgres image, I'm not sure I like that database initialization is now split.

Ideally all database initialization would be done automatically here, and we wouldn't need initdb at all, which means that we keep the dev database up-to-date with every new migration. This would require something like #7496 but also we'd need to make the dump much smaller, so as to not introduce a large file into the git history every time there is a migration.

@abitrolly
Copy link
Contributor Author

Database initialization is done automatically here. I could rename that to migratedb target as it is actually what the initdb target does at the moment. The only blocker is the line docker-compose run web python -m warehouse sponsors populate-db. I am not sure yet if it can be replaced with static import/export.

@abitrolly
Copy link
Contributor Author

Okay. Sponsors line is a technical debt from https://github.com/pypa/warehouse/pull/9512/files#diff-b912d97d6c38546cdc81c0c2cc35833412c3b8ccd46a62ac5767f030a35bef73 Because there are not pathways to Alembic migrations for Django users.

abitrolly and others added 6 commits April 15, 2022 14:38
Co-authored-by: Joachim Jablon <[email protected]>
This allows to restore after `docker-compose down` without
reuploading database and rerunning migrations.

Need to explicitly remove the volume to recreate DB from
scratch `docker volume rm warehouse_pgdata`.
@abitrolly abitrolly requested a review from a team as a code owner February 22, 2024 19:07
@di
Copy link
Member

di commented Apr 3, 2024

This is done in #15549.

@di di closed this Apr 3, 2024
@abitrolly
Copy link
Contributor Author

Finally. 3 years later, but it is done. I would consider learning from Godot best practices for supporting the community that sends patches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants