faster s2i container start idea #220

praiskup · 2018-01-04T11:10:26Z

The current s2i proposal in #208 has one pain point: even if the user provides the initial database state as an SQL dump to be restored after "initdb", it takes more than several seconds to get the database initialized.

I'm curious whether we could also run initdb during the assemble script, and then copy the data directory somewhere within the image. IOW, whether we could have the binary data directory baked into the built s2i image. Then, we could skip the initdb and just copy the baked directory under $PGDATA, and save a lot of time. WDYT?

omron93 · 2018-01-04T15:35:20Z

we could skip the initdb and just copy the baked directory under $PGDATA, and save a lot of time

It would save time. On the other hand, the image could get really big, which would slow deployments down. But it's the user's choice, so why not.

praiskup · 2018-01-04T16:26:21Z

You mean the re-deployments, where the data directory is already initialized? Well, in that case you still have the SQL dump file baked into the image, and that would be large per se. If we baked the binary datadir into the image instead of the SQL file, we would likely get an even smaller image...

pkubatrh · 2018-01-05T07:38:33Z

I might be missing something, but is the "sql restore" something we support right now? Or is it a future use case, using the new hooks, that we could make easier for users to achieve?

praiskup · 2018-01-05T08:05:19Z

Discussion-only topic :-)! I would mark this with the question label if I could. I'm trying to find a use case of my own for #208.

I'm developing a python+postgresql project/app, and my use case would be:

For development purposes, I need to get that app into its initial state many times a day.
I have the initial-state SQL dump, which I can provide during the s2i build (but it is about 60MB, and the db restore takes about one minute on my box).
The s2i build (assemble) could run initdb, import that ^^ dump (only once), and store the datadir content somewhere (but "drop" the original SQL dump to save space).
When instantiating a container from the image, instead of initdb+dump-restore we could simply copy the datadir from the image.
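
The build-time half of this use case could look roughly like the sketch below. It is only an illustration: `bake_initial_state` and its two arguments are hypothetical names, not something the container scripts provide, and it assumes that `initdb`, `pg_ctl` and `psql` are on the PATH and that the build runs as the database owner.

```shell
# Hypothetical helper, not part of the postgresql-container scripts.
# Usage: bake_initial_state DUMP.sql DATADIR
bake_initial_state() {
    local dump="$1" datadir="$2" rc=0

    # Create a fresh cluster; this cost is paid once, at build time.
    initdb -D "$datadir" || return 1

    # Start a temporary server that does not listen on TCP at all.
    pg_ctl -D "$datadir" -w -o "-c listen_addresses=''" start || return 1

    # Import the dump; ON_ERROR_STOP makes psql fail on the first SQL error.
    psql -v ON_ERROR_STOP=1 -d postgres -f "$dump" || rc=1

    # Always stop the server cleanly so the datadir is left consistent.
    pg_ctl -D "$datadir" -w stop || rc=1

    # Drop the dump only after a successful import, to save image space.
    if [ "$rc" -eq 0 ]; then
        rm -f "$dump"
    fi
    return "$rc"
}
```

An assemble script might call it as `bake_initial_state /tmp/src/initial.sql /opt/app-root/baked-data`, where both paths are placeholders chosen for this example.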

I don't know whether (a) I can do that right now with the supported container, (b) #208 is required, or (c) some other pull request is needed. I think (c) is right, but I'm not sure.

omron93 · 2018-01-09T10:07:31Z

You mean the re-deployments, where the data directory is already initialized? Well, in that case you still have the SQL dump file baked into the image, and that would be large per se. If we baked the binary datadir into the image instead of the SQL file, we would likely get an even smaller image...

I was thinking mainly about storing data in the image. I don't know the details, but I think the image is transferred over the network several times during the app's lifetime (pushed into the registry, pulled onto each node where the image runs, ...). So the bigger the image, the slower that process is.

On the other hand, if the data are stored in a persistent network volume, they are transferred over the network anyway. So maybe storing the initial data in the image isn't much slower :-)

Also, OpenShift (Online) restricts the size of the persistent storage used. I haven't found any note about restrictions on image size, so maybe this is even an advantage :-D

pkubatrh · 2018-01-09T10:26:28Z

In my opinion, this does not sound like something the image should take care of. More like work for OpenShift itself (project backup?).

praiskup · 2018-01-09T10:36:15Z

@omron93

I was thinking mainly about storing data in the image. I don't know the details, but I think the image is transferred over the network several times during the app's lifetime (pushed into the registry, pulled onto each node where the image runs, ...). So the bigger the image, the slower that process is.

If you have a plain-text SQL file with the default data baked into the image, the space requirements are asymptotically equivalent.

Of course, the db scenario might be that you fetch the data from the internet after db initialization, but that's no longer a task for s2i.

@pkubatrh

More like work for OpenShift itself (project backup?).

Hms, maybe. Do you have a link?

pkubatrh · 2018-01-09T10:41:50Z

Hms, maybe. Do you have a link?

Nope, I'm not sure such a feature exists yet. It was just an idea of how it should ideally work.

pkubatrh · 2018-01-09T10:42:40Z

A quick search revealed:
https://docs.openshift.com/container-platform/3.6/admin_guide/backup_restore.html#project-backup

But that does not seem like something we would want (it backs up only the project configuration).

praiskup · 2018-01-09T10:54:27Z

A full project snapshot would be nice, but that doesn't help with the use case I described, because even though I want to have the initial state of the database "backed", the rest of the project moves forward during development...

My thought on this is that we shouldn't support this directly, but it would be nice if we allowed users to implement it themselves (via s2i, once merged); that is, it should be doable without ugly "workarounds". The run-postgresql script does too much, so maybe a separate command would be needed for this. I'll have a look at this later, and probably hack up some "example" project leveraging this...

omron93 · 2018-01-10T11:47:06Z

Of course, the db scenario might be that you fetch the data from the internet after db initialization, but that's no longer a task for s2i.

A task for s2i could be to process the SQL with the right postgresql version and put the database "on the internet" (volume, ...).

praiskup · 2018-01-10T12:18:43Z

@omron93, can you elaborate on the use case more concretely? I'm not sure I follow.

omron93 · 2018-01-10T13:28:13Z

I was thinking mainly about storing data in the image. I don't know the details, but I think the image is transferred over the network several times during the app's lifetime (pushed into the registry, pulled onto each node where the image runs, ...). So the bigger the image, the slower that process is.

If you have a plain-text SQL file with the default data baked into the image, the space requirements are asymptotically equivalent.

Of course, the db scenario might be that you fetch the data from the internet after db initialization, but that's no longer a task for s2i.

I can imagine this scenario (nothing detailed, just how I understand your goal):
I guess that (in general) to use database files in binary form, the files have to be created with the same version that will use them.
So an s2i build is run every time:

a new version of the database image is created
the SQL form of the database, stored in some git repo, is changed

Every build does:

configure the database, create users, ... (the customization common to our database images now)
import the initial SQL data -> copy the binary form of the database to some shared location -> and "clean the database" (only the imported data, not the configuration)
the s2i build commits this state of the container as a new image

And the image could offer a start-time option to obtain the database files from somewhere (for example, copy /var/lib/postgresql/initdata to /var/lib/postgresql/data). The initial data would be mounted there from the shared location, by kubernetes for example.
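
A minimal sketch of such a start-time option, assuming plain directories as in the example paths above. `seed_datadir` is a hypothetical name, and the optional third argument is an assumption added here to reflect the point that binary files must come from the same server version.

```shell
# Hypothetical helper: seed an empty data directory from pre-built files.
# Usage: seed_datadir INITDATA DATADIR [EXPECTED_MAJOR_VERSION]
seed_datadir() {
    local initdata="$1" datadir="$2" expected="${3:-}"

    # An initialized cluster always contains PG_VERSION; never overwrite one.
    if [ -f "$datadir/PG_VERSION" ]; then
        return 0
    fi

    # Nothing to seed from: the caller should fall back to a regular initdb.
    if [ ! -f "$initdata/PG_VERSION" ]; then
        return 1
    fi

    # Refuse binary files that were created by a different major version.
    if [ -n "$expected" ] && [ "$(cat "$initdata/PG_VERSION")" != "$expected" ]; then
        return 1
    fi

    # Copy the contents (including dotfiles) and keep ownership and modes.
    mkdir -p "$datadir" &&
    cp -a "$initdata/." "$datadir/" &&
    chmod 700 "$datadir"
}
```

A start script could try `seed_datadir /var/lib/postgresql/initdata /var/lib/postgresql/data` first and run initdb only when it returns non-zero.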

The benefit of using s2i is that it automatically creates the right binary initial database whenever the image or the SQL data change!

(What is wrong with this is that I think OpenShift doesn't support using persistent volumes during a build... and reusing them in deployments.)

On the other hand, /var/lib/postgresql/initdata can be stored in the image after the s2i build. Currently I don't have a preference for either way. The above is only another possible implementation.

pkubatrh · 2018-01-10T13:55:45Z

@omron93 so basically, instead of the result of an s2i build being just the image, it would be an image plus an initial DB living somewhere, which would get re-initialized every time the image or the input SQL changes?

That seems like an unnecessarily difficult way to achieve an always-initialized database. I would rather go with Pavel's original proposal of baking the data directly into the image, since that would work everywhere without too much hassle.

Generally, we could provide users with some hooks that would be called during the assemble process if present, and leave them the freedom to do whatever they need to do.

praiskup · 2018-02-08T10:41:51Z

Seems like the idea is pretty complicated; the assemble script is currently too trivial, and we don't even start the postgresql server (run-postgresql script) when assembling.

So to make this happen, we would need (a) a way to run initialize_database through assemble, e.g. through some assemble-done hook, and (b) some postgresql-preinit hook. The workflow would be:

post-assemble hook: initdb -> import data (from some hook) -> tar cJf /baked-data.tar.xz /var/lib/pgsql/data/userdata
pre-init hook (called right after generate_postgresql_config): checks that userdata is not initialized and, if the tarball exists, extracts it.
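
The two hooks could be sketched as the pair of functions below. The function names are hypothetical, and the sketch deviates from the command above in two ways: it archives paths relative to the data directory (`-C`), and it uses `tar -a`, which in GNU tar and bsdtar picks the compressor from the archive suffix, so a `.tar.xz` name still gives xz compression.

```shell
# Hypothetical hook bodies; both take the data directory and the tarball path.

# post-assemble: archive an already initialized and populated datadir.
bake_datadir() {
    local datadir="$1" tarball="$2"
    # -C stores paths relative to the datadir, so it can be unpacked anywhere.
    # -a chooses the compression from the suffix of the archive name.
    tar -caf "$tarball" -C "$datadir" .
}

# pre-init: unpack the tarball only when no cluster exists yet.
restore_baked_datadir() {
    local datadir="$1" tarball="$2"

    # PG_VERSION marks an initialized cluster; leave live data untouched.
    if [ -f "$datadir/PG_VERSION" ]; then
        return 0
    fi

    # No baked data in this image: the caller should run initdb as usual.
    if [ ! -f "$tarball" ]; then
        return 1
    fi

    # tar detects the compression of an existing archive when reading it.
    mkdir -p "$datadir" && tar -xf "$tarball" -C "$datadir"
}
```

With the paths from the comment above, the build would call `bake_datadir /var/lib/pgsql/data/userdata /baked-data.tar.xz`, and the container start would call `restore_baked_datadir` with the same two arguments before deciding whether initdb is needed.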

So can we vote on whether this makes sense? (Likes/dislikes; I could then follow up with a PR adding the hook support and preparing an example leveraging this feature.)

- run 'initdb' from 'assemble', and bake the datadir into image - install hook which extracts the tarball when data is not initialized Fixes: sclorg#220

praiskup · 2018-03-11T10:59:29Z

See #251 with WIP example.

postgresql-pre-start hook example and test - run 'initdb' from 'assemble', and bake the datadir into image - install hook which extracts the tarball when data is not initialized Fixes: sclorg#220

pkubatrh added the question label Jan 5, 2018

praiskup added a commit to praiskup/postgresql-container that referenced this issue Mar 11, 2018

postgresql-boot hook example

bfbc354

- run 'initdb' from 'assemble', and bake the datadir into image - install hook which extracts the tarball when data is not initialized Fixes: sclorg#220

pkubatrh closed this as completed in b737ad2 Mar 22, 2018
