
Supporting multi-process (forked) applications #9


Closed
rperryng opened this issue Nov 7, 2022 · 7 comments · Fixed by #11

@rperryng
Collaborator

rperryng commented Nov 7, 2022

Hey there.

We have noticed that in our staging/production environments, usage reporting is not propagating to our GraphQL Hive instance (schema publishing works, though).

We're using the graphql-ruby-hive plugin with a Rails server. Locally we run the server directly with bundle exec rails server, but in our staging/production environments we use Puma in its clustered (multi-process) mode.

We also make use of Puma's preload_app! directive. So in staging/production, Puma preloads the Rails application (including initializing the GraphQL Hive plugin) in the master process, and then forks worker processes that serve requests from their own thread pools.
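
For context, here is a minimal sketch of the kind of Puma configuration in play here (worker and thread counts are illustrative, not our exact settings):

# config/puma.rb
# Clustered mode: a master process that forks worker processes.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads 5, 5

# Load the Rails app (and its initializers, including the GraphQL Hive plugin)
# in the master process *before* the workers are forked.
preload_app!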

From my understanding, POSIX fork() does not copy threads, so once Puma forks the process, a copy of the empty, just-initialized Queue exists in each new worker process, but the thread that was spawned to consume its messages is not carried over post-fork.
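
Here is a minimal, standalone Ruby script (not code from the gem) that demonstrates this: the consumer thread drains the Queue in the parent, but after fork() the child is left with a copy of the Queue and no live consumer thread.

queue = Queue.new
consumer = Thread.new { loop { puts "consumed: #{queue.pop}" } }

pid = fork do
  # The Queue object was copied into the child, but the consumer thread was not.
  puts "consumer alive in child? #{consumer.alive?}" # => false
  queue << "operation from child"                    # never gets drained
  sleep 1
  puts "child queue size: #{queue.size}"             # => 1
end

queue << "operation from parent"                     # drained by the live consumer
Process.wait(pid)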

So with all that said, what I believe is happening is:

  1. Rails app boots up
  2. GraphQL::Hive::UsageReporter gets initialized, spins up thread to continuously read from the Queue structure
  3. Puma forks the process (Queue is copied but the thread reading from it is not)
  4. Messages get added to the Queue but never get read out of it
    a. I confirmed this behaviour with a forked version of the gem with extra debugging logs

I have also confirmed that this behaviour goes away when running our web server in a non-clustered mode.

I'm unsure of the best way to support this use-case.

Puma does offer the on_worker_boot callback (i.e. a post-fork hook). Theoretically, if GraphQL::Hive offered a way to control when it spawns the Queue-consuming thread, we could use this hook to ensure the thread is created in the forked processes.
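
For example (purely a sketch of the idea; the class layout and names like start and add_operation are illustrative, not the gem's actual internals), the usage reporter could separate construction from thread startup so the consumer thread can be (re)spawned after a fork:

class UsageReporter
  def initialize
    @queue = Queue.new
    start
  end

  # Idempotent: (re)spawns the consumer thread if it is missing or dead, so a
  # post-fork hook such as Puma's on_worker_boot could call it again inside
  # each worker process.
  def start
    return if @thread&.alive?
    @thread = Thread.new do
      while (operation = @queue.pop)
        report(operation)
      end
    end
  end

  def add_operation(operation)
    @queue << operation
  end

  private

  def report(operation)
    # ... send the usage report to GraphQL Hive ...
  end
end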

What do you think?

Happy to implement a solution if you've got any guidance/opinions etc.

@rperryng rperryng changed the title Supporting multi-process (forked) web servers Supporting multi-process (forked) applications Nov 7, 2022
@rperryng
Collaborator Author

rperryng commented Nov 7, 2022

For inspiration, I found this relevant thread about registering hooks for when the host process is forked.

A more robust solution proposed in that thread would be to simply check Thread#alive? (probably when operations are added to the buffer?) and restart the thread when it is dead.
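
Roughly, the check-and-restart idea looks like this (a sketch of the approach only; method names are illustrative, not the exact patch):

def add_operation(operation)
  # The consumer thread does not survive fork(), so lazily restart it if the
  # current process finds it dead (or never started).
  start_thread unless @thread&.alive?
  @queue << operation
end

def start_thread
  @thread = Thread.new { loop { process(@queue.pop) } }
end

In a real implementation the restart would presumably need a mutex around it so that concurrent request threads don't both try to respawn the consumer.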

I tried this solution (#10) and confirmed it fixed the problem for us.

@rperryng
Collaborator Author

👋 @charlypoly is there anything else I can do to help with this?

@charlypoly
Contributor

👋 @charlypoly is there anything else I can do to help with this?

Hi @rperryng,

After a quick review, it seems that your proposed implementation might lead to some deadlock situations.
I plan to work on this issue soon by using puma's hooks.

@charlypoly
Contributor

Hi @rperryng,

sorry for the delay; I was working on other matters 👀

I've run tests with Puma in clustered mode locally, using the k6 setup present in the project, and reproduced the issue (some operations were missing).
After narrowing it down, I realized the problem is linked to our library hooking into Kernel#at_exit to flush the buffer of operations, which is not triggered in forked worker processes.
I came up with a solution that avoids losing operations: add the following hook to your Puma config:

require 'graphql'
require 'graphql-hive'

# tell GraphQL Hive client to send operations in buffer before shutting down in clustered mode
on_worker_shutdown { |_key| GraphQL::Hive.instance&.on_exit }

Could you try this and let me know?
Thank you!

@rperryng
Collaborator Author

rperryng commented Dec 1, 2022

@charlypoly I tried your suggestion but unfortunately still wasn't able to get data to push through.

I think the problem you describe is different: operations in the buffer not being flushed when the worker shuts down. The problem I am having is that when a worker boots up (when Puma fork()s its main process), the thread that monitors the operations buffer is not copied over, so the buffer builds up indefinitely in my case (all operations are missing!)

For example, I added a log to print the queue size whenever operations are added to it, and you can see it growing past the queue limit here:
[screenshot: log output showing the queue size growing past the queue limit]

These logs span about 5 minutes.

My understanding is that because we are using the preload_app! directive, we end up preloading GraphQL::Hive and its usage_reporter instance, and starting the operations buffer thread, before the fork() happens, so the thread that monitors/flushes the operations buffer never exists in the worker processes.

Did you use preload_app! in your test? If not, then I believe GraphQL::Hive is only loaded after the fork() happens (most likely when a worker receives its first web request), so the thread that monitors the operations buffer is also only started after the fork() and therefore avoids being killed (well, until the worker shuts down, as you noticed 🙂).

@rperryng
Collaborator Author

Also FWIW, I tried adding an on_start hook to GraphQL::Hive and calling it from Puma's on_worker_boot hook, and confirmed that this also fixes the issue.
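
In Puma config terms it looks something like this (on_start here is the hook I added experimentally, not a released API of the gem):

# config/puma.rb
on_worker_boot do
  # Re-spawn the usage reporter's consumer thread inside each forked worker.
  GraphQL::Hive.instance&.on_start
end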

@rperryng
Collaborator Author

rperryng commented Jan 3, 2023

@charlypoly happy new year! Any thoughts on the above?
