
Supporting multi-process (forked) applications #9


Closed
rperryng opened this issue Nov 7, 2022 · 7 comments · Fixed by #11

@rperryng
Collaborator

rperryng commented Nov 7, 2022

Hey there.

We have noticed that in our staging/production environments, usage reporting is not propagating to our GraphQL Hive instance (schema publishing works, though).

We're using the graphql-ruby-hive plugin with a Rails server. Locally we run the server directly with bundle exec rails server, but in our staging/production environments we use Puma in its clustered (multi-process) mode.

We also make use of Puma's preload_app! directive. So in staging/production, Puma preloads the Rails application (including initializing the GraphQL Hive plugin) in the master process, and then forks worker processes that serve requests from their own thread pools.
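
For context, here is a minimal sketch of the kind of Puma configuration in play here (worker and thread counts are illustrative, not our exact settings):

# config/puma.rb
# Clustered mode: a master process that forks worker processes.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads 5, 5

# Load the Rails app (and its initializers, including the GraphQL Hive plugin)
# in the master process *before* the workers are forked.
preload_app!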

From my understanding, POSIX fork() does not copy threads, so once Puma forks the process, a copy of the empty, just-initialized Queue exists in each new worker process, but the thread that was spawned to consume its messages is not carried over post-fork.
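
Here is a minimal, standalone Ruby script (not code from the gem) that demonstrates this: the consumer thread drains the Queue in the parent, but after fork() the child is left with a copy of the Queue and no live consumer thread.

queue = Queue.new
consumer = Thread.new { loop { puts "consumed: #{queue.pop}" } }

pid = fork do
  # The Queue object was copied into the child, but the consumer thread was not.
  puts "consumer alive in child? #{consumer.alive?}" # => false
  queue << "operation from child"                    # never gets drained
  sleep 1
  puts "child queue size: #{queue.size}"             # => 1
end

queue << "operation from parent"                     # drained by the live consumer
Process.wait(pid)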

So with all that said, what I believe is happening is:

  1. Rails app boots up
  2. GraphQL::Hive::UsageReporter gets initialized, spins up thread to continuously read from the Queue structure
  3. Puma forks the process (Queue is copied but the thread reading from it is not)
  4. Messages get added to the Queue but never get read out of it
    a. I confirmed this behaviour with a forked version of the gem with extra debugging logs

I have also confirmed that this behaviour goes away when running our web server in a non-clustered mode.

I'm unsure of the best way to support this use-case.

Puma does offer the on_worker_boot callback (i.e. a post-fork hook). Theoretically, if GraphQL::Hive offered a way to control when it spawns the Queue-consuming thread, we could use this hook to ensure the thread is created in the forked processes.
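
For example (purely a sketch of the idea; the class layout and names like start and add_operation are illustrative, not the gem's actual internals), the usage reporter could separate construction from thread startup so the consumer thread can be (re)spawned after a fork:

class UsageReporter
  def initialize
    @queue = Queue.new
    start
  end

  # Idempotent: (re)spawns the consumer thread if it is missing or dead, so a
  # post-fork hook such as Puma's on_worker_boot could call it again inside
  # each worker process.
  def start
    return if @thread&.alive?
    @thread = Thread.new do
      while (operation = @queue.pop)
        report(operation)
      end
    end
  end

  def add_operation(operation)
    @queue << operation
  end

  private

  def report(operation)
    # ... send the usage report to GraphQL Hive ...
  end
end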

What do you think?

Happy to implement a solution if you've got any guidance/opinions etc.

@rperryng rperryng changed the title Supporting multi-process (forked) web servers Supporting multi-process (forked) applications Nov 7, 2022
@rperryng
Collaborator Author

rperryng commented Nov 7, 2022

For inspiration, I found this relevant thread about registering hooks for when the host process is forked.

A more robust solution proposed in that thread would be to simply check Thread#alive? (probably when operations are added to the buffer?) and restart the thread when it is dead.
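
Roughly, the check-and-restart idea looks like this (a sketch of the approach only; method names are illustrative, not the exact patch):

def add_operation(operation)
  # The consumer thread does not survive fork(), so lazily restart it if the
  # current process finds it dead (or never started).
  start_thread unless @thread&.alive?
  @queue << operation
end

def start_thread
  @thread = Thread.new { loop { process(@queue.pop) } }
end

In a real implementation the restart would presumably need a mutex around it so that concurrent request threads don't both try to respawn the consumer.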

I tried this solution (#10) and confirmed it fixed the problem for us.

@rperryng
Collaborator Author

👋 @charlypoly is there anything else I can do to help with this?

@charlypoly
Contributor

👋 @charlypoly is there anything else I can do to help with this?

Hi @rperryng,

After a quick review, it seems that your proposed implementation might lead to some deadlock situations.
I plan to work on this issue soon by using puma's hooks.

@charlypoly
Contributor

Hi @rperryng,

sorry for the delay; I was working on other matters 👀

I've run tests with Puma in clustered mode locally, using the k6 setup present in the project, and reproduced the issue (some operations were missing).
After narrowing it down, I realized the problem is linked to our library hooking into Kernel#at_exit to flush the buffer of operations, which is not triggered in forked worker processes.
I came up with a solution that avoids losing operations: add the following hook to your Puma config:

require 'graphql'
require 'graphql-hive'

# tell GraphQL Hive client to send operations in buffer before shutting down in clustered mode
on_worker_shutdown { |_key| GraphQL::Hive.instance&.on_exit }

Could you try this and let me know?
Thank you!

@rperryng
Collaborator Author

rperryng commented Dec 1, 2022

@charlypoly I tried your suggestion but unfortunately still wasn't able to get data to push through.

I think the problem you describe is different: operations in the buffer not being flushed when the worker shuts down. The problem I am having is that when a worker boots up (when Puma fork()s its main process), the thread that monitors the operations buffer is not copied over, so the buffer builds up indefinitely in my case (all operations are missing!)

For example, I added a log to print the queue size whenever operations are added to it, and you can see it growing past the queue limit here:
[screenshot: log output showing the queue size growing past the queue limit]

These logs span about 5 minutes.

My understanding is that because we are using the preload_app! directive, we end up preloading GraphQL::Hive and its usage_reporter instance, and starting the operations buffer thread, before the fork() happens, so the thread that monitors/flushes the operations buffer never exists in the worker processes.

Did you use preload_app! in your test? If not, then I believe GraphQL::Hive is only loaded after the fork() happens (most likely when a worker receives its first web request), so the thread that monitors the operations buffer is also only started after the fork() and therefore avoids being killed (well, until the worker shuts down, as you noticed 🙂).

@rperryng
Collaborator Author

Also FWIW, I tried adding an on_start hook to GraphQL::Hive and calling it from Puma's on_worker_boot hook, and confirmed that this also fixes the issue.
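
In Puma config terms it looks something like this (on_start here is the hook I added experimentally, not a released API of the gem):

# config/puma.rb
on_worker_boot do
  # Re-spawn the usage reporter's consumer thread inside each forked worker.
  GraphQL::Hive.instance&.on_start
end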

@rperryng
Collaborator Author

rperryng commented Jan 3, 2023

@charlypoly happy new year! Any thoughts on the above?
