Allow pthreads on Node.js without a pthread pool #18305

Merged (5 commits) on Dec 13, 2022

Conversation

RReverser
Collaborator

Node.js worker_threads differ from browser Workers in that they start spawning synchronously, without waiting for an event loop tick.

If we can also avoid waiting for the "loaded" event from the worker before sending pthread tasks to it, then we can spawn new pthreads "synchronously" (prepare everything within the same event loop tick) and block the current thread. This makes the behaviour on Node.js a lot closer to native and avoids the need for a pthread pool, even without Asyncify or extra helper workers.
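
As a rough illustration of that premise, the sketch below uses only Node.js's `worker_threads` module and `Atomics`, not Emscripten itself. The shared flag, the inline worker source, and the 5-second timeout are assumptions made up for this demo. It creates a worker and immediately blocks the main thread, without ever returning to the event loop; the worker still starts and flips the flag.

```javascript
const { Worker } = require('node:worker_threads');

// One shared 32-bit slot: 0 = worker has not run yet, 1 = worker ran.
const flag = new Int32Array(new SharedArrayBuffer(4));

// Inline worker source (assumption for the demo): set the flag and wake the waiter.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  Atomics.store(workerData, 0, 1);
  Atomics.notify(workerData, 0);
`;

// The worker thread starts spawning as part of this call.
const worker = new Worker(workerSource, { eval: true, workerData: flag });

// Block the main thread right away, in the same event loop tick.
// Result is 'ok' or 'not-equal' if the worker ran, 'timed-out' otherwise.
const waitResult = Atomics.wait(flag, 0, 0, 5000);
const workerRan = Atomics.load(flag, 0) === 1;
```

In a browser, the equivalent code on the main thread would not work this way, which is why the discussion here is specific to Node.js.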

That's what I did in this implementation. Instead of waiting for the worker to tell us it's loaded and ready, I'm sending all commands immediately to the worker. The worker accepts the first "load" message, starts initializing the runtime, and meanwhile queues up any further messages such as "run". Once the runtime is ready, it processes the queue.
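
The worker-side queueing described above could look roughly like this. It is a minimal sketch, not the actual Emscripten worker script: `createWorkerMessageHandler`, `initRuntime`, `runPthread`, and the `threadId` field are hypothetical names, and runtime initialization is modelled as a callback that the caller triggers by hand.

```javascript
// Hypothetical sketch of a worker message handler that defers work until ready.
function createWorkerMessageHandler(initRuntime, runPthread) {
  let loadStarted = false;
  let runtimeReady = false;
  const queued = [];

  function dispatch(msg) {
    if (msg.cmd === 'run') runPthread(msg);
  }

  return function onMessage(msg) {
    if (msg.cmd === 'load' && !loadStarted) {
      loadStarted = true;
      // Initialization is asynchronous; `done` is called once the runtime is up.
      initRuntime(function done() {
        runtimeReady = true;
        // Process everything that arrived in the meantime, in arrival order.
        while (queued.length > 0) dispatch(queued.shift());
      });
      return;
    }
    if (!runtimeReady) {
      queued.push(msg); // too early to execute: remember it for later
      return;
    }
    dispatch(msg);
  };
}

// Usage: "run" messages sent before the runtime is ready are deferred, not lost.
const executed = [];
let finishInit = null;
const onMessage = createWorkerMessageHandler(
  (done) => { finishInit = done; },        // pretend init is still in progress
  (msg) => { executed.push(msg.threadId); }
);
onMessage({ cmd: 'load' });
onMessage({ cmd: 'run', threadId: 1 });
onMessage({ cmd: 'run', threadId: 2 });
const ranBeforeReady = executed.length;    // nothing has run yet
finishInit();                              // runtime becomes ready, queue drains
onMessage({ cmd: 'run', threadId: 3 });    // runs immediately from now on
```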

I could limit these changes to Node.js only, but it's easier to handle both environments together: it avoids the need for a custom worker.runPthread callback and should, in theory, even be a bit faster in browsers (by avoiding the "loaded" roundtrip before running a pthread).

@RReverser RReverser requested review from kripken and sbc100 December 2, 2022 14:28
@RReverser
Collaborator Author

RReverser commented Dec 2, 2022

Note: this depends on #18267, because it changes when the "run" command is sent to a Worker. That one's merged now.

RReverser added a commit that referenced this pull request Dec 2, 2022
Extracted some changes out of #18305 that report Worker as ready only once runtime is initialized *and* `Module` variable is assigned.

Want to see if it incidentally also helps with #18307.
Member

@kripken kripken left a comment

Maybe someone else can review this more easily, but for me at least I don't recall all the async startup logic, so I could benefit from an overview of the old way and the new way proposed here.

@RReverser
Collaborator Author

RReverser commented Dec 2, 2022

> Maybe someone else can review this more easily, but for me at least I don't recall all the async startup logic, so I could benefit from an overview of the old way and the new way proposed here.

Let me try to give a TL;DR summary, using pthread_create as an example and assuming we either don't have a pthread pool or it's exhausted.

Before:

  1. Main thread creates a Worker via new Worker and subscribes to its messages.
  2. Main thread sends a message "load" with the Wasm module and some other bits to the Worker.
  3. Worker receives the "load" message with the Wasm module, loads relevant JS files, and asynchronously initializes the runtime.
  4. Worker initialization is done, it sends a message "loaded" to the main thread.
  5. Main thread receives message "loaded".
  6. Main thread sends a message "run" to the Worker with a pointer to pthread callback and other relevant info.
  7. Worker receives the message "run" and executes the pthread.

After:

  1. Main thread creates a Worker via new Worker and subscribes to its messages.
  2. Main thread sends a message "load" with the Wasm module and some other bits to the Worker.
  3. Main thread sends a message "run" to the Worker with a pointer to pthread callback and other relevant info.
  4. Worker receives the "load" message with the Wasm module, loads relevant JS files, and asynchronously initializes the runtime.
  5. Worker stores all the other incoming messages into a queue (in this example it's just a message "run").
  6. Worker initialization is done, it sends a message "loaded" to the main thread.
  7. Worker executes all the queued up messages (in this example just a message "run", so it executes the pthread).

In practice, this reordering matters on Node.js: before this change, the main thread couldn't block synchronously right after pthread_create, because steps 5-7 of the "before" flow would then never happen. With its event loop blocked, the main thread wouldn't even receive the "loaded" message from the Worker.

After this change, the main thread no longer waits for the "loaded" message. Instead, it sends the "run" command immediately, in the same event loop tick, and leaves it to the Worker to execute the code only once it's fully loaded. This means the main thread can safely block right after the pthread_create call, even though the Worker is not ready yet.
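
The difference between the two flows can be reproduced in a toy model. This is a minimal sketch and not Emscripten code: `simulatePthreadCreate`, the `'before'`/`'after'` flow names, and the two message arrays are all hypothetical. It treats "load" as completing as soon as the worker sees it, so it models only who sends what and when, not the worker's internal queue.

```javascript
// Returns true if the pthread ends up running on the worker.
function simulatePthreadCreate(flow, mainThreadBlocks) {
  const toWorker = [];
  const toMain = [];
  let workerLoaded = false;
  let pthreadRan = false;

  // Main thread, inside pthread_create: all of this happens in one tick.
  toWorker.push('load');
  if (flow === 'after') toWorker.push('run');

  // Alternate between the two threads until neither makes progress.
  let progress = true;
  while (progress) {
    progress = false;

    // The worker has its own thread, so it always gets to handle messages.
    while (toWorker.length > 0) {
      const msg = toWorker.shift();
      if (msg === 'load') {
        workerLoaded = true;
        toMain.push('loaded');
      } else if (msg === 'run' && workerLoaded) {
        pthreadRan = true;
      }
      progress = true;
    }

    // The main thread only sees messages if its event loop is not blocked.
    if (!mainThreadBlocks) {
      while (toMain.length > 0) {
        const msg = toMain.shift();
        // Old flow: "run" is sent only in reaction to "loaded".
        if (msg === 'loaded' && flow === 'before') toWorker.push('run');
        progress = true;
      }
    }
  }
  return pthreadRan;
}

const oldFlowBlocked = simulatePthreadCreate('before', true);   // false: deadlock
const oldFlowFree = simulatePthreadCreate('before', false);     // true
const newFlowBlocked = simulatePthreadCreate('after', true);    // true
```

In the model, the only combination where the pthread never runs is the old flow with a blocked main thread, which matches the scenario described above.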

Hope this helps.

@RReverser RReverser force-pushed the node-without-pthread-pool branch 2 times, most recently from b92c75f to d58e058 Compare December 3, 2022 04:05
@RReverser RReverser requested review from sbc100 and kripken December 5, 2022 18:38
@kripken
Member

kripken commented Dec 7, 2022

@RReverser In the "after" list, steps 2 and 3 are both "Main thread sends a message" - could it not send a single message for both?

@RReverser
Collaborator Author

RReverser commented Dec 7, 2022

> @RReverser In the "after" list, steps 2 and 3 are both "Main thread sends a message" - could it not send a single message for both?

They're not always sent together. In this particular example - pthread_create when the thread pool is not ready - yes, it will send 2 messages. However, in other cases, e.g. when preparing a thread pool, there will be only a "load" message, and when doing pthread_create with a ready pool, there will be only a "run" message.

It's possible to add a third kind of message, a combined "loadAndRun", but it would complicate the code further and wouldn't bring any perf benefit, because when those two messages do happen to be sent together, they are already sent in the same event loop tick.
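
The three scenarios can be sketched from the main thread's side as follows. This is an illustration under assumptions, not the real implementation: `RecordingWorker` is a stand-in that only records the `cmd` of each message, and `loadWorker` and `pthreadCreate` are hypothetical helpers.

```javascript
// Stand-in for a Worker that just records which commands it was sent.
class RecordingWorker {
  constructor() { this.sent = []; }
  postMessage(msg) { this.sent.push(msg.cmd); }
}

// Preparing a worker (e.g. for the pool): only "load" is sent.
function loadWorker(worker) {
  worker.postMessage({ cmd: 'load' });
  return worker;
}

// Creating a thread: reuse an idle worker if there is one, else load a new one.
function pthreadCreate(idleWorkers) {
  let worker = idleWorkers.pop();
  if (worker === undefined) {
    // No pool, or pool exhausted: "load" and "run" go out in the same tick.
    worker = loadWorker(new RecordingWorker());
  }
  worker.postMessage({ cmd: 'run' });
  return worker;
}

const pooled = loadWorker(new RecordingWorker());
const afterPoolSetup = [...pooled.sent];       // pool preparation: just "load"
const reused = pthreadCreate([pooled]);        // ready pool: only "run" is added
const fresh = pthreadCreate([]);               // empty pool: "load" then "run"
```

Because "load" and "run" stay separate, the same two message kinds cover all three cases, and the empty-pool case simply sends both back to back.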

@RReverser RReverser force-pushed the node-without-pthread-pool branch 2 times, most recently from aeadff5 to 1ad37f5 Compare December 11, 2022 18:51
@RReverser
Collaborator Author

Rebased & added a changelog line. Are there any blocking changes left here?
@RReverser RReverser force-pushed the node-without-pthread-pool branch from 1ad37f5 to bee58ad Compare December 12, 2022 15:00
Collaborator

@sbc100 sbc100 left a comment

So IIRC that pool pre-allocation still happens under node, even though it's not really necessary?

I do like being able to test stuff under node, so I guess it is still useful, but should we consider having the pre-allocated pool be something that only ever happens in the browser?

@RReverser
Collaborator Author

> So IIRC that pool pre-allocation still happens under node, even though it's not really necessary?

If the user specifies PTHREAD_POOL_SIZE=N, then yes. But if it's not specified, or if the pool is exhausted, it won't throw an error anymore; it will spawn synchronously instead.

> but should we consider having the pre-allocated pool be something that only ever happens in the browser?

I think it makes sense to still respect user choice here; if they explicitly specify PTHREAD_POOL_SIZE, they might want the faster startup for pthreads.

Once we add some way to avoid pthread pool in browsers too, we can document that PTHREAD_POOL_SIZE is no longer necessary if your only goal is to avoid deadlocks, and gradually migrate users away from it.

I don't want to completely ignore the option just for one target as part of this PR though.

@kripken
Member

kripken commented Dec 12, 2022

> They're not always sent together. In this particular example - pthread_create when the thread pool is not ready - yes, it will send 2 messages. However, in other cases, e.g. when preparing a thread pool, there will be only a "load" message, and when doing pthread_create with a ready pool, there will be only a "run" message.

I see, thanks, that's what I was missing.

@RReverser
Collaborator Author

> I see, thanks, that's what I was missing.

Glad that clears it up, and sorry for the confusion. I used that particular scenario to show the difference between the workflows because it's the one that would deadlock before this change but won't now; the other scenarios mentioned are less interesting because they worked anyway.

@RReverser RReverser enabled auto-merge (squash) December 12, 2022 23:09
@RReverser RReverser merged commit 73eaf81 into main Dec 13, 2022
@RReverser RReverser deleted the node-without-pthread-pool branch December 13, 2022 01:43
kleisauke added a commit to kleisauke/emscripten that referenced this pull request Apr 14, 2023
The corresponding TODO-item is no longer relevant after PR emscripten-core#18305
has landed.

Resolves: emscripten-core#9763.
sbc100 pushed a commit that referenced this pull request Apr 14, 2023
The corresponding TODO-item is no longer relevant after PR #18305
has landed.

Resolves: #9763.
3 participants