Keep getting "Failed to send request to server. Did the server crash?" when queue contains close to 120 generations, even though no backend is crashed #380

Closed
Lissanro opened this issue Oct 30, 2024 · 9 comments
Labels
User Support A user needs help with something, probably not a bug.

Comments

@Lissanro

Expected Behavior

I expect to be able to add as many images as I want to my queue, usually in batches of 4, each with a unique prompt.

Actual Behavior

I can only add fewer than 120 prompts to the queue. I tried increasing OverQueue from 1 to 256 on each backend (I have four in total, each using a 3090 card), but I am still limited to 120 prompts in the queue.

Steps to Reproduce

Keep adding images to the queue by clicking Generate until you get the "Failed to send request to server. Did the server crash?" error, even though all backends continue to run without errors. Once there are fewer images in the queue (below 100), it is possible to add more images again, but this does not let me queue enough of them for overnight processing.

Debug Logs

There are no errors in the console at all, just lines like these if images were added successfully:

17:47:46.916 [Info] User local requested 4 images with model 'flux1-dev-Q8_0.gguf'...
17:47:54.103 [Info] User local requested 4 images with model 'flux1-dev-Q8_0.gguf'...
17:48:02.785 [Info] User local requested 4 images with model 'flux1-dev-Q8_0.gguf'...

And there is no output at all when it refuses to add more near the 120-image limit.

Other

No response

@Lissanro Lissanro added the Bug Something isn't working label Oct 30, 2024
@mcmonkey4eva mcmonkey4eva added User Support A user needs help with something, probably not a bug. and removed Bug Something isn't working labels Oct 30, 2024
@mcmonkey4eva
Member

First: definitely don't edit OverQueue like that, that's not what that does.

Regarding the error itself: ... that's... weird. That shouldn't happen.

Please click Server->Logs->Pastebin and post the link it gives you

@Lissanro
Author

Lissanro commented Oct 30, 2024

Yes, I set OverQueue back to 1. I just thought that if I set it very high, it might allow me to go beyond the 120 limit in the queue, but it did not.

Here is the full log: https://pastebin.com/LU66FvDw

Here I kept adding groups of 4 images to the queue until I hit the error, and then tried to add a few more times, but it does not allow me to go beyond the 120 limit. No errors are displayed in the log when the UI shows "Failed to send request to server. Did the server crash?". All backends continue working normally and allow me to add more images later (when the queue is smaller), as long as I do not break the 120 limit.

@mcmonkey4eva
Member

Again, Please click Server->Logs->Pastebin and post the link it gives you

@Lissanro
Author

Lissanro commented Oct 30, 2024

https://paste.denizenscript.com/View/127635

Again, I added images until I hit the error. There are still no errors in the log as far as I can tell, even at the Debug level. The last successful addition to the queue was here:

2024-10-30 19:09:59.871 [Info] User local requested 4 images with model 'pixelwave_flux1_dev_Q8_0_03.gguf'...

@mcmonkey4eva
Member

hahahahaha okay I was able to replicate the issue, but not in the same way you did.

I encountered a limit at exactly 200, which matches this quote from "What are WebSockets?": "Chrome allows up to 256 concurrent connections per client by default, while Firefox can handle 200 connections per client by default."
And more importantly, it matches the fact that you don't even get any debug output - this is a browser failure, not a server failure.

But the problem is... you're limited at 120, and you're doing 4 images at a time, so you only have to open 30 websockets for it to start refusing. So either you have 170 websockets open somewhere else (???) or you're not seeing the same core issue.
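
The arithmetic behind those two numbers can be written out as a quick check. This is only a sketch of the reasoning in this comment: it assumes each click on Generate opens exactly one websocket, and the variable names are made up for illustration.

```typescript
// Back-of-the-envelope check of the numbers in this thread.
// Assumption: each Generate request opens one websocket.
const browserLimit = 200;    // limit hit when replicating the issue
const queuedImages = 120;    // approximate point where the reporter gets the error
const imagesPerRequest = 4;  // batch size used by the reporter

// Websockets the queue itself would account for.
const socketsFromQueue = queuedImages / imagesPerRequest;

// Websockets that would have to be open elsewhere to reach the limit.
const socketsUnaccountedFor = browserLimit - socketsFromQueue;

console.log(socketsFromQueue);      // 30
console.log(socketsUnaccountedFor); // 170
```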

I can probably dodge the websocket count limit by switching the code to reuse a websocket rather than opening a new one every time. Not sure if that will resolve your issue though
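
A minimal sketch of what "reuse a websocket" could look like, in TypeScript. This is not the actual SwarmUI change: `SharedSocket`, `SocketLike`, and the request payload are hypothetical names for illustration, and the factory is assumed to return a socket that is already open (a real browser `WebSocket` would need to wait for its `open` event before sending).

```typescript
// Minimal socket shape, so the sketch does not depend on browser typings.
interface SocketLike {
  readyState: number; // 0 connecting, 1 open, 2 closing, 3 closed (as in the browser API)
  send(data: string): void;
}

// Hypothetical helper: routes every request through one shared socket
// instead of opening a new connection per request.
class SharedSocket {
  private socket: SocketLike | null = null;
  private nextId = 0;
  private open: () => SocketLike;

  constructor(open: () => SocketLike) {
    this.open = open;
  }

  // Sends a request over the shared socket, (re)opening it only when there
  // is no socket yet or the previous one is closing or closed.
  // Returns the id used to tag the request so replies could be matched to it.
  request(payload: object): number {
    if (this.socket === null || this.socket.readyState > 1) {
      this.socket = this.open();
    }
    const id = this.nextId++;
    this.socket.send(JSON.stringify({ ...payload, id }));
    return id;
  }
}

// Usage with a fake socket that just records what was sent.
let opened = 0;
const sent: string[] = [];
const shared = new SharedSocket(() => {
  opened++;
  return { readyState: 1, send: (data: string) => { sent.push(data); } };
});

for (let i = 0; i < 50; i++) {
  shared.request({ images: 4 });
}

console.log(opened, sent.length); // 1 50
```

With this shape the number of open connections stays at one regardless of how many requests are queued, so a per-browser connection limit would no longer scale with queue size.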

@Lissanro
Author

Lissanro commented Oct 30, 2024

I am sure I have websockets opened by something else. Besides, I noticed the limit is not always 120 but sometimes a smaller number, maybe because other tabs use websockets too. If you can fix this by reusing websockets, it would be really great! This limit is really restrictive because four GPUs go through the queue very quickly, so at the moment I cannot queue much. Thank you very much for looking into the issue.

@mcmonkey4eva
Member

Update to the new commit and give that a try

@mcmonkey4eva
Member

ps, if you have 4 fast GPUs, I'd be surprised if you get it running overnight through spamclicking 'Generate'. You can however get a full night's run by using the Grid Generator tool, I've done that a lot in the past. It also intentionally uses just one connection, and disables previews, to avoid browser issues for the large bulk generation run

@Lissanro
Author

Thank you, after git pull and refreshing the SwarmUI page, it worked! Thanks again for fixing this!


2 participants