Subscription stops working doing automated tests #499
Comments
@gklijs Thank you for running tests against our library! It is helpful to see performance metrics from outside @ExpediaGroup. Do you think you could run the same perf test against this application? https://github.com/graphql-java-kickstart/graphql-spring-boot/tree/master/example-graphql-subscription This is the graphql-java subscriptions example, also using Spring Boot. If it shows similar numbers, then this issue should really be opened on graphql-java instead, as we are not implementing much logic on top of the subscription execution. |
The Java test is already based on that; it's in https://github.com/openweb-nl/kafka-graphql-examples/tree/ge-java. It's performing much better than this one, so the problem does not seem to lie entirely with graphql-java. |
Ah ok, I apologise. I did not see it was a separate branch. This is definitely worth further investigation then. I don't have time to investigate this week, but this is definitely high priority. Of course any community members are welcome to help! |
Tried some things, such as using Flow instead of WebFlux to pass as the publisher, but the behaviour stays the same: after the first generator is created, it stops. |
I think this line, |
Getting closer to a solution. I'm sure the reason the test fails is that the messages from the websocket stop arriving, without the client reconnecting. |
I'm able to reproduce it by starting and ending a subscription 128 times; after that the server doesn't send the data back, but does keep sending the ak (with the pr version). |
There seems to be some kind of congestion of multiple Flux instances that aren't properly released. I added some debug statements and called the subscription till it broke. It ended with:
Somehow the handle method is not called anymore. Only after refreshing the page does the situation seem to restore itself, and after some seconds the cancellation of all the subscriptions passes by in the logging. |
That client is only used for testing, and they stay open till the end of the test. It might still be a good idea to add them though, and it is most likely the reason for some error logging at the end of the test. Line 77 in 111ccaf
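A minimal sketch of the client-side frames for this kind of reproduction. The `connection_init` and `start` message types appear in the Artillery config in this thread; the `stop` type comes from the graphql-ws (subscriptions-transport-ws) protocol. The helper names, the use of a single connection, and a fresh id per cycle are assumptions for illustration, not the reporter's actual test. The code only builds the JSON frames; sending them over a WebSocket is left out.

```kotlin
// Hypothetical helpers: only the message types come from the graphql-ws protocol.
fun startMessage(id: Int, query: String): String {
    // Escape backslashes, quotes and newlines so the query fits in a JSON string.
    val escaped = query.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")
    return """{"type":"start","id":"$id","payload":{"query":"$escaped"}}"""
}

fun stopMessage(id: Int): String = """{"type":"stop","id":"$id"}"""

// Frames for one connection that starts and then stops `cycles` subscriptions in turn.
fun reproductionFrames(cycles: Int, query: String): List<String> = buildList {
    add("""{"type":"connection_init"}""")
    for (id in 1..cycles) {
        add(startMessage(id, query))
        add(stopMessage(id))
    }
}

fun main() {
    val frames = reproductionFrames(128, "subscription TestSubscription { counter }")
    println(frames.size) // 1 init frame + 128 start/stop pairs = 257
    println(frames[1])
}
```

A real client would wait for at least one `data` message between each `start` and `stop`, so it can tell at which cycle the server stops delivering.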
|
Thanks for merging, although it was only a small improvement. Removing the concat Line 121 in 111ccaf
|
Just a side note -> I just noticed that you are using Kotlin |
Good to know, I changed that part several times now. I could do it similar to the Java version, using WebFlux. But even then it had the same problem. I also changed it to be a bit less experimental, using broadcast. |
Running the perf tests against the example app in graphql-kotlin and this example app in graphql-java.
GraphQL Java
Test Config
config:
  target: 'ws://localhost:3000/stockticker'
  ws:
    rejectUnauthorized: false
  phases:
    # Duration in seconds
    - duration: 600
      arrivalRate: 20
scenarios:
  - name: "Run GraphQL Subscription"
    engine: "ws"
    flow:
      - send:
          query: |-
            subscription TestSubscription {
              stockQuotes {
                dateTime
                stockCode
                stockPrice
                stockPriceChange
              }
            }
      - think: 2
Results
GraphQL Kotlin
Test Config
config:
  target: 'ws://localhost:8080/subscriptions'
  ws:
    rejectUnauthorized: false
    subprotocols:
      - graphql-ws
  phases:
    - duration: 600
      arrivalRate: 20
scenarios:
  - name: "Run GraphQL Subscription"
    engine: "ws"
    flow:
      - send:
          type: "connection_init"
      - send:
          type: "start"
          id: "1"
          payload:
            query: |-
              subscription TestSubscription {
                counter
              }
      - think: 2
      - send:
          type: "connection_terminate"
Results
|
So graphql-kotlin is using a lot more memory. It is freed on GC, but the peaks are several times larger: graphql-java was jumping back and forth from 10MB to ~40MB, while graphql-kotlin was jumping from 30MB to 200MB. The difference in request count is that the kotlin implementation uses the |
We are using SpringBoot which is much heavier than the basic Jetty server started from |
Might be related, especially since graphql-java already had too little memory available in the setup, and I was using the same for graphql-kotlin. Although I tried at least once with just running the Kotlin part locally, with more resources. |
Good news, it seems fixed. I don't know if it's the small changes I made on the current master, or just the latest master + enough memory. I hope to get that clear tonight, so I can do a whole performance run overnight. I think there is also nothing there now to prevent the same client from subscribing with the same id twice without stopping in between. That's one of the things I changed locally. Will create a pr if needed. |
Tested with #520 overnight, and of the 5 runs, 3 ran for over an hour. So once the pr is merged, this can be closed. |
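One way to guard against the duplicate-id case mentioned above is to track one cancellation handle per session and operation id. This is a sketch only, not what #520 or graphql-kotlin actually does: `SubscriptionRegistry` and its method names are hypothetical, and the `cancel` callback stands in for whatever disposes the underlying Flux or Flow.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical registry: one cancel handle per (session id, operation id).
class SubscriptionRegistry {
    private val active = ConcurrentHashMap<Pair<String, String>, () -> Unit>()

    // Registers a subscription; returns false and keeps the existing one
    // if the same session already has an active subscription with this id.
    fun start(sessionId: String, operationId: String, cancel: () -> Unit): Boolean =
        active.putIfAbsent(sessionId to operationId, cancel) == null

    // Cancels and forgets one subscription; returns false if it was unknown.
    fun stop(sessionId: String, operationId: String): Boolean {
        val cancel = active.remove(sessionId to operationId) ?: return false
        cancel()
        return true
    }

    // Cancels everything owned by a session, e.g. when its WebSocket closes.
    // Returns the number of subscriptions that were cancelled.
    fun stopAll(sessionId: String): Int {
        val keys = active.keys.filter { it.first == sessionId }
        return keys.count { stop(it.first, it.second) }
    }
}

fun main() {
    val registry = SubscriptionRegistry()
    var cancelled = 0
    println(registry.start("session-a", "1") { cancelled++ }) // true
    println(registry.start("session-a", "1") { cancelled++ }) // false, id "1" is still active
    println(registry.stop("session-a", "1"))                  // true
    println(cancelled)                                        // 1
}
```

Rejecting the second `start` is one design choice; cancelling the old subscription and replacing it would be another, and which one fits depends on how clients are expected to behave.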
Fixed by #520 (and maybe partly by some of the fixes before it) |
Library Version
1.4.2
Describe the bug
When running a load test, I get time-outs very early on. I have a comparable implementation using Java, and it runs a whole lot longer: about 2 minutes for this one vs. 30 minutes for Java. I don't see any error in the logs. What happens is that a couple of times per second a subscription is started, a message is received through Kafka and WebFlux, and then the subscription is stopped again.
A preliminary comparison can be found here.
To Reproduce
At kafka-graphql-examples the implementation can be found. It's possible on mac/linux to run the whole test using the scripts.
I know this is quite a lot, and the test is doing a lot more than needed. I might try to create a test in Kotlin, using the example project that produces a similar result.
Expected behavior
To be at least on par with the java version, or be faster because of the use of coroutines.
I saw there was already an open issue for changing the subscription implementation, #358. It might fix the problem.