Issue with microservice using r2dbc oracle inside kubernetes cluster. (Request timeouts) #129
Hi @manuelgdlvh. Thanks for reporting this. If no database session is created, then it sounds like the driver is stuck at some step when opening a connection. Would it be possible to share a thread dump, such as one from a jstack command? This might tell us where the driver has become stuck.
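If running jstack inside the container is awkward, a comparable dump can be captured from inside the application. The sketch below uses only the JDK's `Thread.getAllStackTraces()`. The `ThreadDumps` class and `dumpAllThreads` method are hypothetical names, and the output only approximates jstack's format (no lock or monitor details).

```java
import java.util.Map;

final class ThreadDumps {
  private ThreadDumps() {}

  /** Builds a jstack-like text dump of every live thread in this JVM. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      // Header line: quoted thread name, daemon flag, and current state.
      sb.append('"').append(t.getName()).append('"')
        .append(t.isDaemon() ? " daemon" : "")
        .append(" state=").append(t.getState()).append('\n');
      // One line per stack frame, innermost first.
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }
}
```

For example, logging `ThreadDumps.dumpAllThreads()` from a watchdog while a query is hanging would show which frames the driver's threads are parked in.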
Hi @Michael-A-McMahon, this is the thread dump: threaddump.txt
Thanks for the thread dump. I can't pinpoint a root cause just yet, but we do have some information. I have some questions which I think will help us find a resolution:
Have you tried opening a plain JDBC connection in this environment? I wonder if Oracle R2DBC is sending a login request to the database and simply not receiving a response. If this is the case, then we should be able to see the same behavior with Oracle JDBC.
Are there any errors appearing in logs or stderr? It may be the case that the connection is failing, and the Exception is getting dropped somewhere along the way.
Is it possible that -Djava.util.concurrent.ForkJoinPool.common.parallelism=0 is set? If this system property is set, then Oracle R2DBC/JDBC will have no threads to execute asynchronous callbacks. The driver will submit a task to the common ForkJoinPool, which will then silently fail to execute it.
Below are just my notes about the thread dump:
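The first question, whether a plain blocking JDBC connection works in the same pod, could be checked with a small probe like the one below. It is a sketch: the `JdbcProbe` class is hypothetical, and pointing it at a real `jdbc:oracle:thin:@//host:port/service` URL assumes the Oracle JDBC driver (ojdbc11) is on the classpath. The login timeout is meant to keep a silent server from turning the test into another indefinite hang, provided the driver honors `DriverManager.setLoginTimeout`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class JdbcProbe {
  private JdbcProbe() {}

  /**
   * Opens and closes one blocking JDBC connection. Returns "OK" on success or
   * "FAILED: <message>" so an error and a success are easy to tell apart.
   */
  static String probe(String url, String user, String password, int loginTimeoutSeconds) {
    // Global DriverManager setting; drivers are expected (not guaranteed) to honor it.
    DriverManager.setLoginTimeout(loginTimeoutSeconds);
    try (Connection conn = DriverManager.getConnection(url, user, password)) {
      // isValid round-trips to the server within the given timeout.
      return conn.isValid(loginTimeoutSeconds) ? "OK" : "FAILED: connection not valid";
    } catch (SQLException e) {
      return "FAILED: " + e.getMessage();
    }
  }
}
```

A call such as `JdbcProbe.probe("jdbc:oracle:thin:@//db-host:1521/MYSERVICE", "app_user", "secret", 10)` (host, service name, and credentials are placeholders) returns either `OK` or the driver's error message.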
Hi @Michael-A-McMahon, the strangest thing is that JDBC works well inside the k8s cluster, and when the service is dockerized and running outside the cluster, both JDBC and R2DBC work.
If Oracle JDBC can connect synchronously, then something specific to its async/reactive code must be causing the failure. It could be something about using java.nio.channels.Selector, or something about using the common ForkJoinPool.
The first possibility seems more likely at this point, as the second would require a non-default system property setting, which I think you've already checked for. We could completely rule out the second possibility by testing with:
As an experiment, we could also try updating the ojdbc11 dependency:
<groupId>com.oracle.database.jdbc</groupId>
<artifactId>ojdbc11</artifactId>
<version>23.2.0.0</version>
Oracle R2DBC currently depends on the 21.7.0.0 build, but the code for reactive extensions has been significantly overhauled in version 23. It's possible we would get a different result with 23.
*Edited the artifactId
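After bumping ojdbc11, it can be worth confirming which driver build the container actually loads. A hedged sketch using the standard `DriverManager.drivers()` stream (Java 9+); `DriverVersions` is a hypothetical helper, and the exact version numbers reported are up to each driver.

```java
import java.sql.DriverManager;
import java.util.List;
import java.util.stream.Collectors;

final class DriverVersions {
  private DriverVersions() {}

  /** Lists "className major.minor" for every JDBC driver visible to DriverManager. */
  static List<String> loadedDrivers() {
    return DriverManager.drivers()
        .map(d -> d.getClass().getName() + " " + d.getMajorVersion() + "." + d.getMinorVersion())
        .collect(Collectors.toList());
  }
}
```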
Hi @Michael-A-McMahon, I ran the latest tests using the JVM flag you suggested, and now it's working properly. I will investigate the interaction between Kubernetes and the JVM, because the same image works in Docker. I'll share everything I find with you and the community. Thanks for everything!
That is interesting! I was convinced that FJP should have at least 1 thread, unless explicitly configured otherwise. But it sounds like you had to explicitly configure parallelism to be 1 or greater. Looking forward to learning more.
Perhaps it's a bug related to containerized environments and the JVM.
Fascinating. I think Oracle R2DBC could provide a safeguard for this situation, something rooted in this pseudo-ish code:
static final Executor DEFAULT_EXECUTOR;
static {
  // Caveat: getParallelism() reports at least 1 even when the common pool
  // was configured with zero workers, so this check alone may never fire.
  int commonFjpParallelism = ForkJoinPool.commonPool().getParallelism();
  if (commonFjpParallelism < 1)
    DEFAULT_EXECUTOR = Executors.newSingleThreadExecutor();
  else
    DEFAULT_EXECUTOR = ForkJoinPool.commonPool();
}
Let's leave this issue open so I can look further into a fix. This is a nasty bug, and I don't want other users to hit it.
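A variant of that safeguard that compiles on its own, assuming the goal is only to avoid a common pool configured with zero workers. Because `ForkJoinPool.getParallelism()` and `getCommonPoolParallelism()` report at least 1 regardless, this sketch reads the `java.util.concurrent.ForkJoinPool.common.parallelism` property directly. `AsyncExecutorGuard`, `chooseExecutor`, and the `async-fallback` thread name are hypothetical, not Oracle R2DBC's actual fix.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

final class AsyncExecutorGuard {
  static final String PARALLELISM_PROPERTY =
      "java.util.concurrent.ForkJoinPool.common.parallelism";

  private AsyncExecutorGuard() {}

  /** True if the given property value would leave the common pool with no worker threads. */
  static boolean commonPoolDisabled(String configuredParallelism) {
    if (configuredParallelism == null) {
      return false;
    }
    try {
      // Zero disables the pool's workers; negative values are treated the same way here.
      return Integer.parseInt(configuredParallelism.trim()) <= 0;
    } catch (NumberFormatException e) {
      return false; // the JDK ignores unparsable values and keeps its default
    }
  }

  /** Picks the common pool unless it was configured with zero threads. */
  static Executor chooseExecutor(String configuredParallelism) {
    if (!commonPoolDisabled(configuredParallelism)) {
      return ForkJoinPool.commonPool();
    }
    // Daemon thread so the fallback does not keep the JVM alive on shutdown.
    return Executors.newSingleThreadExecutor(task -> {
      Thread t = new Thread(task, "async-fallback");
      t.setDaemon(true);
      return t;
    });
  }

  static final Executor DEFAULT_EXECUTOR =
      chooseExecutor(System.getProperty(PARALLELISM_PROPERTY));
}
```

Taking the property value as a parameter keeps the decision testable without restarting the JVM. This only covers the explicit `=0` setting; on JDK builds affected by JDK-8274349, a pool sized from a single detected CPU can also end up without workers, which this check does not detect.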
Hi again @manuelgdlvh. Could you share the JDK version where this issue occurred?
The JDK version where the issue occurs for me is the one provided by the openjdk17-alpine Docker image. In addition to the solution of adding the JVM flag for parallelism, I also tested setting minimum CPU and RAM resources in the Kubernetes deployment, and that worked well too. In my test, I set the minimum CPU to 1 and RAM to 1GB. I believe that when using only Docker, the JVM can correctly read the resources of the host machine. However, in Kubernetes, if we don't provide minimum resource settings, it may not be able to correctly determine the available machine resources. If there's anything else I can help with, feel free to contact me.
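To see what the JVM actually detects in each environment, a small report like the sketch below could be logged at startup, both in the pod and in plain Docker. `JvmResourceReport` is a hypothetical helper; it only reads standard `Runtime` and `ForkJoinPool` values.

```java
import java.util.concurrent.ForkJoinPool;

final class JvmResourceReport {
  private JvmResourceReport() {}

  /** Summarizes the CPU and memory limits this JVM believes it has. */
  static String describe() {
    Runtime rt = Runtime.getRuntime();
    return "availableProcessors=" + rt.availableProcessors()
        + " maxHeapMiB=" + (rt.maxMemory() / (1024 * 1024))
        // Clamped to at least 1 by the JDK, even when the pool has no workers.
        + " commonPoolParallelism=" + ForkJoinPool.getCommonPoolParallelism()
        + " parallelismProperty="
        + System.getProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "<unset>");
  }
}
```

Comparing `availableProcessors` with and without the Kubernetes resource settings would show whether the pod is seen as a single-CPU machine. Because the reported common pool parallelism is clamped, it cannot by itself reveal a pool with zero workers.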
Hi @manuelgdlvh. Very sorry I left you hanging on this. I got wrapped up in preparations for CloudWorld 2023. Your JDK version matches the affected versions in https://bugs.openjdk.org/browse/JDK-8274349. If you update to the latest JDK 17, I think you'll find the issue resolves itself without having to set "java.util.concurrent.ForkJoinPool.common.parallelism" and without having to configure Kubernetes resources. I could be wrong about this, so it would be good if you have a chance to test and verify. At this point, I think the only case Oracle R2DBC needs to check for is "java.util.concurrent.ForkJoinPool.common.parallelism=0". That should be the only way we get an FJP with zero threads.
I've just spent days troubleshooting an AWS Fargate deployment where the request to get a connection would hang. This was because ForkJoinPool.commonPool() (the default executor) had no free threads. To work around this, I supplied the connection factory with a custom executor:
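A minimal sketch of such an executor, using only JDK classes: a small fixed pool of named daemon threads. The `DriverExecutors` helper and the `r2dbc-callback-` thread names are hypothetical, and the wiring shown in the comments assumes the `OracleR2dbcOptions.EXECUTOR` connection option (`oracle.r2dbc.executor`) offered by recent Oracle R2DBC releases; check it against the version in use.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class DriverExecutors {
  private DriverExecutors() {}

  /** Fixed-size pool of named daemon threads for the driver's async callbacks. */
  static ExecutorService newDriverExecutor(int threads) {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = task -> {
      Thread t = new Thread(task, "r2dbc-callback-" + counter.incrementAndGet());
      t.setDaemon(true); // do not block JVM shutdown
      return t;
    };
    return Executors.newFixedThreadPool(threads, factory);
  }

  // Wiring (assumption: Oracle R2DBC's OracleR2dbcOptions.EXECUTOR option):
  //   ConnectionFactoryOptions options = ConnectionFactoryOptions.parse(url).mutate()
  //       .option(OracleR2dbcOptions.EXECUTOR, newDriverExecutor(2))
  //       .build();
  //   ConnectionFactory factory = ConnectionFactories.get(options);
}
```

Named threads also make this executor easy to spot in a thread dump.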
I raised this issue in #136
When this app is deployed in a local Kubernetes cluster with a basic configuration and our service executes a query against an Oracle database using the Oracle R2DBC driver, it gets stuck in an infinite wait with no response and no errors (no session is created on the database).
If we test exactly the same thing running locally or in a container (without Kubernetes), it works.
These are all the tests we ran: