Reconnect Strategy doesn't work if connection is lost #2948

edrose · 2025-05-07T10:09:08Z

Description

I'm having some issues with the reconnect strategy in this library. It seems to work as per the documentation on the initial connection to Redis, however the behaviour changes if the connection to Redis is lost at a later stage.

The documentation states that there are three different types that can be returned:

A number can be returned to indicate that the client should retry after the specified delay
An error can be returned to terminate the connection
false can be returned to prevent reconnection and hold the client in some kind of broken state for testing

Behaviour on Initial Connection

The RedisSocket.connect() function is called
This calls the private function RedisSocket.#connect() which deals with retries
There is a line in packages/client/lib/client/socket.ts:218 where the output of the reconnect strategy function is evaluated - if it's a number then a retry is scheduled; if it's not a number then it throws whatever it is.
Assuming the value is not false (which I assume is a noop in javascript) it bubbles up to the initial call to RedisSocket.connect() and the promise called by the library user rejects with the error

This is the behaviour that is documented.

Behaviour when connection is lost

The socket "error" event fires, which calls into RedisSocket.#onSocketError()
RedisSocket.#onSocketError() re-emits the error event and also emits "reconnecting" before calling this.#connect() on line 269
Crucially, that line functionally looks like this: this.#connect().catch(() => {}));

Any error in connecting is completely ignored. This means that is has the same effect as if the reconnect strategy function returned false which (quoting the documentation here) "should only be used for testing purposes". This silently leaves the client in a broken state.

What I expect to happen

At the very least expect the "error" event to be emitted with the error returned by the reconnect strategy. Even better would be for a "terminated" or similar signal to be emitted to tell the library that the retry strategy failed and allow it to do it's own recovery.

Node.js Version

18.20.6

Redis Server Version

7.3.241

Node Redis Version

5.0.1

Platform

Linux

Logs

The text was updated successfully, but these errors were encountered:

nkaradzhov · 2025-05-20T11:16:15Z

@edrose I dont see the issue here. Even though this.#connect().catch(() => {})) swallows the error, #connect will still emit an error event, which will in turn trigger #onSocketError, continuing in a loop until the reconnection strategy returns something different than a number

edrose · 2025-05-20T11:50:03Z

True, internally there is nothing wrong with the logic - it does exactly what it's told to by the reconnection strategy. However as a user of the library I need to be able to detect when the library has given up retrying, otherwise I just have a dead client.

Listening for 'error' events isn't sufficient because those don't tell me if the client has given up retrying or not. If it has given up then further logic is needed to re-initialise the client, maybe re-fetch some configuration or even do a full restart.

The context is that we have a webapp and Redis running in a Kubernetes cluster on AWS spot instances. Every so often we get spot interruptions that cause our Redis instances to move to different nodes. When this occurs they go down, then come back up with a new IP address. Since we have replication this can occur with no downtime, but it does cause error events to be emitted whenever it happens, even if that doesn't cause an interruption in service.

To detect and recover from a persistent error I need to be able to tell the difference between "an error occurred and I'm going to try to fix it" and "an error occurred and I can't do anything more".

nkaradzhov · 2025-05-20T17:39:22Z

@edrose AFAIK the only thing that controls when the library will choose to stop retrying is the reconnectStrategy, which is controlled by the user. So, you have to return false or Error from your custom reconnectStrategy. At that point, you could invoke any other custom logic you wish. Here is a minimal example:

import { GenericContainer } from 'testcontainers';
import { createClient } from 'redis';

const container = await new GenericContainer('redis')
  .withExposedPorts(6379)
  .start();

const makeCappedRetryStrategy = (maxRetries: number, cleanup: () => void) => {
  return (retries: number, cause: Error) => {
    console.log(`retrying... ${retries}`);
    if (retries >= maxRetries) {
      cleanup();
      return false;
    }
    return 1000;
  }
}

const client = createClient({
  url: `redis://localhost:${container.getMappedPort(6379)}`,
  socket: {
    reconnectStrategy: makeCappedRetryStrategy(3, () => {
      console.log('cleanup function called');
    })
  }
});

client.on('error', () => {});
await client.connect();
await container.exec('redis-cli shutdown');

In principal, Im not against firing a 'terminated' event, but fundamentally, the behavior is controlled by the reconnectStrategy.

nkaradzhov · 2025-05-22T07:38:33Z

@edrose what do you think?

edrose · 2025-05-22T10:58:30Z

I don't think that code works when .duplicate() is used. So if duplicate clients are created for pub/sub then the reconnect strategy will also duplicate, which duplicates whatever work is done when the reconnect strategy function fails. There isn't any way of knowing which client called the callback function, so extra care must be taken to ensure that extra logic isn't called multiple times.

I think the pattern of using the side-effects of a callback function is also just asking for bugs to occur, and I would never do extra work in the callback unless the documentation explicitly states that it's acceptable. As a user of the library I don't know what state the Redis client is in when it calls the callback, so I wouldn't assume that it's safe to modify the state of the client whilst in that callback context.

As a bare minimum I would be using process.nextTick() to do any extra logic in the callback if I was forced to, but I still think it would be much cleaner and safer if an event is emitted.

nkaradzhov · 2025-05-22T11:32:12Z

I see your point. I think its sensible to add the event. @bobymicroby do you see any issues for us emitting a 'terminated' event on client when reconnectStrategy returns false ( which is when we kill the socket and client is unusable from that point on ) ?

bobymicroby · 2025-05-22T11:46:28Z

The code should work when duplicate is used as every client gets its own "retry" function instance. If it doesn't, it's a bug we missed, but from what I see there shouldn't be a problem. The retryStrategy function that the user supplies has no side effects and is pure. It receives the strategy(retries, cause) - retries so far and the latest error - so you are free to make informed decisions and to decide if you want to keep retrying or not based on that.

bobymicroby · 2025-05-22T13:01:28Z

What we can do ( just thinking out loud, not sure if it makes sense ) is wrap the last error in RetryStrategyExhausted(originalError) before emitting it here: https://github.com/bobymicroby/node-redis/blob/065eb5e9145db9152923a44b63a356c55f76cf3b/packages/client/lib/client/socket.ts#L192. Unfortunately this can break clients which expect the raw error and will need more consideration.

edrose added the Bug label May 7, 2025

nkaradzhov self-assigned this May 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reconnect Strategy doesn't work if connection is lost #2948

Reconnect Strategy doesn't work if connection is lost #2948

edrose commented May 7, 2025

nkaradzhov commented May 20, 2025

Uh oh!

edrose commented May 20, 2025

Uh oh!

nkaradzhov commented May 20, 2025

Uh oh!

nkaradzhov commented May 22, 2025

Uh oh!

edrose commented May 22, 2025

Uh oh!

nkaradzhov commented May 22, 2025

Uh oh!

bobymicroby commented May 22, 2025

Uh oh!

bobymicroby commented May 22, 2025

Uh oh!

Reconnect Strategy doesn't work if connection is lost #2948

Reconnect Strategy doesn't work if connection is lost #2948

Comments

edrose commented May 7, 2025

Description

Behaviour on Initial Connection

Behaviour when connection is lost

What I expect to happen

Node.js Version

Redis Server Version

Node Redis Version

Platform

Logs

nkaradzhov commented May 20, 2025

Uh oh!

edrose commented May 20, 2025

Uh oh!

nkaradzhov commented May 20, 2025

Uh oh!

nkaradzhov commented May 22, 2025

Uh oh!

edrose commented May 22, 2025

Uh oh!

nkaradzhov commented May 22, 2025

Uh oh!

bobymicroby commented May 22, 2025

Uh oh!

bobymicroby commented May 22, 2025

Uh oh!