Skip the remaining callbacks on connectivity state change #2419


Closed
wants to merge 2 commits

Conversation


@Hexta Hexta commented Apr 9, 2023

Skip the remaining callbacks in subchannel.transitionToState() if the connectivity state has changed before the callbacks are called.

The connectivity state can be changed by calling client.close() inside a callback.
As a result, LoadBalancingCall.doPick() enters an infinite busy loop.

This affected the etcd3 client library.

Skip the remaining callbacks in subchannel.transitionToState()
if the connectivity state has changed before the callbacks are called.

The connectivity state can be changed by calling client.close.
As a result, LoadBalancingCall.doPick() enters an infinite busy loop.
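The proposed guard could be sketched as follows. This is a minimal, hypothetical model; the class and listener shapes are simplified assumptions, not the actual grpc-js Subchannel:

```typescript
type ConnectivityState = 'IDLE' | 'CONNECTING' | 'READY' | 'TRANSIENT_FAILURE';
type StateListener = (
  oldState: ConnectivityState,
  newState: ConnectivityState
) => void;

class SubchannelSketch {
  private connectivityState: ConnectivityState = 'IDLE';
  private stateListeners: StateListener[] = [];

  addStateListener(listener: StateListener): void {
    this.stateListeners.push(listener);
  }

  transitionToState(
    oldStates: ConnectivityState[],
    newState: ConnectivityState
  ): boolean {
    if (!oldStates.includes(this.connectivityState)) {
      return false;
    }
    const previousState = this.connectivityState;
    this.connectivityState = newState;
    for (const listener of this.stateListeners) {
      // A listener may re-enter transitionToState (e.g. via client.close()).
      // If the state has moved past newState, skip the remaining listeners
      // instead of delivering a stale notification.
      if (this.connectivityState !== newState) {
        break;
      }
      listener(previousState, newState);
    }
    return true;
  }
}
```

With this guard, a listener that closes a client, and thereby re-enters transitionToState, stops the now stale notifications from reaching the listeners that have not yet run.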
@murgatroid99
Member

The change seems reasonable, but I don't understand the test. Why should client2.waitForReady end with an error? Why should client2 then be able to make a request successfully?

@Hexta
Author

Hexta commented Apr 10, 2023

client2.waitForReady will end with Failed to connect before the deadline because the remaining connectivity state listeners are skipped; the subchannel will remain in the IDLE state.
On the next request, the subchannel will transition to the CONNECTING state via this call stack:

  • Subchannel.transitionToState
  • Subchannel.startConnecting
  • ChannelSubchannelWrapper.startConnecting
  • PickFirstLoadBalancer.exitIdle
  • ChildLoadBalancerHandler.exitIdle
  • ResolvingLoadBalancer.exitIdle
  • InternalChannel.getConfig
  • ResolvingCall.getConfig
  • ResolvingCall.start
  • BaseUnaryInterceptingCall.start
  • BaseUnaryInterceptingCall.start
  • ServiceClientImpl.makeUnaryRequest

@murgatroid99
Member

client1.close shouldn't affect the subchannel's connectivity state, because it is also used by client2. If it does, that is a bug.

And this change doesn't actually cause the subchannel's state to not change, it just causes it to not notify listeners of the change.

Maybe a better solution would be to call the listener with this.connectivityState instead of newState. That will occasionally cause some redundant notifications, but that shouldn't be a problem.
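That alternative can be modeled in the same minimal style (hypothetical names and shapes, not the real grpc-js code): the notification loop passes this.connectivityState, so a listener that runs after a re-entrant transition observes the live state, at the cost of an occasional repeated notification.

```typescript
type ConnectivityState = 'IDLE' | 'CONNECTING' | 'READY' | 'TRANSIENT_FAILURE';
type StateListener = (
  oldState: ConnectivityState,
  newState: ConnectivityState
) => void;

class SubchannelModel {
  private connectivityState: ConnectivityState = 'IDLE';
  private stateListeners: StateListener[] = [];

  addStateListener(listener: StateListener): void {
    this.stateListeners.push(listener);
  }

  transitionToState(
    oldStates: ConnectivityState[],
    newState: ConnectivityState
  ): boolean {
    if (!oldStates.includes(this.connectivityState)) {
      return false;
    }
    const previousState = this.connectivityState;
    this.connectivityState = newState;
    for (const listener of this.stateListeners) {
      // this.connectivityState may have moved past newState by now (a
      // listener can re-enter transitionToState); reporting the live value
      // occasionally repeats a notification, but listeners never see a
      // stale state.
      listener(previousState, this.connectivityState);
    }
    return true;
  }
}
```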

@Hexta
Author

Hexta commented Apr 11, 2023

Unfortunately, client1.close does affect the subchannel's connectivity state.
You are right that the subchannel will remain in the IDLE state even without this change.

In the case of 2 clients, there are 4 registered connectivity state listeners at the beginning of the CONNECTING -> READY transition:

  1. ChannelSubchannelWrapper.subchannelStateListener for 1st channel
  2. PickFirstLoadBalancer.pickedSubchannelStateListener for 1st channel
  3. ChannelSubchannelWrapper.subchannelStateListener for 2nd channel
  4. PickFirstLoadBalancer.pickedSubchannelStateListener for 2nd channel

The 2nd listener calls client1.close.

Call stack (line numbers may differ slightly due to some debug prints):
  • /grpc-js/test/test-global-subchannel-pool.ts:97:55
  • Object.checkState [as callback] (/grpc-js/src/client.ts:188:9)
  • InternalChannel.updateState (/grpc-js/src/internal-channel.ts:397:23)
  • Object.updateState (/grpc-js/src/internal-channel.ts:250:14)
  • ResolvingLoadBalancer.updateState (/grpc-js/src/resolving-load-balancer.ts:288:31)
  • Object.updateState (/grpc-js/src/resolving-load-balancer.ts:173:14)
  • ChildLoadBalancerHandler.ChildPolicyHelper.updateState (/grpc-js/src/load-balancer-child-handler.ts:61:40)
  • PickFirstLoadBalancer.updateState (/grpc-js/src/load-balancer-pick-first.ts:335:31)
  • PickFirstLoadBalancer.pickSubchannel (/grpc-js/src/load-balancer-pick-first.ts:320:10)
  • PickFirstLoadBalancer.subchannelStateListener (/grpc-js/src/load-balancer-pick-first.ts:169:14)
  • Subchannel.transitionToState (/grpc-js/src/subchannel.ts:272:7)
  • /grpc-js/src/subchannel.ts:179:18

client1.close changes the subchannel's connectivity state from READY to IDLE.

Call stack:
  • Subchannel.transitionToState (/grpc-js/src/subchannel.ts:228:22)
  • Subchannel.unref (/grpc-js/src/subchannel.ts:333:12)
  • Subchannel.unrefIfOneRef (/grpc-js/src/subchannel.ts:345:12)
  • /grpc-js/src/subchannel-pool.ts:73:38
  • Array.filter (<anonymous>)
  • SubchannelPool.unrefUnusedSubchannels (/grpc-js/src/subchannel-pool.ts:72:51)
  • InternalChannel.close (/grpc-js/src/internal-channel.ts:575:25)
  • ChannelImplementation.close (/grpc-js/src/channel.ts:120:26)
  • /grpc-js/test/test-global-subchannel-pool.ts:139:30
  • Object.checkState [as callback] (/grpc-js/src/client.ts:189:9)
  • InternalChannel.updateState (/grpc-js/src/internal-channel.ts:421:23)
  • Object.updateState (/grpc-js/src/internal-channel.ts:276:14)
  • ResolvingLoadBalancer.updateState (/grpc-js/src/resolving-load-balancer.ts:289:31)
  • Object.updateState (/grpc-js/src/resolving-load-balancer.ts:173:14)
  • ChildLoadBalancerHandler.ChildPolicyHelper.updateState (/grpc-js/src/load-balancer-child-handler.ts:60:40)
  • PickFirstLoadBalancer.updateState (/grpc-js/src/load-balancer-pick-first.ts:335:31)
  • PickFirstLoadBalancer.pickSubchannel (/grpc-js/src/load-balancer-pick-first.ts:320:10)
  • PickFirstLoadBalancer.subchannelStateListener (/grpc-js/src/load-balancer-pick-first.ts:169:14)
  • Subchannel.transitionToState (/grpc-js/src/subchannel.ts:300:7)
  • /grpc-js/src/subchannel.ts:184:18

Subchannel.unrefIfOneRef is called because refcount === 1 at this point:
PickFirstLoadBalancer.resetSubchannelList caused Subchannel.unref to be called twice on client1.close.

1st unref call stack:
  • Subchannel.unref (/grpc-js/src/subchannel.ts:292:22)
  • ChannelSubchannelWrapper.unref (/grpc-js/src/internal-channel.ts:105:16)
  • PickFirstLoadBalancer.resetSubchannelList (/grpc-js/src/load-balancer-pick-first.ts:341:18)
  • PickFirstLoadBalancer.destroy (/grpc-js/src/load-balancer-pick-first.ts:451:10)
  • ChildLoadBalancerHandler.destroy (/grpc-js/src/load-balancer-child-handler.ts:151:25)
  • ResolvingLoadBalancer.destroy (/grpc-js/src/resolving-load-balancer.ts:325:28)
  • InternalChannel.close (/grpc-js/src/internal-channel.ts:544:32)
  • ChannelImplementation.close (/grpc-js/src/channel.ts:120:26)
  • ServiceClientImpl.close (/grpc-js/src/client.ts:167:26)
  • /grpc-js/test/test-global-subchannel-pool.ts:98:20
  • Object.checkState [as callback] (/grpc-js/src/client.ts:188:9)
  • InternalChannel.updateState (/grpc-js/src/internal-channel.ts:397:23)
  • Object.updateState (/grpc-js/src/internal-channel.ts:250:14)
  • ResolvingLoadBalancer.updateState (/grpc-js/src/resolving-load-balancer.ts:288:31)
  • Object.updateState (/grpc-js/src/resolving-load-balancer.ts:173:14)
  • ChildLoadBalancerHandler.ChildPolicyHelper.updateState (/grpc-js/src/load-balancer-child-handler.ts:61:40)
  • PickFirstLoadBalancer.updateState (/grpc-js/src/load-balancer-pick-first.ts:335:31)
  • PickFirstLoadBalancer.pickSubchannel (/grpc-js/src/load-balancer-pick-first.ts:320:10)
  • PickFirstLoadBalancer.subchannelStateListener (/grpc-js/src/load-balancer-pick-first.ts:169:14)
  • Subchannel.transitionToState (/grpc-js/src/subchannel.ts:272:7)
  • /grpc-js/src/subchannel.ts:179:18
2nd unref call stack:
  • Subchannel.unref (/grpc-js/src/subchannel.ts:292:22)
  • PickFirstLoadBalancer.destroy (/grpc-js/src/load-balancer-pick-first.ts:457:19)
  • ChildLoadBalancerHandler.destroy (/grpc-js/src/load-balancer-child-handler.ts:151:25)
  • ResolvingLoadBalancer.destroy (/grpc-js/src/resolving-load-balancer.ts:325:28)
  • InternalChannel.close (/grpc-js/src/internal-channel.ts:544:32)
  • ChannelImplementation.close (/grpc-js/src/channel.ts:120:26)
  • ServiceClientImpl.close (/grpc-js/src/client.ts:167:26)
  • /grpc-js/test/test-global-subchannel-pool.ts:98:20
  • Object.checkState [as callback] (/grpc-js/src/client.ts:188:9)
  • InternalChannel.updateState (/grpc-js/src/internal-channel.ts:397:23)
  • Object.updateState (/grpc-js/src/internal-channel.ts:250:14)
  • ResolvingLoadBalancer.updateState (/grpc-js/src/resolving-load-balancer.ts:288:31)
  • Object.updateState (/grpc-js/src/resolving-load-balancer.ts:173:14)
  • ChildLoadBalancerHandler.ChildPolicyHelper.updateState (/grpc-js/src/load-balancer-child-handler.ts:61:40)
  • PickFirstLoadBalancer.updateState (/grpc-js/src/load-balancer-pick-first.ts:335:31)
  • PickFirstLoadBalancer.pickSubchannel (/grpc-js/src/load-balancer-pick-first.ts:320:10)
  • PickFirstLoadBalancer.subchannelStateListener (/grpc-js/src/load-balancer-pick-first.ts:169:14)
  • Subchannel.transitionToState (/grpc-js/src/subchannel.ts:272:7)
  • /grpc-js/src/subchannel.ts:179:18
  • processTicksAndRejections (node:internal/process/task_queues:95:5)

PickFirstLoadBalancer.destroy calls

  1. PickFirstLoadBalancer.resetSubchannelList()
  2. PickFirstLoadBalancer.currentPick.unref()

PickFirstLoadBalancer.resetSubchannelList() calls .unref() for every subchannel in PickFirstLoadBalancer.subchannels.

The PickFirstLoadBalancer.subchannels array still contains the subchannel when the 2nd listener is called.
The subchannel is still in the array because the 2nd listener (PickFirstLoadBalancer.subchannelStateListener) calls this.updateState() before calling this.resetSubchannelList().

TL;DR
I don't mind calling the listener with this.connectivityState instead of newState.
I would like to note that it would not change the behaviour of the 1st attempt of client2.waitForReady():
there will still be the same error, Deadline passed without connectivity state change, because client2 will remain in the CONNECTING state.
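The refcount arithmetic above can be illustrated with a deliberately tiny model. The starting count of three and the ref ownership are simplifying assumptions for illustration, not grpc-js's exact accounting:

```typescript
class RefCountedSubchannel {
  state: 'READY' | 'IDLE' = 'READY';
  private refcount = 0;

  ref(): void {
    this.refcount++;
  }
  unref(): void {
    this.refcount--;
  }
  // Models SubchannelPool.unrefUnusedSubchannels: if only one ref remains,
  // drop it and send the subchannel back to IDLE.
  unrefIfOneRef(): boolean {
    if (this.refcount === 1) {
      this.refcount = 0;
      this.state = 'IDLE';
      return true;
    }
    return false;
  }
}

const sub = new RefCountedSubchannel();
sub.ref(); // assumed holder 1
sub.ref(); // assumed holder 2
sub.ref(); // assumed holder 3 (client2's usage among them)

// Buggy ordering: the subchannel is still in PickFirstLoadBalancer.subchannels
// when destroy() runs, so client1.close() releases it twice:
sub.unref(); // extra unref via resetSubchannelList()
sub.unref(); // unref via currentPick.unref()

// The pool now sees refcount === 1 and idles the subchannel, even though
// client2 still depends on it.
sub.unrefIfOneRef(); // returns true; state is now 'IDLE'
```

With only the single legitimate unref, refcount would stay at 2 and unrefIfOneRef would leave the subchannel alone.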

@murgatroid99
Member

> The subchannel is still in the array because 2nd listener (PickFirstLoadBalancer.subchannelStateListener) calls this.updateState() before calling this.resetSubchannelList().

This is the bug. And actually, when I fix that, there is another bug elsewhere.

I now believe that the change you have here is not what is needed. I'll put together a separate PR.

@murgatroid99
Member

I published version 1.8.14 with a change that seems to fix this problem. Please try it out.

@Hexta
Author

Hexta commented Apr 13, 2023

It works as expected.
Thanks.
