Race condition in TarantoolClientImpl #145


Closed
wants to merge 1 commit into from

Conversation

nicktorwald

@nicktorwald nicktorwald commented Mar 29, 2019

  • Avoid a possible race between the reading, writing, and reconnecting
    threads when a reconnection process starts.
    The lagged thread (reading or writing) could reset the state to
    RECONNECT after the reconnecting thread had already started and set
    the state to 0. As a result, all subsequent reconnection attempts
    would never happen. Now the reconnecting thread holds the state for
    as long as required.

  • Avoid another possible race between the reading and writing threads
    when they are started during the reconnection process.
    One of the threads could crash while starting, while the other,
    slightly lagged thread set its flag. The reconnecting thread could
    then see the RECONNECT + R/W state instead of pure RECONNECT, which
    again broke all subsequent reconnection attempts. Now the reading
    and writing threads take into account whether the RECONNECT state is
    already set.

  • Avoid using the LockSupport class to suspend and wake a thread.
    LockSupport is really an internal component for building high-level
    blocking primitives, and using it directly is not recommended. It
    was replaced by the ReentrantLock.Condition primitive, which is
    based on LockSupport but uses it correctly.

Fixes: #142
Affects: #34, #136
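For illustration, here is a minimal sketch (invented names, not the client's actual code) of conveying a reconnect signal through ReentrantLock + Condition, so that a signal sent before the connector starts waiting is never lost:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a connector waiting for a reconnect signal using
// ReentrantLock + Condition instead of raw LockSupport.park()/unpark().
public class ReconnectSignal {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition reconnectRequired = lock.newCondition();
    private boolean reconnect = false;

    // Called by a reader/writer thread that hit an I/O error.
    void requestReconnect() {
        lock.lock();
        try {
            reconnect = true;
            reconnectRequired.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Called by the connector thread; unlike park(), a signal sent before
    // this call is never lost, because the flag is checked under the lock.
    void awaitReconnect() throws InterruptedException {
        lock.lock();
        try {
            while (!reconnect) {
                reconnectRequired.await();
            }
            reconnect = false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ReconnectSignal s = new ReconnectSignal();
        s.requestReconnect();   // the signal arrives before the wait
        s.awaitReconnect();     // returns immediately: the flag survived
        System.out.println("signal was not lost");
    }
}
```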

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch from 1b54ae0 to 8d8aed9 Compare March 29, 2019 12:35
@coveralls

coveralls commented Mar 29, 2019

Coverage Status

Coverage increased (+0.1%) to 69.484% when pulling cc36724 on nicktorwald/gh-142-race-condition into acb17e4 on master.

@nicktorwald nicktorwald requested a review from Totktonada March 29, 2019 12:44
@nicktorwald nicktorwald added the bug Something isn't working label Mar 29, 2019
Member

@Totktonada Totktonada left a comment


Added a 'FIXUP' commit on top of your changes: added comments and XXX notes.

See also comments below.

Can we trigger the situations described in the commit message and write test cases for them? That would guarantee that, say, refactoring StateHelper really fixes the races (it is hard to verify by eye). AFAIU, all the described cases can be triggered if we pause one of the threads (or several?) at a certain point of execution.

We use the errinj (error injection) approach in tarantool itself to test such cases. An errinj is a global variable that can be set from a test to cause an infinite loop at one certain execution point; the loop exits when the variable is changed back. We also have errinjs that pause a thread for some amount of time, but the timeout approach often leads to test failures under load (say, a parallel test run), so the preferred way is to pause/unpause explicitly from test code. Errinj code is included only in debug builds, and the testing harness can determine whether tests are run against a debug or release build and skip certain tests for a release build.
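The errinj idea described above could look roughly like this in Java (purely illustrative; the names and the latch-based mechanism are assumptions, not tarantool's actual errinj code):

```java
import java.util.concurrent.CountDownLatch;

// Illustrative only: an "errinj"-style pause point controlled from test
// code. In a debug build the pause point would be consulted at a critical
// execution point; a release build would skip the check entirely.
public class ErrInj {
    static volatile CountDownLatch pausePoint = null;

    static void maybePause() throws InterruptedException {
        CountDownLatch latch = pausePoint;
        if (latch != null) {
            latch.await();   // the thread stalls here until the test releases it
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        pausePoint = latch;
        Thread worker = new Thread(() -> {
            try {
                maybePause();               // simulated "lagged" thread
                System.out.println("worker resumed");
            } catch (InterruptedException ignored) { }
        });
        worker.start();
        // The test decides exactly when the paused thread may proceed:
        latch.countDown();
        worker.join();
    }
}
```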

@Totktonada
Member

Totktonada commented Mar 29, 2019

Also, I would highlight in the commit message the specific problem you observed with LockSupport. Right now it is not obvious that you decided to replace it because of a concrete problem, not just in case.

@Totktonada
Member

I thought about how to better represent a connection state and want to share my current vision in sketchy form. Sorry, it is not well thought out; it is more a stream of consciousness. Consider it 'a bag of ideas' and don't get stuck if some part is unclear (it may be just wrong).

There are five actors: reader, writer, connector, discovery, and service task executor. The last one performs several actions: say, disconnect and then connect, which is needed for discovery. (Maybe 'connect' is really one of the service task executor's actions, and the connector should not be considered a separate actor?)

There are generations of a state. Each actor should access only the state (say, the socket) of its own generation, maybe by copying it internally. This is needed for two reasons. First, it ensures that an actor has no way to change the state of another generation: even if a writer of a previous generation stalls for a long time, it cannot wake up and write to a socket of the next generation (created after reconnection). Second, it lets us handle messages (FSM method calls) from different actors and, say, not reconnect twice when the reader and writer both report an error.

The FSM should accept all messages and queue them. Each message should carry a generation. Queueing allows us to treat actions as atomic and reason about 'what to do if X succeeds/fails and then Y arrives'. We need to elaborate all possible combinations.

All state shared between actors should be accessed only through messages to the FSM (maybe name it a coordinator?). Maybe the state should be held in the FSM / FSM context class?

Each state change should be atomic together with its corresponding actions. So we either have the 'stopped' state or 'connected', not something in between. If a connection attempt fails, we remain in the stopped state (who should trigger the next connection attempt in that case? A service task?).

As I see it, three states are enough: stopped, connected, and closed (the terminal one). I'm not sure we need a terminal state: we could have 'stopped, but trying to connect' and 'stopped and not trying to connect'. Or maybe that doesn't matter much if we create a new generation with its own state for each connection attempt? Not sure.

We should decide who owns a state / context: say, who is responsible for closing a socket. It seems an actor cannot be responsible for that, so a coordinator should be.

There is also the thought that splitting the state between generations looks like half of the work needed to handle several connections within one client. OTOH, I'm not sure we need that.

An actor should be able to subscribe to certain state changes. A reader and a writer should subscribe only to their own generation, while discovery exists across generations, so maybe it should subscribe to messages regardless of generation. So when a writer sends a 'write error' message to the coordinator, the coordinator will create a service task that sends the reader a 'going to disconnect' message, then closes the socket and changes the state to 'stopped'. Something like that. We need to work out which outgoing messages actors should subscribe to. Should it be only state changes? It seems not, because we want to notify the reader about an error in the writer before closing the socket, right?
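A rough sketch of the generation idea (all names are hypothetical, not part of the actual client): actors post generation-tagged messages to a coordinator, which ignores messages from stale generations, so a lagged reader/writer cannot trigger a second reconnect.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical coordinator: every message carries the generation of the
// actor that sent it; messages from old generations are dropped.
public class Coordinator {
    static final class Message {
        final int generation;
        final String event;
        Message(int generation, String event) {
            this.generation = generation;
            this.event = event;
        }
    }

    private final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();
    private volatile int currentGeneration = 0;

    void post(Message m) { inbox.add(m); }

    // Returns true if the message was acted upon, false if it was stale.
    boolean processOne() throws InterruptedException {
        Message m = inbox.take();
        if (m.generation != currentGeneration) {
            return false;                // sent by a previous-generation actor
        }
        if (m.event.equals("io-error")) {
            currentGeneration++;         // "reconnect": a new actor generation
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Coordinator c = new Coordinator();
        c.post(new Message(0, "io-error"));  // the writer of generation 0 fails
        c.post(new Message(0, "io-error"));  // the lagged reader of generation 0
        System.out.println(c.processOne());  // true: triggers one reconnect
        System.out.println(c.processOne());  // false: stale, no double reconnect
    }
}
```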

We also need to take care of propagating an error message to a user; see also #30.

We need to plan that work and decide whether it is worth doing in the scope of fixing the races or should be postponed as a refactoring task.

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch 3 times, most recently from c6b0320 to 217646d Compare April 3, 2019 06:01
Member

@Totktonada Totktonada left a comment


Looks okay.

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch 2 times, most recently from 953da96 to 1bb1f73 Compare April 4, 2019 12:45
Totktonada added a commit that referenced this pull request Apr 12, 2019
Here I propose changes that make awaiting reconnection look more
'native' to the surrounding code by using CountDownLatches instead of
ReentrantLock + Condition. I'm not sure whether the change is good from
a performance point of view; maybe it isn't. So please consider this
patch an RFC.

I have stripped the comments about synchronization tricks, because I
tried to rewrite them to be clearer (at least for me) but failed. I'll
start a discussion about them in PR #145.
@Totktonada
Member

Hi!

I have proposed some changes in Totktonada/gh-142-race-condition-proposal (80607e2). There I just replace ReentrantLock + Condition with CountDownLatches to make the code look more consistent. I'm not sure it is a good change, so please be critical.

I have a point to discuss. I tried to understand the cases you described in the commit messages and found it hard. I propose to describe them in a slightly more formal way: provide sketchy pseudocode of the threads' actions and use it in the commit message and in the code comments.

// state is the set of flags
st:
    READING
    WRITING
    RECONNECT
    CLOSED
     // reader
R1 | st |= READING
R2 | read responses
R3 | st &= ~READING
R4 | st = st == 0 ? RECONNECT : st;
     // writer
W1 | st |= WRITING
W2 | write requests
W3 | st &= ~WRITING
W4 | st = st == 0 ? RECONNECT : st;
     // connector
C1 | wait !st.READING && !st.WRITING && !st.CLOSED && st.RECONNECT and st = 0
C2 | start reader
C3 | start writer
C4 | goto C1
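The flag arithmetic above can be transcribed into Java (a sketch only, not the client's actual StateHelper): the state is a bit set kept in a single int.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical transcription of the pseudocode: reader/writer exit paths
// drop their own flag and request a reconnect only from the empty state.
public class State {
    static final int READING = 1, WRITING = 2, RECONNECT = 4, CLOSED = 8;

    final AtomicInteger st = new AtomicInteger(0);

    // R3/W3: drop our own flag; R4/W4: request a reconnect only if that
    // left the state empty (the compareAndSet is a no-op otherwise).
    void exit(int flag) {
        st.updateAndGet(s -> s & ~flag);
        st.compareAndSet(0, RECONNECT);
    }

    public static void main(String[] args) {
        State s = new State();
        s.st.set(READING | WRITING);
        s.exit(READING);                 // the writer is still active: no reconnect
        System.out.println(s.st.get());  // 2 (WRITING)
        s.exit(WRITING);                 // the last one out requests a reconnect
        System.out.println(s.st.get());  // 4 (RECONNECT)
    }
}
```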

Now we can provide a formal case:

st = {RECONNECT}
C1 // st = {}
C2 // reader stalls
C3 // writer started immediately
W1 // st = {WRITING}
W2 // receive an error
W3 // st = {}
W4 // st = {RECONNECT}
// reader wakes up
R1 // st = {RECONNECT, READING}
R2 // receive an error
C4
C1 // stuck
R3 // st = {RECONNECT}
R4 // no-op
C2 // the thread wakes up
...

Here I failed to make the threads stuck, and everything looks okay. I assume here that park / unpark (or Condition, or CountDownLatch) works as expected. Can you please provide the cases you described in the commit message in this form? Right now I'm not sure we should move the RECONNECT state drop.

@Totktonada
Member

Hm, it seems the proposal doesn't really work: https://travis-ci.org/tarantool/tarantool-java/jobs/519217538

@nicktorwald
Author

nicktorwald commented Apr 12, 2019

Let me extend your notation a bit with I/O-thread generations, say, a G-suffix: G1 stands for the first generation.
Next, look at the client version at commit 06755d5.

Case 1 (assume the client is alive)

st = {ALIVE}
R1-G1 | fails with an error (i.e. I/O error) at L343
R2-G1 | st &= ~READING (st = WRITING) at L176
W1-G1 | fails with an error (i.e. was interrupted by R) at L189
W2-G1 | st &= ~WRITING (st = 0) at L191
W3-G1 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L192
W4-G1 | wakes the connector up
W5-G1 | finishes 
C1    | st == RECONNECT ? 0 : st (positive, st = 0) at L72
C2    | connects at L141
R3-G1 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L177
R4-G1 | finishes
C3    | starts new I/O threads at L163
R1-G2 | st |= READING (st = RECONNECT + READING) at L172
R2-G2 | waits for packets to be read at L343
W1-G2 | st |= WRITING (st = RECONNECT + READING + WRITING = RECONNECT + ALIVE) at L187
W2-G2 | waits for packets to be written at L369
C4    | falls asleep at L76
R3-G2 | fails with an error (i.e. I/O error) at L343
R4-G2 | st &= ~READING (st = RECONNECT + WRITING) at L176
R5-G2 | st = st == 0 ? RECONNECT : st (negative, st = st) at L177
R6-G2 | finishes
W3-G2 | fails with an error (i.e. was interrupted by R) at L189
W4-G2 | st &= ~WRITING (st = RECONNECT) at L191
W5-G2 | st = st == 0 ? RECONNECT : st (negative, st = st = RECONNECT) at L192
W6-G2 | finishes
C5    | becomes a zombie (there are no more IO threads to wake it up) at L76

Case 2 (assume the client is alive)

st = {ALIVE}
R1-G1 | fails with an error (i.e. I/O error) at L343
R2-G1 | st &= ~READING (st = WRITING) at L176
R3-G1 | st = st == 0 ? RECONNECT : st (negative, st = st = WRITING) at L177
R4-G1 | finishes
W1-G1 | fails with an error (i.e. was interrupted by R) at L189
W2-G1 | st &= ~WRITING (st = 0) at L191
W3-G1 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L192
W4-G1 | wakes the connector up at L178
W5-G1 | finishes 
C1    | st == RECONNECT ? 0 : st (positive, st = 0) at L72
C2    | connects at L141
C3    | starts new I/O threads at L163
R1-G2 | st |= READING (st = READING) at L172
R2-G2 | waits for packets to be read at L343
C4    | falls asleep at L76
R3-G2 | fails with an error (i.e. I/O error) at L343
R4-G2 | st &= ~READING (st = 0) at L176
R5-G2 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L177
R6-G2 | wakes the connector up at L193
R7-G2 | finishes
W1-G2 | st |= WRITING (st = RECONNECT + WRITING) at L187
W2-G2 | waits for packets to be written at L369
C5    | st == RECONNECT ? 0 : st (negative, st = RECONNECT + WRITING) at L72
C6    | falls asleep at L76
W3-G2 | fails with an error (i.e. was interrupted by R) at L189
W4-G2 | st &= ~WRITING (st = RECONNECT) at L191
W5-G2 | st = st == 0 ? RECONNECT : st (negative, st = st = RECONNECT) at L192
W6-G2 | finishes
C7    | becomes a zombie (there are no more IO threads to wake it up) at L76

@Totktonada
Member

Thanks for the cases! I want to propose another wording for them. I have cited your original commit message and propose below some wording that I think is clearer.


  • Avoid a possible race between the reading, writing, and reconnecting
    threads when a reconnection process starts.
    The lagged thread (reading or writing) could reset the state to
    RECONNECT after the reconnecting thread had already started and set
    the state to 0. As a result, all subsequent reconnection attempts
    would never happen. Now the reconnecting thread holds the state for
    as long as required.

  • Avoid another possible race between the reading and writing threads
    when they are started during the reconnection process.
    One of the threads could crash while starting, while the other,
    slightly lagged thread set its flag. The reconnecting thread could
    then see the RECONNECT + R/W state instead of pure RECONNECT, which
    again broke all subsequent reconnection attempts. Now the reading
    and writing threads take into account whether the RECONNECT state is
    already set.

  • Replace LockSupport with ReentrantLock.Condition for suspending and
    waking a thread. Our cluster tests and a standalone demo app show
    that LockSupport does not provide the memory-barrier guarantees one
    might expect. The reconnect thread relies on a visibility guarantee
    between park-unpark invocations which, in practice, sometimes
    doesn't hold. Also, according to the Javadoc, LockSupport is more of
    an internal component for building high-level blocking primitives,
    and using it directly is not recommended. It was replaced by the
    ReentrantLock.Condition primitive, which is based on LockSupport but
    uses it correctly.


The state of the client is a set of the following flags: {READING, WRITING, RECONNECT, CLOSED}. Let's name the state when no flags are set UNINITIALIZED.

A reader thread sets READING, performs reading until an error or an interruption, drops READING, and tries to trigger reconnection (if the state allows; see below). A writer does much the same, but with the WRITING flag. The key point here is that reconnection is triggered from a reader/writer thread and only when certain conditions are met.

The prerequisite for setting RECONNECT and signaling (unparking) the connector thread is that the client is in the UNINITIALIZED state.

There are several problems here:

  • Say, a reader stalls a bit after dropping READING, then a writer drops WRITING and triggers reconnection. Then the reader wakes up and sets RECONNECT again.
  • Calling unpark() N times on a connector thread while it is alive does not let it skip the next N park() calls, so the problem above is not just an extra reconnection: it leaves the connector thread stuck.
  • Say, a reader stalls just before setting READING. A writer is hit by an I/O error and triggers reconnection (sets RECONNECT, unparks the connector). Then the reader wakes up and sets the READING+RECONNECT state, which disallows the connector thread from proceeding further (it expects pure RECONNECT). Even when the reader drops READING, it will not wake up (unpark) the connector thread, because RECONNECT was already set (the state is not UNINITIALIZED).
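The unpark() problem can be demonstrated directly: LockSupport keeps at most one permit per thread, so repeated unpark() calls do not accumulate.

```java
import java.util.concurrent.locks.LockSupport;

// Demonstrates that N unpark() calls do not let a thread skip N
// subsequent park() calls: at most one permit is ever stored.
public class UnparkPermit {
    public static void main(String[] args) {
        Thread self = Thread.currentThread();
        LockSupport.unpark(self);
        LockSupport.unpark(self);   // the second permit is NOT accumulated
        LockSupport.park();         // consumes the single permit, returns
        System.out.println("first park returned immediately");
        // A second park() here would block forever: the permit is gone.
    }
}
```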

This commit introduces several changes that eliminate the problems above:

  • Use ReentrantLock + Condition instead of park() / unpark() so that signals to reconnect are never missed, no matter whether the connector is parked.
  • Ensure that the reader and writer threads of one generation (created in the same reconnection iteration) trigger reconnection only once.
  • Hold the RECONNECT state for most of the time the connector works (while acquiring a new socket, connecting, and reading the Tarantool greeting), and prevent READING/WRITING from being set while RECONNECT is set.

These changes do not guarantee that a stalled reader/writer thread from a previous connection iteration will not try to read from or write into a new socket, or try to change the state, but they greatly decrease the probability of this situation.
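The 'trigger reconnection once per generation' idea could be sketched like this (hypothetical names, not the actual patch): the reader and writer of one reconnection iteration share a flag, so whichever of them fails first requests reconnection and the other's request becomes a no-op.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-generation guard shared by one reader/writer pair.
public class Generation {
    private final AtomicBoolean reconnectRequested = new AtomicBoolean(false);

    // Returns true only for the first caller within this generation.
    boolean tryTriggerReconnect() {
        return reconnectRequested.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        Generation g = new Generation();
        System.out.println(g.tryTriggerReconnect());  // true: the writer triggers
        System.out.println(g.tryTriggerReconnect());  // false: the lagged reader
    }
}
```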


@nicktorwald What do you think? Does this make things clearer?

Sorry that I spent so much of your time on this.

Member

@Totktonada Totktonada left a comment


A bit more about comments.

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch 2 times, most recently from 7a3b334 to 08d39c6 Compare April 15, 2019 07:33
@Totktonada
Member

These changes do not guarantee that a stalled reader/writer thread from a previous connection iteration will not try to read from or write into a new socket, or try to change the state, but they greatly decrease the probability of this situation.

Now I understand it is not so: we guarantee that a new reconnection iteration will not start until both the old reader and the old writer have exited. So the cited sentence is not correct.

I would also expand the 'changes that eliminate the problems above' list with the following:

  • Ensure a new reconnection iteration starts only after the old reader and old writer threads have exited (because we cannot receive a reconnection signal until we send it).

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch from 08d39c6 to bdbaac8 Compare April 15, 2019 19:12
@Totktonada
Member

@nicktorwald Please file an issue about refactoring the connection state handling once this PR is merged.

The state of the client is a set of the following flags: {READING,
WRITING, RECONNECT, CLOSED}. Let's name the state when no flags are set
UNINITIALIZED.

A reader thread sets READING, performs reading until an error or an
interruption, drops READING, and tries to trigger reconnection (if the
state allows; see below). A writer does much the same, but with the
WRITING flag. The key point here is that reconnection is triggered from
a reader/writer thread and only when certain conditions are met.

The prerequisite for setting RECONNECT and signaling (unparking) the
connector thread is that the client is in the UNINITIALIZED state.

There are several problems here:

- Say, a reader stalls a bit after dropping READING, then a writer drops
WRITING and triggers reconnection. Then the reader wakes up and sets
RECONNECT again.

- Calling unpark() N times on a connector thread while it is alive does
not let it skip the next N park() calls, so the problem above is not
just an extra reconnection: it leaves the connector thread stuck.

- Say, a reader stalls just before setting READING. A writer is hit by
an I/O error and triggers reconnection (sets RECONNECT, unparks the
connector). Then the reader wakes up and sets the READING+RECONNECT
state, which disallows the connector thread from proceeding further (it
expects pure RECONNECT). Even when the reader drops READING, it will not
wake up (unpark) the connector thread, because RECONNECT was already set
(the state is not UNINITIALIZED).

This commit introduces several changes that eliminate the problems
above:

- Use ReentrantLock + Condition instead of park() / unpark() so that
signals to reconnect are never missed, no matter whether the connector
is parked.

- Ensure that the reader and writer threads of one generation (created
in the same reconnection iteration) trigger reconnection only once.

- Hold the RECONNECT state for most of the time the connector works
(while acquiring a new socket, connecting, and reading the Tarantool
greeting), and prevent READING/WRITING from being set while RECONNECT
is set.

- Ensure a new reconnection iteration starts only after the old reader
and old writer threads have exited (because we cannot receive a
reconnection signal until we send it).

Fixes: #142
Affects: #34, #136
@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch from bdbaac8 to cc36724 Compare April 18, 2019 06:59
@nicktorwald nicktorwald deleted the nicktorwald/gh-142-race-condition branch April 18, 2019 08:52
@nicktorwald
Author

Pushed in 2415808
