Race condition in TarantoolClientImpl #145


Closed
wants to merge 1 commit into from

Conversation

nicktorwald

@nicktorwald nicktorwald commented Mar 29, 2019

  • Avoid a possible race between the reading, writing, and reconnecting
    threads when a reconnection process starts.
    The lagged thread (reading or writing) could reset the state to
    RECONNECT after the reconnecting thread had already started and set
    the state to 0. As a result, all subsequent reconnection attempts
    would never happen. Now the reconnecting thread holds the state for
    as long as required.

  • Avoid another possible race between the reading and writing threads
    when they are started during the reconnection process.
    One of the threads could crash while starting, while the other,
    slightly lagged thread set its flag. The reconnecting thread could
    then see the RECONNECT + R/W state instead of pure RECONNECT, which
    again broke all subsequent reconnection attempts. Now the reading
    and writing threads take into account whether the RECONNECT state is
    already set.

  • Avoid using the LockSupport class to suspend and wake a thread.
    LockSupport is really an internal component for building high-level
    blocking primitives, and using it directly is not recommended. It
    was replaced by the ReentrantLock.Condition primitive, which is
    based on LockSupport but uses it correctly.

Fixes: #142
Affects: #34, #136
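For illustration, here is a minimal sketch (invented names, not the client's actual code) of conveying a reconnect signal through ReentrantLock + Condition, so that a signal sent before the connector starts waiting is never lost:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a connector waiting for a reconnect signal using
// ReentrantLock + Condition instead of raw LockSupport.park()/unpark().
public class ReconnectSignal {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition reconnectRequired = lock.newCondition();
    private boolean reconnect = false;

    // Called by a reader/writer thread that hit an I/O error.
    void requestReconnect() {
        lock.lock();
        try {
            reconnect = true;
            reconnectRequired.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Called by the connector thread; unlike park(), a signal sent before
    // this call is never lost, because the flag is checked under the lock.
    void awaitReconnect() throws InterruptedException {
        lock.lock();
        try {
            while (!reconnect) {
                reconnectRequired.await();
            }
            reconnect = false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ReconnectSignal s = new ReconnectSignal();
        s.requestReconnect();   // the signal arrives before the wait
        s.awaitReconnect();     // returns immediately: the flag survived
        System.out.println("signal was not lost");
    }
}
```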

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch from 1b54ae0 to 8d8aed9 Compare March 29, 2019 12:35
@coveralls

coveralls commented Mar 29, 2019

Coverage Status

Coverage increased (+0.1%) to 69.484% when pulling cc36724 on nicktorwald/gh-142-race-condition into acb17e4 on master.

@nicktorwald nicktorwald requested a review from Totktonada March 29, 2019 12:44
@nicktorwald nicktorwald added the bug Something isn't working label Mar 29, 2019
Member

@Totktonada Totktonada left a comment


Added a 'FIXUP' commit on top of your changes: added comments and XXX notes.

See also comments below.

Can we trigger the situations described in the commit message and write test cases for them? That would guarantee that, say, refactoring StateHelper really fixes the races (it is hard to verify by eye). AFAIU, all the described cases can be triggered if we pause one of the threads (or several?) at a certain point of execution.

We use the errinj (error injection) approach in tarantool itself to test such cases. An errinj is a global variable that can be set from a test to cause an infinite loop at one certain execution point; the loop exits when the variable is changed back. We also have errinjs that pause a thread for some amount of time, but the timeout approach often leads to test failures under load (say, a parallel test run), so the preferred way is to pause/unpause explicitly from test code. Errinj code is included only in debug builds, and the testing harness can determine whether tests are run against a debug or release build and skip certain tests for a release build.
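The errinj idea described above could look roughly like this in Java (purely illustrative; the names and the latch-based mechanism are assumptions, not tarantool's actual errinj code):

```java
import java.util.concurrent.CountDownLatch;

// Illustrative only: an "errinj"-style pause point controlled from test
// code. In a debug build the pause point would be consulted at a critical
// execution point; a release build would skip the check entirely.
public class ErrInj {
    static volatile CountDownLatch pausePoint = null;

    static void maybePause() throws InterruptedException {
        CountDownLatch latch = pausePoint;
        if (latch != null) {
            latch.await();   // the thread stalls here until the test releases it
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        pausePoint = latch;
        Thread worker = new Thread(() -> {
            try {
                maybePause();               // simulated "lagged" thread
                System.out.println("worker resumed");
            } catch (InterruptedException ignored) { }
        });
        worker.start();
        // The test decides exactly when the paused thread may proceed:
        latch.countDown();
        worker.join();
    }
}
```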

@Totktonada
Member

Totktonada commented Mar 29, 2019

Also, I would highlight in the commit message the specific problem you observed with LockSupport. Right now it is not obvious that you decided to replace it because of a concrete problem, not just in case.

@Totktonada
Member

I thought about how to better represent a connection state and want to share my current vision in sketchy form. Sorry, it is not well thought out; it is more a stream of consciousness. Consider it 'a bag of ideas' and don't get stuck if some part is unclear (it may be just wrong).

There are five actors: reader, writer, connector, discovery, and service task executor. The last one performs several actions: say, disconnect and then connect, which is needed for discovery. (Maybe 'connect' is really one of the service task executor's actions, and the connector should not be considered a separate actor?)

There are generations of a state. Each actor should access only the state (say, the socket) of its own generation, maybe by copying it internally. This is needed for two reasons. First, it ensures that an actor has no way to change the state of another generation: even if a writer of a previous generation stalls for a long time, it cannot wake up and write to a socket of the next generation (created after reconnection). Second, it lets us handle messages (FSM method calls) from different actors and, say, not reconnect twice when the reader and writer both report an error.

The FSM should accept all messages and queue them. Each message should carry a generation. Queueing allows us to treat actions as atomic and reason about 'what to do if X succeeds/fails and then Y arrives'. We need to elaborate all possible combinations.

All state shared between actors should be accessed only through messages to the FSM (maybe name it a coordinator?). Maybe the state should be held in the FSM / FSM context class?

Each state change should be atomic together with its corresponding actions. So we either have the 'stopped' state or 'connected', not something in between. If a connection attempt fails, we remain in the stopped state (who should trigger the next connection attempt in that case? A service task?).

As I see it, three states are enough: stopped, connected, and closed (the terminal one). I'm not sure we need a terminal state: we could have 'stopped, but trying to connect' and 'stopped and not trying to connect'. Or maybe that doesn't matter much if we create a new generation with its own state for each connection attempt? Not sure.

We should decide who owns a state / context: say, who is responsible for closing a socket. It seems an actor cannot be responsible for that, so a coordinator should be.

There is also the thought that splitting the state between generations looks like half of the work needed to handle several connections within one client. OTOH, I'm not sure we need that.

An actor should be able to subscribe to certain state changes. A reader and a writer should subscribe only to their own generation, while discovery exists across generations, so maybe it should subscribe to messages regardless of generation. So when a writer sends a 'write error' message to the coordinator, the coordinator will create a service task that sends the reader a 'going to disconnect' message, then closes the socket and changes the state to 'stopped'. Something like that. We need to work out which outgoing messages actors should subscribe to. Should it be only state changes? It seems not, because we want to notify the reader about an error in the writer before closing the socket, right?
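A rough sketch of the generation idea (all names are hypothetical, not part of the actual client): actors post generation-tagged messages to a coordinator, which ignores messages from stale generations, so a lagged reader/writer cannot trigger a second reconnect.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical coordinator: every message carries the generation of the
// actor that sent it; messages from old generations are dropped.
public class Coordinator {
    static final class Message {
        final int generation;
        final String event;
        Message(int generation, String event) {
            this.generation = generation;
            this.event = event;
        }
    }

    private final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();
    private volatile int currentGeneration = 0;

    void post(Message m) { inbox.add(m); }

    // Returns true if the message was acted upon, false if it was stale.
    boolean processOne() throws InterruptedException {
        Message m = inbox.take();
        if (m.generation != currentGeneration) {
            return false;                // sent by a previous-generation actor
        }
        if (m.event.equals("io-error")) {
            currentGeneration++;         // "reconnect": a new actor generation
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Coordinator c = new Coordinator();
        c.post(new Message(0, "io-error"));  // the writer of generation 0 fails
        c.post(new Message(0, "io-error"));  // the lagged reader of generation 0
        System.out.println(c.processOne());  // true: triggers one reconnect
        System.out.println(c.processOne());  // false: stale, no double reconnect
    }
}
```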

We also need to take care of propagating an error message to a user; see also #30.

We need to plan that work and decide whether it is worth doing in the scope of fixing the races or should be postponed as a refactoring task.

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch 3 times, most recently from c6b0320 to 217646d Compare April 3, 2019 06:01
Member

@Totktonada Totktonada left a comment


Looks okay.

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch 2 times, most recently from 953da96 to 1bb1f73 Compare April 4, 2019 12:45
Totktonada added a commit that referenced this pull request Apr 12, 2019
Here I propose changes that make awaiting reconnection look more
'native' to the surrounding code by using CountDownLatches instead of
ReentrantLock + Condition. I'm not sure whether the change is good from
a performance point of view; maybe it isn't. So please consider this
patch an RFC.

I have stripped the comments about synchronization tricks, because I
tried to rewrite them to be clearer (at least for me) but failed. I'll
start a discussion about them in PR #145.
@Totktonada
Member

Hi!

I have proposed some changes in Totktonada/gh-142-race-condition-proposal (80607e2). There I just replace ReentrantLock + Condition with CountDownLatches to make the code look more consistent. I'm not sure it is a good change, so please be critical.

I have a point to discuss. I tried to understand the cases you described in the commit messages and found it hard. I propose to describe them in a slightly more formal way: provide sketchy pseudocode of the threads' actions and use it in the commit message and in the code comments.

// state is the set of flags
st:
    READING
    WRITING
    RECONNECT
    CLOSED
     // reader
R1 | st |= READING
R2 | read responses
R3 | st &= ~READING
R4 | st = st == 0 ? RECONNECT : st;
     // writer
W1 | st |= WRITING
W2 | write requests
W3 | st &= ~WRITING
W4 | st = st == 0 ? RECONNECT : st;
     // connector
C1 | wait !st.READING && !st.WRITING && !st.CLOSED && st.RECONNECT and st = 0
C2 | start reader
C3 | start writer
C4 | goto C1
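The flag arithmetic above can be transcribed into Java (a sketch only, not the client's actual StateHelper): the state is a bit set kept in a single int.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical transcription of the pseudocode: reader/writer exit paths
// drop their own flag and request a reconnect only from the empty state.
public class State {
    static final int READING = 1, WRITING = 2, RECONNECT = 4, CLOSED = 8;

    final AtomicInteger st = new AtomicInteger(0);

    // R3/W3: drop our own flag; R4/W4: request a reconnect only if that
    // left the state empty (the compareAndSet is a no-op otherwise).
    void exit(int flag) {
        st.updateAndGet(s -> s & ~flag);
        st.compareAndSet(0, RECONNECT);
    }

    public static void main(String[] args) {
        State s = new State();
        s.st.set(READING | WRITING);
        s.exit(READING);                 // the writer is still active: no reconnect
        System.out.println(s.st.get());  // 2 (WRITING)
        s.exit(WRITING);                 // the last one out requests a reconnect
        System.out.println(s.st.get());  // 4 (RECONNECT)
    }
}
```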

Now we can provide a formal case:

st = {RECONNECT}
C1 // st = {}
C2 // reader stalls
C3 // writer started immediately
W1 // st = {WRITING}
W2 // receive an error
W3 // st = {}
W4 // st = {RECONNECT}
// reader wakes up
R1 // st = {RECONNECT, READING}
R2 // receive an error
C4
C1 // stuck
R3 // st = {RECONNECT}
R4 // no-op
C2 // the thread wakes up
...

Here I failed to make the threads stuck, and everything looks okay. I assume here that park / unpark (or Condition, or CountDownLatch) works as expected. Can you please provide the cases you described in the commit message in this form? Right now I'm not sure we should move the RECONNECT state drop.

@Totktonada
Member

Hm, it seems the proposal doesn't really work: https://travis-ci.org/tarantool/tarantool-java/jobs/519217538

@nicktorwald
Author

nicktorwald commented Apr 12, 2019

Let me extend your notation a bit with I/O-thread generations, say, a G-suffix: G1 stands for the first generation.
Next, look at the client version at commit 06755d5.

Case 1 (assume the client is alive)

st = {ALIVE}
R1-G1 | fails with an error (i.e. I/O error) at L343
R2-G1 | st &= ~READING (st = WRITING) at L176
W1-G1 | fails with an error (i.e. was interrupted by R) at L189
W2-G1 | st &= ~WRITING (st = 0) at L191
W3-G1 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L192
W4-G1 | wakes the connector up
W5-G1 | finishes 
C1    | st == RECONNECT ? 0 : st (positive, st = 0) at L72
C2    | connects at L141
R3-G1 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L177
R4-G1 | finishes
C3    | starts new I/O threads at L163
R1-G2 | st |= READING (st = RECONNECT + READING) at L172
R2-G2 | waits for packets to be read at L343
W1-G2 | st |= WRITING (st = RECONNECT + READING + WRITING = RECONNECT + ALIVE) at L187
W2-G2 | waits for packets to be written at L369
C4    | falls asleep at L76
R3-G2 | fails with an error (i.e. I/O error) at L343
R4-G2 | st &= ~READING (st = RECONNECT + WRITING) at L176
R5-G2 | st = st == 0 ? RECONNECT : st (negative, st = st) at L177
R6-G2 | finishes
W3-G2 | fails with an error (i.e. was interrupted by R) at L189
W4-G2 | st &= ~WRITING (st = RECONNECT) at L191
W5-G2 | st = st == 0 ? RECONNECT : st (negative, st = st = RECONNECT) at L192
W6-G2 | finishes
C5    | becomes a zombie (there are no more IO threads to wake it up) at L76

Case 2 (assume the client is alive)

st = {ALIVE}
R1-G1 | fails with an error (i.e. I/O error) at L343
R2-G1 | st &= ~READING (st = WRITING) at L176
R3-G1 | st = st == 0 ? RECONNECT : st (negative, st = st = WRITING) at L177
R4-G1 | finishes
W1-G1 | fails with an error (i.e. was interrupted by R) at L189
W2-G1 | st &= ~WRITING (st = 0) at L191
W3-G1 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L192
W4-G1 | wakes the connector up at L178
W5-G1 | finishes 
C1    | st == RECONNECT ? 0 : st (positive, st = 0) at L72
C2    | connects at L141
C3    | starts new I/O threads at L163
R1-G2 | st |= READING (st = READING) at L172
R2-G2 | waits for packets to be read at L343
C4    | falls asleep at L76
R3-G2 | fails with an error (i.e. I/O error) at L343
R4-G2 | st &= ~READING (st = 0) at L176
R5-G2 | st = st == 0 ? RECONNECT : st (positive, st = RECONNECT) at L177
R6-G2 | wakes the connector up at L193
R7-G2 | finishes
W1-G2 | st |= WRITING (st = RECONNECT + WRITING) at L187
W2-G2 | waits for packets to be written at L369
C5    | st == RECONNECT ? 0 : st (negative, st = RECONNECT + WRITING) at L72
C6    | falls asleep at L76
W3-G2 | fails with an error (i.e. was interrupted by R) at L189
W4-G2 | st &= ~WRITING (st = RECONNECT) at L191
W5-G2 | st = st == 0 ? RECONNECT : st (negative, st = st = RECONNECT) at L192
W6-G2 | finishes
C7    | becomes a zombie (there are no more IO threads to wake it up) at L76

@Totktonada
Member

Thanks for the cases! I want to propose another wording for them. I have cited your original commit message and propose below some wording that I think is clearer.


  • Avoid a possible race between the reading, writing, and reconnecting
    threads when a reconnection process starts.
    The lagged thread (reading or writing) could reset the state to
    RECONNECT after the reconnecting thread had already started and set
    the state to 0. As a result, all subsequent reconnection attempts
    would never happen. Now the reconnecting thread holds the state for
    as long as required.

  • Avoid another possible race between the reading and writing threads
    when they are started during the reconnection process.
    One of the threads could crash while starting, while the other,
    slightly lagged thread set its flag. The reconnecting thread could
    then see the RECONNECT + R/W state instead of pure RECONNECT, which
    again broke all subsequent reconnection attempts. Now the reading
    and writing threads take into account whether the RECONNECT state is
    already set.

  • Replace LockSupport with ReentrantLock.Condition for suspending and
    waking a thread. Our cluster tests and a standalone demo app show
    that LockSupport does not provide the memory-barrier guarantees one
    might expect. The reconnect thread relies on a visibility guarantee
    between park-unpark invocations which, in practice, sometimes
    doesn't hold. Also, according to the Javadoc, LockSupport is more of
    an internal component for building high-level blocking primitives,
    and using it directly is not recommended. It was replaced by the
    ReentrantLock.Condition primitive, which is based on LockSupport but
    uses it correctly.


The state of the client is a set of the following flags: {READING, WRITING, RECONNECT, CLOSED}. Let's name the state when no flags are set UNINITIALIZED.

A reader thread sets READING, performs reading until an error or an interruption, drops READING, and tries to trigger reconnection (if the state allows; see below). A writer does much the same, but with the WRITING flag. The key point here is that reconnection is triggered from a reader/writer thread and only when certain conditions are met.

The prerequisite for setting RECONNECT and signaling (unparking) the connector thread is that the client is in the UNINITIALIZED state.

There are several problems here:

  • Say, a reader stalls a bit after dropping READING, then a writer drops WRITING and triggers reconnection. Then the reader wakes up and sets RECONNECT again.
  • Calling unpark() N times on a connector thread while it is alive does not let it skip the next N park() calls, so the problem above is not just an extra reconnection: it leaves the connector thread stuck.
  • Say, a reader stalls just before setting READING. A writer is hit by an I/O error and triggers reconnection (sets RECONNECT, unparks the connector). Then the reader wakes up and sets the READING+RECONNECT state, which disallows the connector thread from proceeding further (it expects pure RECONNECT). Even when the reader drops READING, it will not wake up (unpark) the connector thread, because RECONNECT was already set (the state is not UNINITIALIZED).
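The unpark() problem can be demonstrated directly: LockSupport keeps at most one permit per thread, so repeated unpark() calls do not accumulate.

```java
import java.util.concurrent.locks.LockSupport;

// Demonstrates that N unpark() calls do not let a thread skip N
// subsequent park() calls: at most one permit is ever stored.
public class UnparkPermit {
    public static void main(String[] args) {
        Thread self = Thread.currentThread();
        LockSupport.unpark(self);
        LockSupport.unpark(self);   // the second permit is NOT accumulated
        LockSupport.park();         // consumes the single permit, returns
        System.out.println("first park returned immediately");
        // A second park() here would block forever: the permit is gone.
    }
}
```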

This commit introduces several changes that eliminate the problems above:

  • Use ReentrantLock + Condition instead of park() / unpark() so that signals to reconnect are never missed, no matter whether the connector is parked.
  • Ensure that the reader and writer threads of one generation (created in the same reconnection iteration) trigger reconnection only once.
  • Hold the RECONNECT state for most of the time the connector works (while acquiring a new socket, connecting, and reading the Tarantool greeting), and prevent READING/WRITING from being set while RECONNECT is set.

These changes do not guarantee that a stalled reader/writer thread from a previous connection iteration will not try to read from or write into a new socket, or try to change the state, but they greatly decrease the probability of this situation.
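The 'trigger reconnection once per generation' idea could be sketched like this (hypothetical names, not the actual patch): the reader and writer of one reconnection iteration share a flag, so whichever of them fails first requests reconnection and the other's request becomes a no-op.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-generation guard shared by one reader/writer pair.
public class Generation {
    private final AtomicBoolean reconnectRequested = new AtomicBoolean(false);

    // Returns true only for the first caller within this generation.
    boolean tryTriggerReconnect() {
        return reconnectRequested.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        Generation g = new Generation();
        System.out.println(g.tryTriggerReconnect());  // true: the writer triggers
        System.out.println(g.tryTriggerReconnect());  // false: the lagged reader
    }
}
```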


@nicktorwald What do you think? Does this make things clearer?

Sorry that I spent so much of your time on this.

Member

@Totktonada Totktonada left a comment


A bit more about comments.

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch 2 times, most recently from 7a3b334 to 08d39c6 Compare April 15, 2019 07:33
@Totktonada
Member

These changes do not guarantee that a stalled reader/writer thread from a previous connection iteration will not try to read from or write into a new socket, or try to change the state, but they greatly decrease the probability of this situation.

Now I understand it is not so: we guarantee that a new reconnection iteration will not start until both the old reader and the old writer have exited. So the cited sentence is not correct.

I would also expand the 'changes that eliminate the problems above' list with the following:

  • Ensure a new reconnection iteration starts only after the old reader and old writer threads have exited (because we cannot receive a reconnection signal until we send it).

@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch from 08d39c6 to bdbaac8 Compare April 15, 2019 19:12
@Totktonada
Member

@nicktorwald Please file an issue about refactoring the connection state handling once this PR is merged.

The state of the client is a set of the following flags: {READING,
WRITING, RECONNECT, CLOSED}. Let's name the state when no flags are set
UNINITIALIZED.

A reader thread sets READING, performs reading until an error or an
interruption, drops READING, and tries to trigger reconnection (if the
state allows; see below). A writer does much the same, but with the
WRITING flag. The key point here is that reconnection is triggered from
a reader/writer thread and only when certain conditions are met.

The prerequisite for setting RECONNECT and signaling (unparking) the
connector thread is that the client is in the UNINITIALIZED state.

There are several problems here:

- Say, a reader stalls a bit after dropping READING, then a writer drops
WRITING and triggers reconnection. Then the reader wakes up and sets
RECONNECT again.

- Calling unpark() N times on a connector thread while it is alive does
not let it skip the next N park() calls, so the problem above is not
just an extra reconnection: it leaves the connector thread stuck.

- Say, a reader stalls just before setting READING. A writer is hit by
an I/O error and triggers reconnection (sets RECONNECT, unparks the
connector). Then the reader wakes up and sets the READING+RECONNECT
state, which disallows the connector thread from proceeding further (it
expects pure RECONNECT). Even when the reader drops READING, it will not
wake up (unpark) the connector thread, because RECONNECT was already set
(the state is not UNINITIALIZED).

This commit introduces several changes that eliminate the problems
above:

- Use ReentrantLock + Condition instead of park() / unpark() so that
signals to reconnect are never missed, no matter whether the connector
is parked.

- Ensure that the reader and writer threads of one generation (created
in the same reconnection iteration) trigger reconnection only once.

- Hold the RECONNECT state for most of the time the connector works
(while acquiring a new socket, connecting, and reading the Tarantool
greeting), and prevent READING/WRITING from being set while RECONNECT
is set.

- Ensure a new reconnection iteration starts only after the old reader
and old writer threads have exited (because we cannot receive a
reconnection signal until we send it).

Fixes: #142
Affects: #34, #136
@nicktorwald nicktorwald force-pushed the nicktorwald/gh-142-race-condition branch from bdbaac8 to cc36724 Compare April 18, 2019 06:59
@nicktorwald nicktorwald deleted the nicktorwald/gh-142-race-condition branch April 18, 2019 08:52
@nicktorwald
Author

Pushed in 2415808
