[CBR-243] improve wallet worker start-up and exception handling #3330

k0001 · 2018-07-31T13:32:28Z

Description

This code improve wallet worker start-up and exception handling. The previous implementation of this wasn't particularly careful regarding exceptions and logging. For example, linking the worker thread with the calling thread could be prevented by an async exception, leaving a dangling worker running. This code carefully considers these scenarios. Additionally, this new code prevents invoke from being used outside its intended scope. If that happens, then the caller will get an exception.

The Shutdown constructor for WalletAction was removed because all it achieved was opening a second exit door for the worker client, which further complicated the implementation of withWalletWorker's internals. The only exit now is having the client simply return or fail with an exception, as it can be seen in bracketPassiveWallet's usage.

This is implemented using closeable queues in STM, which has the nice side effect of making our invoke function run in STM, which is likely to prove useful in implementing the rest of the functions in PassiveWalletLayer. Ideally we would be using TMQueue (or TChan) from the stm-chans package, but here we roll out our own closeable queue implementation.

Additionally, some unused wallet creation code was removed from bracketPassiveWallet.

Linked issue

CBR-243

Type of change

🐞 Bug fix (non-breaking change which fixes an issue)
🛠 New feature (non-breaking change which adds functionality)
⚠️ Breaking change (fix or feature that would cause existing functionality to change)
🏭 Refactoring that does not change existing functionality but does improve things like code readability, structure etc
🔨 New or improved tests for existing code
⛑ git-flow chore (backport, hotfix, etc)

Developer checklist

I have read the style guide document, and my code follows the code style of this project.
If my code deals with exceptions, it follows the guidelines.
I have updated any documentation accordingly, if needed. Documentation changes can be reflected in opening a PR on cardanodocs.com, amending the inline Haddock comments, any relevant README file or one of the document listed in the docs directory.

Testing checklist

I have added tests to cover my changes.
All new and existing tests passed.

QA Steps

Screenshots (if available)

matt-noonan · 2018-07-31T17:59:11Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+  => WalletActionInterp IO a
+  -> ((WalletAction a -> STM ()) -> m b)
+  -> m b
+withWalletWorker wai k = do


I don't quite follow the need for STM here over a Chan and maybe some MVars. Can you walk me through it?

Well, the word “some” in “some MVars” should give a hint :) Working with more than one MVar at the time and interacting with them transactionally is very hard. Same for Chan. With STM it is relatively straightforward to achieve the high level overview described in this withWalletWorker's public comment.

The fact that then STM shows up in the signature of withWalletWorker is a nice accident. So far, the fields inside PassiveWalletLayer are not using this, but I suspect we will likely appreciate having that as we go an implement the rest of those fields.

With this perspective in mind, I think the comments inside withWalletWorker describe in detail what's going on. I'm not sure what to add to that. Is there a particular thing that concerns you?

So why are we not using TChans?

Before I chime in on whether or not some concurrency primitive is better than another one, I would like to understand which was the limitation of the previous implementation that Matt cooked up. Why is a single Chan not sufficient anymore? Which are the extra bits of state that we are introducing which makes the use of STM necessary?

I think clarifying that would help us understanding the thought process behind your change here 😉

@erikd what would be the argument for using TChans?

parsonsmatt · 2018-08-01T00:53:06Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+            -- Prevent new input.
+            STM.atomically (STM.writeTVar tvOpen False)
+            -- Wait for the worker to finish.
+            either Ex.throwM pure =<< takeMVar mDone)


I would love to see the practical reason around the masking here 😄 I am unsure how this differs from:

do liftIO $ void $ forkFinally ... finally (k pushWA) (liftIO $ do ...

My understanding is that we mask async exceptions, and then immediately restore them for the call to liftIO . forkFinally. Then we call Ex.finally in masked state, and restore the action we pass. I suppose that the finalizer is also running with async exceptions masked. But, this is already the case.

I am not sure how it differs either 😄 I had things differently here, and changed them at last minute before submitting the PR. I think now it can be simplified as you suggest. Thanks.

Exception masking is just one of those things that I know is complicated and that I don't understand well so I check it very carefully when I see it in PR 😅

parsonsmatt · 2018-08-01T00:54:51Z

wallet-new/src/Cardano/Wallet/Kernel/Wallets.hs

-             -> IO (Either CreateWalletError HdRoot)
-createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = do
+             -> m (Either CreateWalletError HdRoot)
+createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = liftIO $ do


Why preemptively lift here?

It cleans up things at the call site.

We are trying for the Kernel to keep the monad stack as simple as possible and IO is the lowest common denominator anyway, so I would be tempted to agree with Matt here and simply have this one live in IO.

For the wallet layer instead, I do agree with you that it makes things a bit cleaner. How about you do not lift in any Kernel function but you do lift in the WalletLayer handlers? That strikes me as a decent compromise.

adinapoli-iohk

Thanks for your PR, @k0001 ! I think I do need a bit more clarification behind the changes introduced by this PR in order to understand better that this is what we need. 😺

adinapoli-iohk · 2018-08-01T08:25:01Z

wallet-new/cardano-sl-wallet-new.cabal

@@ -173,6 +173,7 @@ library
                     , lens
                     , log-warper
                     , memory
+                     , mmorph


@erikd Recently issued a proposal on new package adoption that can be summarised with this: "Adding dependency to a very big project is not a free lunch, each new dependency which is not already a direct or indirect dependency should be justified". Are we sure using mmorph is absolutely necessary here? It is a very light dependency if I remember correctly, but if all we are using it for is a hoisting of some kind between monads, cannot we simply make the code slightly less general and get rid of it?

Not saying we should, but if we pull this dependency in, let's make sure it's for a good reason 😉

I answered this in the another comment, but to sum it up: mmorph brings in some rather fundamental vocabulary about functors like WalletActionInterp, requiring some predictable behavior from them. There's no need to reinvent that vocabulary.

adinapoli-iohk · 2018-08-01T08:27:21Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+  hoist nat i = WalletActionInterp
+    { applyBlocks = fmap nat (applyBlocks i)
+    , switchToFork = fmap (fmap nat) (switchToFork i)
+    , emit = fmap nat (emit i) }


What was wrong with the previous lifting to justify the replacement with this instance from mmorph? If there is a reason down the code, it at least requires a comment here 😉

When one sees hoist, fmap, >>= and the like, one can readily understand how something composes. Imagine if instead we had changeTheMonadInFoo, modifyWhatIsInsideFoo and thisFooAfterThatFoo: We'd have to research the behavior of these functions each time we wanted to use them.

Uhm, I suspect this is subject to debate depending to who you ask; yours is surely a respectable point of view, but at the same time lifted here (or hoist lift) I don't think they are meant to be used so frequently in practice to justify the complexity brought in by this one.

To really play the devil's advocate, lift and any variant like lifted might be more familiar to the average Haskeller than hoist, which is something introduced fairly "recently" (say 5-6 years 😅 , but not everybody might know about the existence of the mmorph package to begin with) in the Haskell ecosystem whereas lift & co have been around since Babylonian times.

But you have a point on fmap and >>=, I just feel that here what buys us is not enough as one is likely to use these functions sparingly and inside the wallet worker module, likely.

@adinapoli-iohk and I just had a conversation. Ultimately, MFunctor, just like Functor, bring in simplicity at the cost of paying the upfront cost of understanding the laws of fmap and hoist once. Unless a majority can give a compelling reason why not to replace the ad-hoc implementation of hoist with a proper MFunctor instance, this instance will stay in.

I'm in favor of using mmorph explicitly. It's a good abstraction with a clear set of laws, and I don't anticipate it will need to be used too much.

Oh, I've got a bit of paint stored up for this particular bikeshed...

I'm generally in favor of mmorph and hoist and don't see any problems introducing those in general. But in this specific case, I don't think it is right.

First of all, I think this is not a lawful use of MFunctor, because WalletActionInterp m isn't a Monad. Or if it is, it (1) isn't obvious to me and (2) doesn't matter for this code. More on that below.

In reality, the whole Monad constraint is wrong here (too strict). @edsko may recall that an earlier version of the worker just represented actions as elements of a Monoid, because the worker just sequences and dispatches actions. It never needs to snoop around with their results. So the real worker uses some monoid like IO (). And really lifted should just be applying a monoid homomorphism.

Long story short, I wish I'd stuck to the weaker Monoid abstraction here, because all the business of monads and transformers distracts from the simple job of the worker: to sequence a bunch of opaque actions.

First of all, I think this is not a lawful use of MFunctor, because WalletActionInterp m isn't a Monad.

I don't think it needs to be one, it only needs to be a functor in the category of monads. And WalletActionInterp is one of those. The handwavy basic requirement is that the kind of the functor is (* -> *) -> * -> *, and that n only ever shows up in positive positions inside f. Here's an example in the wild of something that is not a functor in the category of monads, even though its kind matches the expectation of MFunctor: https://hackage.haskell.org/package/di-monad-1.2/docs/Di-Monad.html#v:hoistDiT

adinapoli-iohk · 2018-08-01T08:35:35Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+walletWorker wai getWA = Ex.bracket_
+  (emit wai "Starting wallet worker.")
+  (evalStateT
+     (fix $ \next -> lift getWA >>= \case


Why the use of fix here? If there is some subtle memory efficiency reason, I think it would be good to spell it out.

Hmm, no, there's no memory efficiency reason involved. Just normal recursive monadic code.

adinapoli-iohk · 2018-08-01T08:36:02Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+  => WalletActionInterp IO a
+  -> ((WalletAction a -> STM ()) -> m b)
+  -> m b
+withWalletWorker wai k = do


Before I chime in on whether or not some concurrency primitive is better than another one, I would like to understand which was the limitation of the previous implementation that Matt cooked up. Why is a single Chan not sufficient anymore? Which are the extra bits of state that we are introducing which makes the use of STM necessary?

I think clarifying that would help us understanding the thought process behind your change here 😉

adinapoli-iohk · 2018-08-01T08:57:14Z

wallet-new/src/Cardano/Wallet/Kernel/Wallets.hs

-             -> IO (Either CreateWalletError HdRoot)
-createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = do
+             -> m (Either CreateWalletError HdRoot)
+createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = liftIO $ do


We are trying for the Kernel to keep the monad stack as simple as possible and IO is the lowest common denominator anyway, so I would be tempted to agree with Matt here and simply have this one live in IO.

For the wallet layer instead, I do agree with you that it makes things a bit cleaner. How about you do not lift in any Kernel function but you do lift in the WalletLayer handlers? That strikes me as a decent compromise.

adinapoli-iohk · 2018-08-01T09:06:04Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+  let pushWA :: WalletAction a -> STM ()
+      pushWA = \wa -> do STM.check =<< STM.readTVar tvOpen
+                         STM.writeTQueue tqWA wa
+  liftIO $ void $ forkFinally


Why not just async and its primitives? Our code base has been purged of any occurrences of fork* a while ago and our code policy discourages the use of such "low-level" primitive, exactly because you have much less control on the thread life cycle and you end up with this big resource-cleanups blocks.

adinapoli-iohk · 2018-08-01T09:08:37Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+  -> m b
+withWalletWorker wai k = do
+  -- 'mDone' is full if the worker finished.
+  mDone :: MVar (Either Ex.SomeException ()) <- liftIO newEmptyMVar


This is basically exactly what waitCatch gives you from the async package, at least seems like so, superficially. Why do you feel this MVar-based solution is better?

Maybe you are right. I can't remember the details at this moment because I first implemented this many weeks ago, but if I recall correctly it had to do with Async mandating IO everywhere, which was preventing me from doing... something. I can try and reimplement my changes with Async now and see if that still holds. Maybe it doesn't anymore and Async will clean this a bit.

It turns out things are better with async. I just submitted those changes. Thanks for the suggestion.

edsko

I feel that the discussion on this PR has drowned in discussions of technical details. Before we can address those, I would like to know: what is the goal here? What was wrong with the original code? Why did you feel that this was urgent to fix? What motivates the change from a single Chan to a multitude of concurrent variables, thereby requiring STM? Why did you need to roll your own thread management, rather than use async? At the moment this feels heavily over-engineered to me (that's not to say it's wrong, but without a proper problem statement, it's impossible evaluate). It's not even clear to me what the original problem was, or even under which ticket this falls. We need context and motivation; the original code was fairly straight-forward and simple, we'd need good reasons to change that. (I'm not talking about hoist versus something else; I'm talking broader picture here).

We don't have much time left and I want to make sure we spend it wisely.

k0001 · 2018-08-01T13:39:19Z

@edsko The old walletWorker wasn't careful with logging. The old forkWalletWorker could leave a dangling worker running and wasn't careful with ending the wallet worker (moreover, it expects the caller to send Shutdown, which puts the burden on the wrong place. I'd argue that Shutdown shouldn't exist at all), nor clear about the expectations when using invoke out of its intended scope, nor the behavior in case of exceptions. Would you care to take a look at the code again? I did improve it a bit since submitting it yesterday.

About how “urgent” this is: That's up for debate, but we did agree that submitting a PR for this code was a good idea, so here it is.

k0001 · 2018-08-01T14:08:51Z

Another though: Chans (and TQueues, TChans and similar beasts) are bidirectional, so making them arguments to top-level functions is not particularly a good idea since one can't rely on the types to reason about how a function will use them.

edsko

@k0001 I would like to request

Please identify specific, concrete, scenarios that may go wrong with the existing worker, so that we know what we're talking about.
What is the minimal solution to those problems? In particular, might there be a solution that doesn't require STM (in other words, sticks with a single Chan, since I totally agree with you that dealing with multiple MVars indeed is a big red flag), and ideally also without all of this exception masking stuff.
As regards to the whole hoist issue -- I don't have strong feelings about this, although if Matt is right (I haven't looked at the details) it seems strange to talk about a functor on the category of monads when the things that we're constructing aren't actually monads -- but there is a much more pragmatic but nonetheless much more important concern, and that is that we badly need to bring the number of dependencies that we have down.

k0001 · 2018-08-01T15:45:56Z

Please identify specific, concrete, scenarios that may go wrong with the existing worker, so that we know what we're talking about.

See here and here.

What is the minimal solution to those problems? In particular, might there be a solution that doesn't require STM (in other words, sticks with a single Chan, since I totally agree with you that dealing with multiple MVars indeed is a big red flag), and ideally also without all of this exception masking stuff.

Can you elaborate why you'd like to avoid STM? The current needs seem to boil down to having some kind of channel, and some way to mark this channel as closed. We have a TQueue and a TVar for this. Yes, I think at this point a MVar and a Chan could be made to work as optimizations, but I wouldn't dare jump right into that without us first agreeing that any of this effort is valuable.

... hoist ...

We are discussing that in the work chat :)

edsko · 2018-08-01T18:04:50Z

Had a long chat with @k0001 about this. I am basically OK with this the idea behind this PR now; we agreed that @k0001 will refactor the code a bit to clarify its intent and document here the two problems that he's trying to solve, and then we can do a final pass over it on Friday and merge it.

adinapoli-iohk · 2018-08-03T07:35:07Z

Much better, thanks @k0001 !

As you need to rebase anyway, can I ask you to amend some small styling things I will point with inline comments? thanks 😉

Ps. You might want to remove the wip label now.

adinapoli-iohk · 2018-08-03T07:36:20Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

-    , forkWalletWorker
-    , walletWorker
+    , withWalletWorker
+    , Err_WalletWorkerExpired(..)


@k0001 We don't use this style for type names -- we don't use snake casing but standard camel case -- let's rename this WalletWorkerExpiredError.

adinapoli-iohk · 2018-08-03T07:36:38Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

-        tick = lift (readChan chan) >>= \case
-            Shutdown -> return ()
-            msg      -> interp ops msg >> tick
+-- | Connect a wallet action interpreter to a source actions. This function


@k0001 Maybe "to a source of actions" sounds better?

adinapoli-iohk · 2018-08-03T07:36:56Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+        Nothing -> pure ()
+        Just wa -> interp wai wa >> next)
+     initialWorkerState)
+  (emit wai "Stoping wallet worker.")


@k0001 Typo, should be "stopping".

adinapoli-iohk · 2018-08-03T07:37:34Z

wallet-new/src/Cardano/Wallet/Kernel/Actions.hs

+-- source will always be prioritized.
+--
+-- Usage of the obtained 'STM' action after the given continuation has returned
+-- is not possible. It will throw 'Err_WalletWorkerExpired'.


@k0001 Let's change the type to WalletWorkerExpiredError here as well.

adinapoli-iohk · 2018-08-03T07:38:34Z

wallet-new/src/Cardano/Wallet/Kernel/Wallets.hs

@@ -59,7 +59,8 @@ instance Exception CreateWalletError
 -------------------------------------------------------------------------------}

 -- | Creates a new HD 'Wallet'.
-createHdWallet :: PassiveWallet
+createHdWallet
+             :: PassiveWallet


@k0001 As a rule of thumb, let's not change other functions only for style purposes unsolicitedly, please 😉

edsko

LGTM. Dunno how severe the story with dependencies is, but we can revisit this at another time. Please make sure to rebase, handle conflcits, please hlint, and generally make CI go green, and then we can merge.

adinapoli-iohk · 2018-08-03T15:34:01Z

🎉 Thanks @k0001 !

k0001 requested review from edsko, matt-noonan and adinapoli-iohk July 31, 2018 13:32

k0001 requested review from denisshevchenko and parsonsmatt as code owners July 31, 2018 13:32

matt-noonan reviewed Jul 31, 2018

View reviewed changes

parsonsmatt reviewed Aug 1, 2018

View reviewed changes

adinapoli-iohk suggested changes Aug 1, 2018

View reviewed changes

edsko reviewed Aug 1, 2018

View reviewed changes

k0001 force-pushed the rc-cbr-243-wwo branch from d7c3a0f to 112e345 Compare August 1, 2018 13:37

edsko reviewed Aug 1, 2018

View reviewed changes

edsko added the wip label Aug 1, 2018

k0001 force-pushed the rc-cbr-243-wwo branch 2 times, most recently from 4a55f2d to 41b4308 Compare August 2, 2018 14:52

adinapoli-iohk approved these changes Aug 3, 2018

View reviewed changes

adinapoli-iohk reviewed Aug 3, 2018

View reviewed changes

k0001 force-pushed the rc-cbr-243-wwo branch from 41b4308 to ab84006 Compare August 3, 2018 07:54

edsko removed the wip label Aug 3, 2018

edsko approved these changes Aug 3, 2018

View reviewed changes

k0001 force-pushed the rc-cbr-243-wwo branch 2 times, most recently from 002bc1b to a8c269c Compare August 3, 2018 12:13

[CBR-243] improve wallet worker start-up and exception handling

b149815

k0001 force-pushed the rc-cbr-243-wwo branch from a8c269c to b149815 Compare August 3, 2018 12:50

adinapoli-iohk merged commit 8b3a33d into develop Aug 3, 2018

adinapoli-iohk deleted the rc-cbr-243-wwo branch August 3, 2018 15:33

[CBR-243] improve wallet worker start-up and exception handling #3330

[CBR-243] improve wallet worker start-up and exception handling #3330

Conversation

k0001 commented Jul 31, 2018 • edited Loading

Description

Linked issue

Type of change

Developer checklist

Testing checklist

QA Steps

Screenshots (if available)

Choose a reason for hiding this comment

k0001 Jul 31, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adinapoli-iohk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

k0001 Aug 1, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adinapoli-iohk Aug 1, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

k0001 Aug 1, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

edsko left a comment

Choose a reason for hiding this comment

k0001 commented Aug 1, 2018

k0001 commented Aug 1, 2018 • edited Loading

edsko left a comment

Choose a reason for hiding this comment

k0001 commented Aug 1, 2018

edsko commented Aug 1, 2018

adinapoli-iohk commented Aug 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

edsko left a comment

Choose a reason for hiding this comment

adinapoli-iohk commented Aug 3, 2018

k0001 commented Jul 31, 2018 •

edited

Loading

k0001 Jul 31, 2018 •

edited

Loading

k0001 Aug 1, 2018 •

edited

Loading

adinapoli-iohk Aug 1, 2018 •

edited

Loading

k0001 Aug 1, 2018 •

edited

Loading

k0001 commented Aug 1, 2018 •

edited

Loading

adinapoli-iohk commented Aug 3, 2018 •

edited

Loading