Skip to content
This repository was archived by the owner on Aug 18, 2020. It is now read-only.

[CBR-243] improve wallet worker start-up and exception handling #3330

Merged
merged 1 commit into from
Aug 3, 2018

Conversation

k0001
Copy link
Contributor

@k0001 k0001 commented Jul 31, 2018

Description

This code improve wallet worker start-up and exception handling. The previous implementation of this wasn't particularly careful regarding exceptions and logging. For example, linking the worker thread with the calling thread could be prevented by an async exception, leaving a dangling worker running. This code carefully considers these scenarios. Additionally, this new code prevents invoke from being used outside its intended scope. If that happens, then the caller will get an exception.

The Shutdown constructor for WalletAction was removed because all it achieved was opening a second exit door for the worker client, which further complicated the implementation of withWalletWorker's internals. The only exit now is having the client simply return or fail with an exception, as it can be seen in bracketPassiveWallet's usage.

This is implemented using closeable queues in STM, which has the nice side effect of making our invoke function run in STM, which is likely to prove useful in implementing the rest of the functions in PassiveWalletLayer. Ideally we would be using TMQueue (or TChan) from the stm-chans package, but here we roll out our own closeable queue implementation.

Additionally, some unused wallet creation code was removed from bracketPassiveWallet.

Linked issue

CBR-243

Type of change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🛠 New feature (non-breaking change which adds functionality)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🏭 Refactoring that does not change existing functionality but does improve things like code readability, structure etc
  • 🔨 New or improved tests for existing code
  • ⛑ git-flow chore (backport, hotfix, etc)

Developer checklist

  • I have read the style guide document, and my code follows the code style of this project.
  • If my code deals with exceptions, it follows the guidelines.
  • I have updated any documentation accordingly, if needed. Documentation changes can be reflected in opening a PR on cardanodocs.com, amending the inline Haddock comments, any relevant README file or one of the document listed in the docs directory.

Testing checklist

  • I have added tests to cover my changes.
  • All new and existing tests passed.

QA Steps

Screenshots (if available)

=> WalletActionInterp IO a
-> ((WalletAction a -> STM ()) -> m b)
-> m b
withWalletWorker wai k = do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow the need for STM here over a Chan and maybe some MVars. Can you walk me through it?

Copy link
Contributor Author

@k0001 k0001 Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the word “some” in “some MVars” should give a hint :) Working with more than one MVar at the time and interacting with them transactionally is very hard. Same for Chan. With STM it is relatively straightforward to achieve the high level overview described in this withWalletWorker's public comment.

The fact that then STM shows up in the signature of withWalletWorker is a nice accident. So far, the fields inside PassiveWalletLayer are not using this, but I suspect we will likely appreciate having that as we go an implement the rest of those fields.

With this perspective in mind, I think the comments inside withWalletWorker describe in detail what's going on. I'm not sure what to add to that. Is there a particular thing that concerns you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why are we not using TChans?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I chime in on whether or not some concurrency primitive is better than another one, I would like to understand which was the limitation of the previous implementation that Matt cooked up. Why is a single Chan not sufficient anymore? Which are the extra bits of state that we are introducing which makes the use of STM necessary?

I think clarifying that would help us understanding the thought process behind your change here 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikd what would be the argument for using TChans?

-- Prevent new input.
STM.atomically (STM.writeTVar tvOpen False)
-- Wait for the worker to finish.
either Ex.throwM pure =<< takeMVar mDone)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see the practical reason around the masking here 😄 I am unsure how this differs from:

do
  liftIO $ void $ forkFinally ...
  finally (k pushWA) (liftIO $ do ...

My understanding is that we mask async exceptions, and then immediately restore them for the call to liftIO . forkFinally. Then we call Ex.finally in masked state, and restore the action we pass. I suppose that the finalizer is also running with async exceptions masked. But, this is already the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how it differs either 😄 I had things differently here, and changed them at last minute before submitting the PR. I think now it can be simplified as you suggest. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception masking is just one of those things that I know is complicated and that I don't understand well so I check it very carefully when I see it in PR 😅

-> IO (Either CreateWalletError HdRoot)
createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = do
-> m (Either CreateWalletError HdRoot)
createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = liftIO $ do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why preemptively lift here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cleans up things at the call site.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying for the Kernel to keep the monad stack as simple as possible and IO is the lowest common denominator anyway, so I would be tempted to agree with Matt here and simply have this one live in IO.

For the wallet layer instead, I do agree with you that it makes things a bit cleaner. How about you do not lift in any Kernel function but you do lift in the WalletLayer handlers? That strikes me as a decent compromise.

Copy link
Contributor

@adinapoli-iohk adinapoli-iohk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR, @k0001 ! I think I do need a bit more clarification behind the changes introduced by this PR in order to understand better that this is what we need. 😺

@@ -173,6 +173,7 @@ library
, lens
, log-warper
, memory
, mmorph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikd Recently issued a proposal on new package adoption that can be summarised with this: "Adding dependency to a very big project is not a free lunch, each new dependency which is not already a direct or indirect dependency should be justified". Are we sure using mmorph is absolutely necessary here? It is a very light dependency if I remember correctly, but if all we are using it for is a hoisting of some kind between monads, cannot we simply make the code slightly less general and get rid of it?

Not saying we should, but if we pull this dependency in, let's make sure it's for a good reason 😉

Copy link
Contributor Author

@k0001 k0001 Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I answered this in the another comment, but to sum it up: mmorph brings in some rather fundamental vocabulary about functors like WalletActionInterp, requiring some predictable behavior from them. There's no need to reinvent that vocabulary.

hoist nat i = WalletActionInterp
{ applyBlocks = fmap nat (applyBlocks i)
, switchToFork = fmap (fmap nat) (switchToFork i)
, emit = fmap nat (emit i) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was wrong with the previous lifting to justify the replacement with this instance from mmorph? If there is a reason down the code, it at least requires a comment here 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When one sees hoist, fmap, >>= and the like, one can readily understand how something composes. Imagine if instead we had changeTheMonadInFoo, modifyWhatIsInsideFoo and thisFooAfterThatFoo: We'd have to research the behavior of these functions each time we wanted to use them.

Copy link
Contributor

@adinapoli-iohk adinapoli-iohk Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, I suspect this is subject to debate depending to who you ask; yours is surely a respectable point of view, but at the same time lifted here (or hoist lift) I don't think they are meant to be used so frequently in practice to justify the complexity brought in by this one.

To really play the devil's advocate, lift and any variant like lifted might be more familiar to the average Haskeller than hoist, which is something introduced fairly "recently" (say 5-6 years 😅 , but not everybody might know about the existence of the mmorph package to begin with) in the Haskell ecosystem whereas lift & co have been around since Babylonian times.

But you have a point on fmap and >>=, I just feel that here what buys us is not enough as one is likely to use these functions sparingly and inside the wallet worker module, likely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adinapoli-iohk and I just had a conversation. Ultimately, MFunctor, just like Functor, bring in simplicity at the cost of paying the upfront cost of understanding the laws of fmap and hoist once. Unless a majority can give a compelling reason why not to replace the ad-hoc implementation of hoist with a proper MFunctor instance, this instance will stay in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of using mmorph explicitly. It's a good abstraction with a clear set of laws, and I don't anticipate it will need to be used too much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I've got a bit of paint stored up for this particular bikeshed...

I'm generally in favor of mmorph and hoist and don't see any problems introducing those in general. But in this specific case, I don't think it is right.

First of all, I think this is not a lawful use of MFunctor, because WalletActionInterp m isn't a Monad. Or if it is, it (1) isn't obvious to me and (2) doesn't matter for this code. More on that below.

In reality, the whole Monad constraint is wrong here (too strict). @edsko may recall that an earlier version of the worker just represented actions as elements of a Monoid, because the worker just sequences and dispatches actions. It never needs to snoop around with their results. So the real worker uses some monoid like IO (). And really lifted should just be applying a monoid homomorphism.

Long story short, I wish I'd stuck to the weaker Monoid abstraction here, because all the business of monads and transformers distracts from the simple job of the worker: to sequence a bunch of opaque actions.

Copy link
Contributor Author

@k0001 k0001 Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I think this is not a lawful use of MFunctor, because WalletActionInterp m isn't a Monad.

I don't think it needs to be one, it only needs to be a functor in the category of monads. And WalletActionInterp is one of those. The handwavy basic requirement is that the kind of the functor is (* -> *) -> * -> *, and that n only ever shows up in positive positions inside f. Here's an example in the wild of something that is not a functor in the category of monads, even though its kind matches the expectation of MFunctor: https://hackage.haskell.org/package/di-monad-1.2/docs/Di-Monad.html#v:hoistDiT

walletWorker wai getWA = Ex.bracket_
(emit wai "Starting wallet worker.")
(evalStateT
(fix $ \next -> lift getWA >>= \case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the use of fix here? If there is some subtle memory efficiency reason, I think it would be good to spell it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, no, there's no memory efficiency reason involved. Just normal recursive monadic code.

=> WalletActionInterp IO a
-> ((WalletAction a -> STM ()) -> m b)
-> m b
withWalletWorker wai k = do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I chime in on whether or not some concurrency primitive is better than another one, I would like to understand which was the limitation of the previous implementation that Matt cooked up. Why is a single Chan not sufficient anymore? Which are the extra bits of state that we are introducing which makes the use of STM necessary?

I think clarifying that would help us understanding the thought process behind your change here 😉

-> IO (Either CreateWalletError HdRoot)
createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = do
-> m (Either CreateWalletError HdRoot)
createHdWallet pw mnemonic spendingPassword assuranceLevel walletName = liftIO $ do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying for the Kernel to keep the monad stack as simple as possible and IO is the lowest common denominator anyway, so I would be tempted to agree with Matt here and simply have this one live in IO.

For the wallet layer instead, I do agree with you that it makes things a bit cleaner. How about you do not lift in any Kernel function but you do lift in the WalletLayer handlers? That strikes me as a decent compromise.

let pushWA :: WalletAction a -> STM ()
pushWA = \wa -> do STM.check =<< STM.readTVar tvOpen
STM.writeTQueue tqWA wa
liftIO $ void $ forkFinally
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just async and its primitives? Our code base has been purged of any occurrences of fork* a while ago and our code policy discourages the use of such "low-level" primitive, exactly because you have much less control on the thread life cycle and you end up with this big resource-cleanups blocks.

-> m b
withWalletWorker wai k = do
-- 'mDone' is full if the worker finished.
mDone :: MVar (Either Ex.SomeException ()) <- liftIO newEmptyMVar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically exactly what waitCatch gives you from the async package, at least seems like so, superficially. Why do you feel this MVar-based solution is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you are right. I can't remember the details at this moment because I first implemented this many weeks ago, but if I recall correctly it had to do with Async mandating IO everywhere, which was preventing me from doing... something. I can try and reimplement my changes with Async now and see if that still holds. Maybe it doesn't anymore and Async will clean this a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out things are better with async. I just submitted those changes. Thanks for the suggestion.

Copy link
Contributor

@edsko edsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that the discussion on this PR has drowned in discussions of technical details. Before we can address those, I would like to know: what is the goal here? What was wrong with the original code? Why did you feel that this was urgent to fix? What motivates the change from a single Chan to a multitude of concurrent variables, thereby requiring STM? Why did you need to roll your own thread management, rather than use async? At the moment this feels heavily over-engineered to me (that's not to say it's wrong, but without a proper problem statement, it's impossible evaluate). It's not even clear to me what the original problem was, or even under which ticket this falls. We need context and motivation; the original code was fairly straight-forward and simple, we'd need good reasons to change that. (I'm not talking about hoist versus something else; I'm talking broader picture here).

We don't have much time left and I want to make sure we spend it wisely.

@k0001
Copy link
Contributor Author

k0001 commented Aug 1, 2018

@edsko The old walletWorker wasn't careful with logging. The old forkWalletWorker could leave a dangling worker running and wasn't careful with ending the wallet worker (moreover, it expects the caller to send Shutdown, which puts the burden on the wrong place. I'd argue that Shutdown shouldn't exist at all), nor clear about the expectations when using invoke out of its intended scope, nor the behavior in case of exceptions. Would you care to take a look at the code again? I did improve it a bit since submitting it yesterday.

About how “urgent” this is: That's up for debate, but we did agree that submitting a PR for this code was a good idea, so here it is.

@k0001
Copy link
Contributor Author

k0001 commented Aug 1, 2018

Another though: Chans (and TQueues, TChans and similar beasts) are bidirectional, so making them arguments to top-level functions is not particularly a good idea since one can't rely on the types to reason about how a function will use them.

Copy link
Contributor

@edsko edsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k0001 I would like to request

  1. Please identify specific, concrete, scenarios that may go wrong with the existing worker, so that we know what we're talking about.
  2. What is the minimal solution to those problems? In particular, might there be a solution that doesn't require STM (in other words, sticks with a single Chan, since I totally agree with you that dealing with multiple MVars indeed is a big red flag), and ideally also without all of this exception masking stuff.
  3. As regards to the whole hoist issue -- I don't have strong feelings about this, although if Matt is right (I haven't looked at the details) it seems strange to talk about a functor on the category of monads when the things that we're constructing aren't actually monads -- but there is a much more pragmatic but nonetheless much more important concern, and that is that we badly need to bring the number of dependencies that we have down.

@k0001
Copy link
Contributor Author

k0001 commented Aug 1, 2018

Please identify specific, concrete, scenarios that may go wrong with the existing worker, so that we know what we're talking about.

See here and here.

What is the minimal solution to those problems? In particular, might there be a solution that doesn't require STM (in other words, sticks with a single Chan, since I totally agree with you that dealing with multiple MVars indeed is a big red flag), and ideally also without all of this exception masking stuff.

Can you elaborate why you'd like to avoid STM? The current needs seem to boil down to having some kind of channel, and some way to mark this channel as closed. We have a TQueue and a TVar for this. Yes, I think at this point a MVar and a Chan could be made to work as optimizations, but I wouldn't dare jump right into that without us first agreeing that any of this effort is valuable.

... hoist ...

We are discussing that in the work chat :)

@edsko
Copy link
Contributor

edsko commented Aug 1, 2018

Had a long chat with @k0001 about this. I am basically OK with this the idea behind this PR now; we agreed that @k0001 will refactor the code a bit to clarify its intent and document here the two problems that he's trying to solve, and then we can do a final pass over it on Friday and merge it.

@edsko edsko added the wip label Aug 1, 2018
@k0001 k0001 force-pushed the rc-cbr-243-wwo branch 2 times, most recently from 4a55f2d to 41b4308 Compare August 2, 2018 14:52
@adinapoli-iohk
Copy link
Contributor

adinapoli-iohk commented Aug 3, 2018

Much better, thanks @k0001 !

As you need to rebase anyway, can I ask you to amend some small styling things I will point with inline comments? thanks 😉

Ps. You might want to remove the wip label now.

, forkWalletWorker
, walletWorker
, withWalletWorker
, Err_WalletWorkerExpired(..)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k0001 We don't use this style for type names -- we don't use snake casing but standard camel case -- let's rename this WalletWorkerExpiredError.

tick = lift (readChan chan) >>= \case
Shutdown -> return ()
msg -> interp ops msg >> tick
-- | Connect a wallet action interpreter to a source actions. This function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k0001 Maybe "to a source of actions" sounds better?

Nothing -> pure ()
Just wa -> interp wai wa >> next)
initialWorkerState)
(emit wai "Stoping wallet worker.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k0001 Typo, should be "stopping".

-- source will always be prioritized.
--
-- Usage of the obtained 'STM' action after the given continuation has returned
-- is not possible. It will throw 'Err_WalletWorkerExpired'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k0001 Let's change the type to WalletWorkerExpiredError here as well.

@@ -59,7 +59,8 @@ instance Exception CreateWalletError
-------------------------------------------------------------------------------}

-- | Creates a new HD 'Wallet'.
createHdWallet :: PassiveWallet
createHdWallet
:: PassiveWallet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k0001 As a rule of thumb, let's not change other functions only for style purposes unsolicitedly, please 😉

@edsko edsko removed the wip label Aug 3, 2018
Copy link
Contributor

@edsko edsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Dunno how severe the story with dependencies is, but we can revisit this at another time. Please make sure to rebase, handle conflcits, please hlint, and generally make CI go green, and then we can merge.

@k0001 k0001 force-pushed the rc-cbr-243-wwo branch 2 times, most recently from 002bc1b to a8c269c Compare August 3, 2018 12:13
@adinapoli-iohk adinapoli-iohk merged commit 8b3a33d into develop Aug 3, 2018
@adinapoli-iohk adinapoli-iohk deleted the rc-cbr-243-wwo branch August 3, 2018 15:33
@adinapoli-iohk
Copy link
Contributor

🎉 Thanks @k0001 !

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants