Move forward_htlcs into standalone lock #1656
Conversation
Largely looks good to me.
@TheBlueMatt I noticed a few undocumented lock order dependencies, such as `pending_outbound_payments` / `pending_events` and `peer_state` / `outbound_scid_aliases`. Not sure how much these matter, but it could be an opportunity to add docs here.
Nice!
lightning/src/ln/channelmanager.rs (Outdated)
@@ -3095,7 +3095,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 	let mut channel_state_lock = self.channel_state.lock().unwrap();
 	let channel_state = &mut *channel_state_lock;

-	for (short_chan_id, mut pending_forwards) in channel_state.forward_htlcs.drain() {
+	for (short_chan_id, mut pending_forwards) in self.forward_htlcs.lock().unwrap().drain() {
Can we actually not hold the forwarding lock the whole time here? I think we can actually just take the `forward_htlcs` map out of the lock (using `mem::swap`) and walk the list with only the `PersistenceNotifierGuard`.
Good idea! Fixed with 32e3a3c
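For context on the suggestion above, here's a minimal, self-contained sketch of the `mem::swap` pattern with a simplified stand-in type (not the actual `ChannelManager` fields): the map is swapped out of the mutex so the guard is only held for the swap, not for the iteration that follows.

use std::collections::HashMap;
use std::mem;
use std::sync::Mutex;

struct Forwarder {
    // Simplified stand-in for a `forward_htlcs`-style `Mutex<HashMap<..>>` field.
    forward_htlcs: Mutex<HashMap<u64, Vec<String>>>,
}

impl Forwarder {
    fn process_pending_htlc_forwards(&self) {
        // Swap the whole map out; the lock guard is a temporary that is dropped
        // at the end of this statement, so the loop below runs without it.
        let mut forward_htlcs = HashMap::new();
        mem::swap(&mut forward_htlcs, &mut *self.forward_htlcs.lock().unwrap());

        for (short_chan_id, pending_forwards) in forward_htlcs.drain() {
            // ... process the forwards without holding the `forward_htlcs` lock ...
            println!("forwarding {} HTLC(s) over SCID {}", pending_forwards.len(), short_chan_id);
        }
    }
}

fn main() {
    let fwd = Forwarder { forward_htlcs: Mutex::new(HashMap::new()) };
    fwd.forward_htlcs.lock().unwrap().insert(42, vec!["htlc".to_string()]);
    fwd.process_pending_htlc_forwards();
}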
Thank you very much for the review @valentinewallace & @TheBlueMatt! The lock order for many of the fields of the `ChannelManager` is now documented.
Thanks for the added docs improvements! 🫡
Addressed the latest feedback, and also extended the lock order docs for the `ChannelManager`.
@@ -4140,7 +4173,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 	let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
 	htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(
 		self.best_block.read().unwrap().height()));
-	self.fail_htlc_backwards_internal(channel_state.take().unwrap(),
+	self.fail_htlc_backwards_internal(
You drop the actual lock-fetching here, but a few lines up we still take the lock as if we needed it, and, more generally, we're still holding the channel_state lock, but don't need to. I think you can restructure here a bit to avoid holding the lock going into fail_htlc_backwards_internal at all.
Thanks! Fixed this with 383ef94. Not sure if I overdid it, given that we'll have to take the unique peer's lock with #1507, which kind of defeats the purpose of this refactor compared to just taking the `channel_state` lock several times. I pushed this in a separate commit just in case you think I overdid it, so I can revert it easily.
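To illustrate the kind of restructuring being discussed (hypothetical, simplified names rather than the real `ChannelManager` code): collect whatever is needed while the lock is held, drop the guard, and only then call into the function that should not run under the lock.

use std::sync::Mutex;

struct Manager {
    // Simplified stand-in for a `channel_state`-style mutex-guarded field.
    channel_state: Mutex<Vec<u64>>,
}

impl Manager {
    // By convention, must not be called with `channel_state` held.
    fn fail_htlc_backwards_internal(&self, htlc_id: u64) {
        println!("failing back HTLC {}", htlc_id);
    }

    fn fail_all_pending(&self) {
        // Collect everything we need while holding the lock...
        let to_fail: Vec<u64> = {
            let mut state = self.channel_state.lock().unwrap();
            state.drain(..).collect()
        }; // ...the guard is dropped here, before calling back in.

        for htlc_id in to_fail {
            self.fail_htlc_backwards_internal(htlc_id);
        }
    }
}

fn main() {
    let mgr = Manager { channel_state: Mutex::new(vec![1, 2, 3]) };
    mgr.fail_all_pending();
}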
@@ -3751,7 +3789,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 	let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
 	htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(
 		self.best_block.read().unwrap().height()));
-	self.fail_htlc_backwards_internal(channel_state.take().unwrap(),
+	self.fail_htlc_backwards_internal(
Same here, see below.
Fixed with c709bd4
lightning/src/ln/channelmanager.rs (Outdated)
/// Note that no consistency guarantees are made about the existence of a channel with the
/// `short_channel_id` here, nor the `short_channel_id` in the `PendingHTLCInfo`!
///
/// Locked *after* `channel_state`.
I think we can avoid holding both at the same time, even.
We could, but that would require taking the `channel_state` lock temporarily before taking the `forward_htlcs` lock in the `ChannelManager::write` function, dropping it, and then taking it right after again for the `ChannelHolder::claimable_htlcs` map (unless we do some `mem::swap`-ing). That seems kind of dirty, so I'd suggest we stick to this lock order until we've moved `claimable_htlcs` or the channels themselves as well :)?
Given that's the only place where the lock order arises between `channel_state` and `forward_htlcs`, I'd just throw scopes around the first `channel_state` and `forward_htlcs` usages so we can keep them not dependent on each other in terms of ordering (similar to the `best_block` locking at the top). Don't feel strongly though.
Fixed with:
4af18a6
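A rough sketch of the scoping approach agreed on above, with simplified stand-in fields rather than the actual `ChannelManager::write` body: each lock lives in its own block, so the two are never held together and no ordering between them is introduced.

use std::collections::HashMap;
use std::sync::Mutex;

struct Manager {
    channel_state: Mutex<Vec<u8>>,           // simplified stand-in
    forward_htlcs: Mutex<HashMap<u64, u64>>, // simplified stand-in
}

impl Manager {
    fn write_out(&self) {
        {
            // `channel_state` is locked and released inside this scope...
            let channel_state = self.channel_state.lock().unwrap();
            println!("serializing {} channel bytes", channel_state.len());
        }
        {
            // ...so locking `forward_htlcs` here never overlaps with it,
            // and no lock order dependency between the two is created.
            let forward_htlcs = self.forward_htlcs.lock().unwrap();
            println!("serializing {} forward entries", forward_htlcs.len());
        }
    }
}

fn main() {
    let mgr = Manager {
        channel_state: Mutex::new(vec![0u8; 4]),
        forward_htlcs: Mutex::new(HashMap::new()),
    };
    mgr.write_out();
}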
@@ -3095,7 +3130,10 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 	let mut channel_state_lock = self.channel_state.lock().unwrap();
 	let channel_state = &mut *channel_state_lock;

-	for (short_chan_id, mut pending_forwards) in channel_state.forward_htlcs.drain() {
+	let mut forward_htlcs = HashMap::new();
+	mem::swap(&mut forward_htlcs, &mut self.forward_htlcs.lock().unwrap());
Why are you holding the channel_state lock going into this?
Thanks for the review @TheBlueMatt! You're right, it makes a lot of sense to do that cleanup here :). Addressed the feedback with the latest commits.
lightning/src/ln/channelmanager.rs (Outdated)
/// Locked *after* channel_state.
///
/// If any of these locks are held at the same time, they must be acquired prior and in the
/// following order:
Rather than documenting the lock order in several places, let's define one global lock order in one place.
Fixed with:
2c5a962
lightning/src/ln/channelmanager.rs (Outdated)
/// `short_channel_id` here, nor the `short_channel_id` in the `PendingHTLCInfo`!
///
/// Locked *after* `channel_state`.
#[cfg(any(test, feature = "_test_utils"))]
I don't think we need to make this public for _test_utils. I don't see it used outside of the lightning crate.
Fixed with:
425fcc6
-fn fail_htlc_backwards_internal(&self, mut channel_state_lock: MutexGuard<ChannelHolder<Signer>>, source: HTLCSource, payment_hash: &PaymentHash, onion_error: HTLCFailReason, destination: HTLCDestination) {
+/// Note that while this function pushes events as well as FailHTLC's to fail htlcs for
+/// designated channels, no assumptions are made that the channels are still available.
+fn fail_htlc_backwards_internal(&self, source: HTLCSource, payment_hash: &PaymentHash, onion_error: HTLCFailReason, destination: HTLCDestination) {
Can you actually add a `#[cfg(debug_assertions)]` take-and-drop-channel_state-lock here? I think we want a "you're not allowed to be holding the channel_state lock when calling this" invariant, which isn't required for correctness, but may avoid accidentally adding one later. Should include a comment describing that we want to avoid holding both locks, even though it doesn't strictly matter.
Fixed with:
0ceb5be
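A minimal sketch of the requested take-and-drop debug assertion, using a plain `std::sync::Mutex` and hypothetical types (not necessarily how 0ceb5be implements it): in debug builds we briefly try to take `channel_state` to assert the caller isn't holding it, documenting the "no lock held here" invariant without affecting release builds.

use std::sync::Mutex;

struct Manager {
    channel_state: Mutex<()>, // simplified stand-in
}

impl Manager {
    fn fail_htlc_backwards_internal(&self) {
        // Debug-only invariant: the caller must not hold `channel_state`. Holding it
        // isn't required for correctness today, but asserting this keeps future code
        // from accidentally introducing a lock order dependency on it.
        #[cfg(debug_assertions)]
        {
            // Note: with a plain std Mutex this would also trip if another thread held
            // the lock, so this is only a sketch; a lock-order-aware debug mutex is the
            // more precise tool.
            assert!(self.channel_state.try_lock().is_ok(), "channel_state must not be held here");
        }

        // ... fail the HTLC backwards ...
    }
}

fn main() {
    let mgr = Manager { channel_state: Mutex::new(()) };
    mgr.fail_htlc_backwards_internal();
}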
@@ -4078,8 +4111,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana

 	let _persistence_guard = PersistenceNotifierGuard::notify_on_drop(&self.total_consistency_lock, &self.persistence_notifier);

-	let mut channel_state = Some(self.channel_state.lock().unwrap());
As discussed in #1656, we need to hold the channel_state lock for the entire duration of a claim - we want to make sure that all the channels are available first, and then inside the same lock do the claims, ensuring channels cant go away while we're claiming.
Hmmm, sorry if I'm misunderstanding this feedback, but I think the current code does handle the claim under one lock, or am I missing something? Unless we're not ok with only taking the `channel_state` lock temporarily when removing the htlc from `claimable_htlcs`, and taking it later when handling the claim (but that should be ok from my understanding, as `claimable_htlcs` has no consistency guarantees), and we're also ok with not holding the `channel_state` lock during `fail_htlc_backwards_internal`.
Please let me know what I'm missing :)!
I'm curious why it's not okay to take the `channel_state` lock after removing the source too.
It does kinda seem like the diff could be smaller in `claim_funds` though. If we need to revert this line to take the channel_state again, I think the diff can roughly be:
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index f52a0414..f2837d5c 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -4156,11 +4156,11 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
let mut claimed_any_htlcs = false;
for htlc in sources.drain(..) {
if !valid_mpp {
- if channel_state.is_none() { channel_state = Some(self.channel_state.lock().unwrap()); }
+ channel_state.take();
let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(
self.best_block.read().unwrap().height()));
- self.fail_htlc_backwards_internal(channel_state.take().unwrap(),
+ self.fail_htlc_backwards_internal(
HTLCSource::PreviousHopData(htlc.prev_hop), &payment_hash,
HTLCFailReason::Reason { failure_code: 0x4000|15, data: htlc_msat_height_data },
HTLCDestination::FailedPayment { payment_hash } );
Ah, sorry, the patch is just rendering really funny here. I think it's fine as-is, though I'll take a second look once Val's feedback is addressed.
Thanks for the suggestion @valentinewallace, and thanks for the clarification @TheBlueMatt. I kept the patch as-is for now, but I'll of course change it if you'd prefer :)
Force-pushed from 383ef94 to 2c5a962.
Addressed the latest feedback, except the "channel_state lock for the entire duration of a claim" feedback above. @TheBlueMatt, please check my comment regarding that and let me know what I'm missing :)! I also moved the fixup commits into the correct order (under the commits they are supposed to be fixed up into).
@@ -720,10 +723,21 @@ pub struct ChannelManager<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref,
 	/// after reloading from disk while replaying blocks against ChannelMonitors.
 	///
 	/// See `PendingOutboundPayment` documentation for more info.
+	///
+	/// Locked *after* channel_state.
Maybe replace these comments with "See struct-level documentation for lock order requirements" and below/above
Fixed with:
da396f4
lightning/src/ln/channelmanager.rs (Outdated)
	per_peer_state: RwLock<HashMap<PublicKey, Mutex<PeerState>>>,

	pending_events: Mutex<Vec<events::Event>>,

nit: extra newline
Fixed with:
da396f4
lightning/src/ln/channelmanager.rs (Outdated)
	/// Note that while this function pushes events as well as FailHTLC's to fail htlcs for
	/// designated channels, no assumptions are made that the channels are still available.
Suggestion: "Note that we do not assume that channels corresponding to failed HTLCs are still available."
Fixed with:
3c546e7
Force-pushed from 2c5a962 to da396f4.
Codecov Report
Base: 90.77% // Head: 90.71% // Decreases project coverage by -0.07%.

@@            Coverage Diff             @@
##             main    #1656      +/-   ##
==========================================
- Coverage   90.77%   90.71%   -0.07%
==========================================
  Files          86       86
  Lines       46631    46684      +53
  Branches    46631    46684      +53
==========================================
+ Hits        42331    42350      +19
- Misses       4300     4334      +34
Addressed the latest feedback with the last push, so that you can see the changes easily. Sadly this needs a rebase now, so I'll fix the rebase and push that in a separate push.
Force-pushed from da396f4 to 9768cad.
Rebased on upstream! No changes made except fixing the conflict.
Snap, noticed that I missed a lock order dependency. The previous lock order comments became really complicated because of that, with all the different lock order branches, so I opted to update the docs to instead illustrate the lock order in a tree-like way, similar to how file directories are often illustrated. I'm open to other suggestions if you think that's a bad idea though :)!
Think I'm ACK after this!
lightning/src/ln/channelmanager.rs (Outdated)
	log_error!(self.logger, "Somehow ended up with an MPP spontaneous payment - this should not be reachable!");
	let mut errs = Vec::new();
	let mut claimed_any_htlcs = false;
	{
Could we get rid of this scope and `mem::drop` the `channel_state` lock instead (or maybe make it back into an Option if that's easier)? My preference is to avoid adding an extra level of indentation here since it's already pretty nested 😬. Otherwise, this patch looks good!
Sure!
81bf4ac
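For reference, the two options from the comment above in a tiny generic example (not the actual `claim_funds` body): an explicit `mem::drop` releases the guard at the same point a closing brace would, without the extra indentation level.

use std::mem;
use std::sync::Mutex;

fn main() {
    let channel_state = Mutex::new(vec![1u32, 2, 3]);

    // Option 1: a scope. The guard is released at the closing brace,
    // at the cost of an extra indentation level.
    {
        let guard = channel_state.lock().unwrap();
        println!("holding the lock, {} entries", guard.len());
    }

    // Option 2: mem::drop. Same effect, no extra nesting.
    let guard = channel_state.lock().unwrap();
    println!("holding the lock, {} entries", guard.len());
    mem::drop(guard); // lock released here
    println!("lock no longer held");
}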
// `ChannelManager`. If two or more locks on the same branch is held at the same time, locks should
// be taken in the order of the lowest to highest level in the tree.
//
// Lock order tree:
Cool tree graph 😍
Thanks, awesome to hear :)!
lightning/src/ln/channelmanager.rs (Outdated)
// │ │
// │ └──`per_peer_state`
// │ │
// │ │──`best_block`
I think this order may be violated in `send_payment_along_path`; IIUC `best_block` is acquired after `pending_outbound_payments`. It's pre-existing so maybe it's fine to just remove this from the tree for now?
Ah yes, you're absolutely correct. Thanks for the correction, and sorry for the errors with the lock orders! I confusingly thought yesterday that only write locks created lock orders, and not read locks as well.
I've updated the tree with `best_block` after `pending_outbound_payments`, since I've confirmed that that's the case. Also added the `outbound_scid_aliases` lock after the `per_peer_state` in the tree, after confirming that lock order. Finally, I fixed some grammar mistakes and added the `total_consistency_lock` and `forward_htlcs` to make the illustration more correct. aae7b4e
In case it's helpful, I've created a simple lock order test for the `ChannelManager` that I think touches most of the lock orders. I'm yet to confirm that that's the case, so it would need more work if we'd want something similar:
#[test]
fn lock_order_test() {
	let chanmon_cfgs = create_chanmon_cfgs(3);
	let node_cfgs = create_node_cfgs(3, &chanmon_cfgs);
	let node_chanmgrs = create_node_chanmgrs(3, &node_cfgs, &[None, None, None]);
	let nodes = create_network(3, &node_cfgs, &node_chanmgrs);
	let chan_1 = create_announced_chan_between_nodes(&nodes, 0, 1, channelmanager::provided_init_features(), channelmanager::provided_init_features());
	let chan_2 = create_announced_chan_between_nodes(&nodes, 1, 2, channelmanager::provided_init_features(), channelmanager::provided_init_features());

	send_payment(&nodes[0], &vec!(&nodes[1], &nodes[2])[..], 100000);

	let _closed_channel_1 = close_channel(&nodes[0], &nodes[1], &chan_1.2, chan_1.3, true);
	let _closed_channel_2 = close_channel(&nodes[1], &nodes[2], &chan_2.2, chan_2.3, true);

	for node in vec![&nodes[0], &nodes[1], &nodes[2]] {
		let _tc_lock = node.node.total_consistency_lock.write().unwrap();
		let _forward_htlcs = node.node.forward_htlcs.lock().unwrap(); // Has no lock order
		let _channel_state_lock = node.node.channel_state.lock().unwrap();
		let _id_to_peer = node.node.id_to_peer.lock().unwrap();
		//let _short_to_chan_info = node.node.short_to_chan_info.write().unwrap();
		let _per_peer_state = node.node.per_peer_state.write().unwrap();
		let _outbound_scid_aliases = node.node.outbound_scid_aliases.lock().unwrap();
		let _pending_inbound_payments = node.node.pending_inbound_payments.lock().unwrap();
		let _pending_outbound_payments = node.node.pending_outbound_payments.lock().unwrap();
		let _best_block = node.node.best_block.write().unwrap();
		let _pending_events = node.node.pending_events.lock().unwrap();
		let _pending_background_events = node.node.pending_background_events.lock().unwrap();
	}

	nodes[0].node.get_and_clear_pending_events();
	nodes[1].node.get_and_clear_pending_events();
	nodes[2].node.get_and_clear_pending_events();
}
Force-pushed from 8263cf5 to aae7b4e.
LGTM after squash and a second reviewer
LGTM, feel free to squash, one note about comment clarity and one thing to do in a followup but otherwise this is ready, I think.
lightning/src/ln/channelmanager.rs (Outdated)
// Lock order tree:
//
// `total_consistency_lock`
// │
I mean, okay, so, like, we allow unicode in our source, cause people have names that need unicode, but, like, also unicode (and ASCII control chars) are scary, and quite some things (still, yes, really) don't show unicode correctly. So, like, in general, if it's trivial to do so, we're probably better off avoiding unicode (and in this case I think that's truly trivial), and maybe also skip the extra lines that just have a │ on them.
Replaced the unicode with ASCII with d1e43f3.
Also removed the extra lines with 5d5a813.
I did that one in a specific fixup just so you can compare the difference and see if you think it affects the readability of the tree. I personally think removing the extra lines makes the tree considerably less readable, but I can also see the point in removing them. Let me know which version you prefer!
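For illustration, a rough ASCII rendition of what such a lock order tree comment can look like. The branches below are only a partial reconstruction from the relations mentioned in this discussion, not the authoritative tree that ended up in channelmanager.rs.

// Lock order tree (illustrative sketch only; see `channelmanager.rs` for the real one):
//
// `total_consistency_lock`
//  |
//  |__`forward_htlcs`
//  |
//  |__`channel_state`
//      |
//      |__`per_peer_state`
//      |   |
//      |   |__`outbound_scid_aliases`
//      |
//      |__`pending_outbound_payments`
//          |
//          |__`best_block`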
lightning/src/ln/channelmanager.rs (Outdated)
//
// Lock order:
// The tree structure below illustrates the lock order requirements for the different locks of the
// `ChannelManager`. If two or more locks on the same branch are held at the same time, locks
I'm a bit confused what the text here is saying - are we allowed to take two locks that are on the same level in any order (which I guess really means we can't/won't ever take them at the same time?), and the ordering is just down a single branch? I wonder if we can make the "don't take these at the same time" checks compile-time (not that it matters, and it would be a followup anyway if we do), but I worry about the docs here going stale really quickly, at least the "don't take at the same time" ones.
I would prefer compile-time checks too, couldn't find an obvious way though :/ Asked on the rust discord: https://discord.com/channels/442252698964721669/448238009733742612/1020827841564266606 which didn't really bear fruit and separately found this project: https://github.com/BurtonQin/lockbud which may be worth looking into 🤷♀️
Yea, I mean we already have our own lock order inversion detection, it's not like building extra features into it is hard, but of course doesn't need to happen now.
Oh yeah, agreed that the docs were very unclear in regards to taking locks across branches at the same time. I've attempted to make them more clear with:
0d43f16
Let me know if you still find them unclear.
Also, interesting find @valentinewallace. If you both think it's worth the time researching more about compile-time checks, it's of course something I could do :).
Ah no worries @ViktorTigerstrom, I forgot we had built the `debug_sync::Mutex`, which is literally designed for lock order inversion detection like Matt said.
 	let (failure_code, onion_failure_data) =
-		match channel_state.by_id.entry(channel_id) {
+		match self.channel_state.lock().unwrap().by_id.entry(channel_id) {
Lol, as a followup we should inline `fail_holding_cell_htlcs` (or something similar) so that we don't take a lock on every iteration just to call `get_htlc_inbound_temp_fail_err_and_data` redundantly with the same parameters. I'll open a followup issue and link this.
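A generic sketch of the followup being suggested, with hypothetical names (the real code involves `fail_holding_cell_htlcs` and `get_htlc_inbound_temp_fail_err_and_data`): take the lock and compute the shared failure data once, before the loop, instead of re-locking and recomputing it per HTLC.

use std::sync::Mutex;

struct Manager {
    channel_state: Mutex<Vec<u64>>, // simplified stand-in
}

impl Manager {
    fn fail_htlcs_for_channel(&self, channel_id: u64, htlcs: &[u64]) {
        // Compute the per-channel failure data once, under a single lock
        // acquisition, rather than once per HTLC inside the loop.
        let (failure_code, failure_data) = {
            let channel_state = self.channel_state.lock().unwrap();
            // ... look up `channel_id` and build the failure data ...
            (0x4000u16 | 15, vec![channel_state.len() as u8])
        }; // lock released here

        for htlc in htlcs {
            // ... fail each HTLC using the precomputed values ...
            println!("failing HTLC {} on channel {}: code={} data={:?}",
                htlc, channel_id, failure_code, failure_data);
        }
    }
}

fn main() {
    let mgr = Manager { channel_state: Mutex::new(vec![1, 2]) };
    mgr.fail_htlcs_for_channel(7, &[10, 11, 12]);
}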
As we are eventually removing the `channel_state` lock, this commit moves the `forward_htlcs` map out of the `channel_state` lock, to ease that process.
Prior to this commit, the `forward_htlcs` lock was only held at the same time as the `channel_state` lock during the write process of the `ChannelManager`. This commit removes that lock order dependency by taking the `channel_state` lock only temporarily during the write process.
Force-pushed from aae7b4e to 0d43f16.
Squashed and addressed the new feedback in new fixup commits.
LGTM, I'm fine with it with or without spacing, up to you and @valentinewallace, I just thought it was a bit nicer to be able to have it open without taking up space while reading code. Feel free to squash once you and @valentinewallace figure out what you want.
LGTM after squash
Force-pushed from 0d43f16 to fc12d43.
Squashed! I dropped the commit that removed the empty rows, as I think the tree becomes less readable without them. But I'll of course remove them again if @valentinewallace also prefers the tree without the empty rows.
LGTM 🌟
No strong feelings on the spacing, I guess I slightly prefer more concise too but fine as is
@@ -3953,8 +4006,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana

 	let _persistence_guard = PersistenceNotifierGuard::notify_on_drop(&self.total_consistency_lock, &self.persistence_notifier);

-	let mut channel_state = Some(self.channel_state.lock().unwrap());
-	let removed_source = channel_state.as_mut().unwrap().claimable_htlcs.remove(&payment_hash);
+	let removed_source = self.channel_state.lock().unwrap().claimable_htlcs.remove(&payment_hash);
As a followup, can you avoid taking this lock twice in a row? This patch applies fine and avoids taking-immediately-releasing-and-taking-again the channel_state lock:
$ git diff
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 72f1f4d59..1aaeede8d 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -4006,7 +4006,8 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
let _persistence_guard = PersistenceNotifierGuard::notify_on_drop(&self.total_consistency_lock, &self.persistence_notifier);
- let removed_source = self.channel_state.lock().unwrap().claimable_htlcs.remove(&payment_hash);
+ let mut channel_state = self.channel_state.lock().unwrap();
+ let removed_source = channel_state.claimable_htlcs.remove(&payment_hash);
if let Some((payment_purpose, mut sources)) = removed_source {
assert!(!sources.is_empty());
@@ -4026,8 +4027,6 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
let mut valid_mpp = true;
let mut errs = Vec::new();
let mut claimed_any_htlcs = false;
- let mut channel_state_lock = self.channel_state.lock().unwrap();
- let channel_state = &mut *channel_state_lock;
for htlc in sources.iter() {
if let None = channel_state.short_to_chan_info.get(&htlc.prev_hop.short_channel_id) {
valid_mpp = false;
@@ -4064,7 +4063,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
}
if valid_mpp {
for htlc in sources.drain(..) {
- match self.claim_funds_from_hop(&mut channel_state_lock, htlc.prev_hop, payment_preimage) {
+ match self.claim_funds_from_hop(&mut channel_state, htlc.prev_hop, payment_preimage) {
ClaimFundsFromHop::MonitorUpdateFail(pk, err, _) => {
if let msgs::ErrorAction::IgnoreError = err.err.action {
// We got a temporary failure updating monitor, but will claim the
@@ -4084,7 +4083,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
}
}
}
- mem::drop(channel_state_lock);
+ mem::drop(channel_state);
if !valid_mpp {
for htlc in sources.drain(..) {
let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
Prerequisite for #1507.
This PR moves the `forward_htlcs` map into a standalone lock, to make it possible to change the lock order of the `channel_state` lock in #1507.