
Move forward_htlcs into standalone lock #1656


Conversation

ViktorTigerstrom (Contributor)

Prerequisite for #1507.

This PR moves the forward_htlcs map into a standalone lock, to make it possible to change the lock order of the channel_state lock in #1507.
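For illustration, a heavily simplified sketch of the shape of the change (hypothetical types and field names, not the actual ChannelManager definition):

use std::collections::HashMap;
use std::sync::Mutex;

type PendingForwards = Vec<u64>; // placeholder for the real per-SCID forward data

// Before: forward_htlcs lives inside the ChannelHolder, so touching it means
// taking the big channel_state lock.
struct ChannelHolderBefore {
    forward_htlcs: HashMap<u64, PendingForwards>,
    // ... channels, claimable_htlcs, etc.
}
struct ManagerBefore {
    channel_state: Mutex<ChannelHolderBefore>,
}

// After: forward_htlcs sits behind its own Mutex and can be locked without
// holding (or ordering against) channel_state.
struct ManagerAfter {
    channel_state: Mutex<()>, // remaining ChannelHolder contents elided
    forward_htlcs: Mutex<HashMap<u64, PendingForwards>>,
}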

@valentinewallace (Contributor) left a comment

Largely looks good to me.

@TheBlueMatt I noticed a few undocumented lock order dependencies, such as pending_outbound_payments / pending_events and peer_state / outbound_scid_aliases. Not sure how much these matter, but this could be an opportunity to add docs here.

@TheBlueMatt (Collaborator) left a comment

Nice!

@@ -3095,7 +3095,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
let mut channel_state_lock = self.channel_state.lock().unwrap();
let channel_state = &mut *channel_state_lock;

- for (short_chan_id, mut pending_forwards) in channel_state.forward_htlcs.drain() {
+ for (short_chan_id, mut pending_forwards) in self.forward_htlcs.lock().unwrap().drain() {
Collaborator

Can we actually not hold the forwarding lock the whole time here? I think we can just take the forward_htlcs set out of the mutex (using mem::swap) and walk the list with only the PersistenceNotifierGuard held.
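A minimal sketch of the suggested pattern (illustrative names, not the real ChannelManager code): swap the map out from under the Mutex so the lock is only held for the duration of the swap, then iterate with no lock held.

use std::collections::HashMap;
use std::mem;
use std::sync::Mutex;

fn process_forwards(forward_htlcs: &Mutex<HashMap<u64, Vec<String>>>) {
    // Take the whole map while holding the lock only for the swap itself...
    let mut forwards = HashMap::new();
    mem::swap(&mut forwards, &mut *forward_htlcs.lock().unwrap());
    // ...then walk it with no lock held at all.
    for (short_chan_id, pending) in forwards.drain() {
        println!("forwarding {} HTLC(s) for SCID {}", pending.len(), short_chan_id);
    }
}

fn main() {
    let m = Mutex::new(HashMap::from([(42u64, vec!["htlc".to_string()])]));
    process_forwards(&m);
}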

Contributor Author

Good idea! Fixed with 32e3a3c

@ViktorTigerstrom (Contributor Author) commented Aug 15, 2022

Thank you very much for the review @valentinewallace & @TheBlueMatt!
I've addressed the feedback with a fixup commit, as well as a separate commit for updating the lock order docs.
I also changed the public accessibility of that forward_htlcs field in a fixup commit.

The lock order for many of the fields of the ChannelManager is currently only implied through the writer function for the serialization of the ChannelManager, but I figured that it's probably worth documenting all those fields anyway. Let me know if you think differently :) (or if you think it would be better to just hold the locks temporarily in the writer func). Also let me know if you think the docs commit fits better in a different PR.

@valentinewallace valentinewallace self-requested a review August 15, 2022 22:06
@valentinewallace (Contributor) left a comment

Thanks for the added docs improvements! 🫡

@ViktorTigerstrom (Contributor Author)

Addressed the latest feedback, and also extended the lock order docs for the id_to_peer map.

@@ -4140,7 +4173,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(
self.best_block.read().unwrap().height()));
- self.fail_htlc_backwards_internal(channel_state.take().unwrap(),
+ self.fail_htlc_backwards_internal(
Collaborator

You drop the actual lock-fetching here, but a few lines up we still take the lock as if we needed it, and, more generally, we're still holding the channel_state lock but don't need to. I think you can restructure here a bit to avoid holding the lock going into fail_htlc_backwards_internal at all.

@ViktorTigerstrom (Contributor Author) Aug 22, 2022

Thanks! Fixed this with 383ef94. Not sure if I overdid it, given that with #1507 we'll have to take the unique peer's lock instead of just taking the channel_state lock several times, which kind of removes the purpose of my refactor. I pushed this in a separate commit just in case you think I overdid this, so I can revert it easily.

@@ -3751,7 +3789,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(
self.best_block.read().unwrap().height()));
- self.fail_htlc_backwards_internal(channel_state.take().unwrap(),
+ self.fail_htlc_backwards_internal(
Collaborator

Same here, see below.

Contributor Author

Fixed with c709bd4

/// Note that no consistency guarantees are made about the existence of a channel with the
/// `short_channel_id` here, nor the `short_channel_id` in the `PendingHTLCInfo`!
///
/// Locked *after* `channel_state`.
Collaborator

I think we can avoid holding both at the same time, even.

Contributor Author

We could, but that would require that we take the channel_state lock temporarily before taking the forward_htlcs lock in the ChannelManager::write function, drop it, and then take it again right after for the ChannelHolder::claimable_htlcs map (unless we do some mem::swapping). That seems kind of dirty, so I'd suggest we stick to this lock order until we've moved claimable_htlcs or the channels themselves as well :)?

@valentinewallace (Contributor) Aug 24, 2022

Given that's the only place where the lock order arises between channel_state and forward_htlcs, I'd just throw scopes around the first channel_state and forward_htlcs usages so we can keep them from depending on each other in terms of ordering (similar to the best_block locking at the top). Don't feel strongly though.
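A rough sketch of the scoping idea (hypothetical write function with simplified fields, not the actual serialization code): each lock lives in its own block, so neither is ever held while the other is taken and no ordering between them is introduced.

use std::collections::HashMap;
use std::sync::Mutex;

struct Manager {
    channel_state: Mutex<Vec<u8>>,
    forward_htlcs: Mutex<HashMap<u64, Vec<u8>>>,
}

impl Manager {
    fn write(&self, out: &mut Vec<u8>) {
        {
            let channel_state = self.channel_state.lock().unwrap();
            out.extend_from_slice(&channel_state); // serialize channel data
        } // channel_state lock dropped here

        {
            let forward_htlcs = self.forward_htlcs.lock().unwrap();
            out.push(forward_htlcs.len() as u8); // serialize forward_htlcs
        } // forward_htlcs lock dropped here

        // Any later channel_state use would re-take the lock in its own scope.
    }
}

fn main() {
    let m = Manager { channel_state: Mutex::new(vec![1, 2]), forward_htlcs: Mutex::new(HashMap::new()) };
    let mut buf = Vec::new();
    m.write(&mut buf);
    println!("{:?}", buf);
}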

Contributor Author

Fixed with:
4af18a6

@@ -3095,7 +3130,10 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
let mut channel_state_lock = self.channel_state.lock().unwrap();
let channel_state = &mut *channel_state_lock;

- for (short_chan_id, mut pending_forwards) in channel_state.forward_htlcs.drain() {
+ let mut forward_htlcs = HashMap::new();
+ mem::swap(&mut forward_htlcs, &mut self.forward_htlcs.lock().unwrap());
Collaborator

Why are you holding the channel_state lock going into this?

Contributor Author

Fixed with c709bd4. Let me know if you think it's overkill to acquire the lock at the beginning of the forward_htlcs loop until #1639 and #1507.

@ViktorTigerstrom (Contributor Author)

Thanks for the review @TheBlueMatt! You're right, it makes a lot of sense to do that cleanup here :). Addressed the feedback with the latest commits.

/// Locked *after* channel_state.
///
/// If any of these locks are held at the same time, they must be acquired prior and in the
/// following order:
Collaborator

Rather than documenting the lock order in several places, let's define one global lock order in one place.

Contributor Author

Fixed with:
2c5a962

/// `short_channel_id` here, nor the `short_channel_id` in the `PendingHTLCInfo`!
///
/// Locked *after* `channel_state`.
#[cfg(any(test, feature = "_test_utils"))]
Collaborator

I don't think we need to make this public for _test_utils. I don't see it used outside of the lightning crate.

Contributor Author

Fixed with:
425fcc6

- fn fail_htlc_backwards_internal(&self, mut channel_state_lock: MutexGuard<ChannelHolder<Signer>>, source: HTLCSource, payment_hash: &PaymentHash, onion_error: HTLCFailReason, destination: HTLCDestination) {
+ /// Note that while this function pushes events as well as FailHTLC's to fail htlcs for
+ /// designated channels, no assumptions are made that the channels are still available.
+ fn fail_htlc_backwards_internal(&self, source: HTLCSource, payment_hash: &PaymentHash, onion_error: HTLCFailReason, destination: HTLCDestination) {
Collaborator

Can you actually add a #[cfg(debug_assertions)] take-and-drop-channel_state-lock here? I think we want a "you're not allowed to be holding the channel_state lock when calling this" invariant, which isn't required for correctness, but may avoid accidentally adding one later. Should include a comment that describes that we want to avoid holding both locks, even though it doesn't strictly matter.
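A sketch of the kind of debug-only invariant being requested (assuming a channel_state Mutex field; the exact assertion used in the PR may differ):

use std::sync::Mutex;

struct Manager {
    channel_state: Mutex<u32>,
}

impl Manager {
    fn fail_htlc_backwards_internal(&self /* , source, payment_hash, ... */) {
        #[cfg(debug_assertions)]
        {
            // Ensure `channel_state` is NOT held when entering this function. Not
            // required for correctness, but it keeps anyone from accidentally
            // re-introducing that lock dependency later: this take-and-drop will
            // deadlock (or be flagged by lockorder checking) if the caller
            // already holds the lock.
            let _ = self.channel_state.lock().unwrap();
        }
        // ... actual failure handling, without channel_state held ...
    }
}

fn main() {
    let m = Manager { channel_state: Mutex::new(0) };
    m.fail_htlc_backwards_internal();
}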

Contributor Author

Fixed with:
0ceb5be

@@ -4078,8 +4111,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana

let _persistence_guard = PersistenceNotifierGuard::notify_on_drop(&self.total_consistency_lock, &self.persistence_notifier);

- let mut channel_state = Some(self.channel_state.lock().unwrap());
Collaborator

As discussed in #1656, we need to hold the channel_state lock for the entire duration of a claim - we want to make sure that all the channels are available first, and then inside the same lock do the claims, ensuring channels can't go away while we're claiming.

Contributor Author

Hmmm, sorry if I'm misunderstanding this feedback, but I think the current code does handle the claim under one lock, or am I missing something? Unless we're not ok with only taking the channel_state lock temporarily when removing the htlc from claimable_htlcs and taking it later when handling the claim (but that should be ok from my understanding, as claimable_htlcs has no consistency guarantees), and we're also ok with not holding the channel_state lock during fail_htlc_backwards_internal.

Please let me know what I'm missing :)!

Contributor

I'm curious why it's not okay to take the channel_state lock after removing the source too.

It does kinda seem like the diff could be smaller in claim_funds though. If we need to revert this line to take the channel_state again, I think the diff can roughly be:

diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index f52a0414..f2837d5c 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -4156,11 +4156,11 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 			let mut claimed_any_htlcs = false;
 			for htlc in sources.drain(..) {
 				if !valid_mpp {
-					if channel_state.is_none() { channel_state = Some(self.channel_state.lock().unwrap()); }
+					channel_state.take();
 					let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
 					htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(
 							self.best_block.read().unwrap().height()));
-					self.fail_htlc_backwards_internal(channel_state.take().unwrap(),
+					self.fail_htlc_backwards_internal(
 									 HTLCSource::PreviousHopData(htlc.prev_hop), &payment_hash,
 									 HTLCFailReason::Reason { failure_code: 0x4000|15, data: htlc_msat_height_data },
 									 HTLCDestination::FailedPayment { payment_hash } );

Collaborator

Ah, sorry, the patch is just rendering really funny here. I think it's fine as-is, though I'll take a second look once Val's feedback is addressed.

Contributor Author

Thanks for the suggestion @valentinewallace and thanks for the clarification @TheBlueMatt. I kept the patch as is for now, but I'll of course change if you'd prefer :)

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-08-forward-htlcs-as-standalone-lock branch from 383ef94 to 2c5a962 Compare September 7, 2022 18:43
@ViktorTigerstrom (Contributor Author)

Addressed the latest feedback, except the "channel_state lock for the entire duration of a claim" feedback above. @TheBlueMatt, please check my comment regarding that and let me know what I'm missing :)!

Also moved the fixup commits to be in the correct order (under the commit they are supposed to be fixed up into).

@@ -720,10 +723,21 @@ pub struct ChannelManager<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref,
/// after reloading from disk while replaying blocks against ChannelMonitors.
///
/// See `PendingOutboundPayment` documentation for more info.
+ ///
+ /// Locked *after* channel_state.
Contributor

Maybe replace these comments with "See struct-level documentation for lock order requirements" and below/above

Contributor Author

Fixed with:
da396f4

per_peer_state: RwLock<HashMap<PublicKey, Mutex<PeerState>>>,

pending_events: Mutex<Vec<events::Event>>,

Contributor

nit: extra newline

Contributor Author

Fixed with:
da396f4

Comment on lines 3891 to 3892
/// Note that while this function pushes events as well as FailHTLC's to fail htlcs for
/// designated channels, no assumptions are made that the channels are still available.
Contributor

Suggestion: "Note that we do not assume that channels corresponding to failed HTLCs are still available."

Contributor Author

Fixed with:
3c546e7

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-08-forward-htlcs-as-standalone-lock branch from 2c5a962 to da396f4 Compare September 15, 2022 20:54
@codecov-commenter commented Sep 15, 2022

Codecov Report

Base: 90.77% // Head: 90.71% // Decreases project coverage by -0.06% ⚠️

Coverage data is based on head (0d43f16) compared to base (8886d1d).
Patch coverage: 84.21% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1656      +/-   ##
==========================================
- Coverage   90.77%   90.71%   -0.07%     
==========================================
  Files          86       86              
  Lines       46631    46684      +53     
  Branches    46631    46684      +53     
==========================================
+ Hits        42331    42350      +19     
- Misses       4300     4334      +34     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 85.09% <82.85%> (+0.03%) ⬆️
lightning/src/ln/onion_route_tests.rs 97.68% <100.00%> (ø)
lightning-invoice/src/utils.rs 94.85% <0.00%> (-1.92%) ⬇️
lightning-invoice/src/payment.rs 90.11% <0.00%> (-0.76%) ⬇️
lightning-net-tokio/src/lib.rs 76.73% <0.00%> (-0.31%) ⬇️
lightning/src/ln/monitor_tests.rs 99.44% <0.00%> (-0.12%) ⬇️
lightning/src/ln/functional_tests.rs 96.94% <0.00%> (-0.08%) ⬇️


@ViktorTigerstrom (Contributor Author)

Addressed the latest feedback with the last push, so that you can see the changes easily. Sadly this needs a rebase now, so I'll fix the rebase and push it in a separate push.

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-08-forward-htlcs-as-standalone-lock branch from da396f4 to 9768cad Compare September 15, 2022 21:12
@ViktorTigerstrom (Contributor Author) commented Sep 15, 2022

Rebased on upstream! No changes made except fixing the conflict.

@ViktorTigerstrom (Contributor Author) commented Sep 15, 2022

Snap, I noticed that I missed that the best_block lock is ordered after the per_peer_state lock.

The previous lock order comments became really complicated because of that, with all the different lock order branches, so I opted to update the docs to instead illustrate the locks in a tree-like way, similar to how file directories are often illustrated. I'm open to other suggestions if you think that's a bad idea though :)!

@valentinewallace (Contributor) left a comment

Think I'm ACK after this!

log_error!(self.logger, "Somehow ended up with an MPP spontaneous payment - this should not be reachable!");
let mut errs = Vec::new();
let mut claimed_any_htlcs = false;
{
Contributor

Could we get rid of this scope and mem::drop the channel_state lock instead (or maybe make it back into an Option if that's easier)? My preference is to avoid adding an extra level of indentation here since it's already pretty nested 😬. Otherwise, this patch looks good!
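A minimal, self-contained sketch contrasting the two ways to end the guard's lifetime being discussed here (an extra scope vs. mem::drop on the guard); names are illustrative only:

use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0u32);

    // Option 1: an extra scope (adds an indentation level).
    {
        let mut guard = m.lock().unwrap();
        *guard += 1;
    } // lock released here

    // Option 2: mem::drop the guard explicitly, no extra indentation.
    let mut guard = m.lock().unwrap();
    *guard += 1;
    std::mem::drop(guard); // lock released here

    println!("{}", *m.lock().unwrap()); // prints 2
}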

Contributor Author

Sure!
81bf4ac

// `ChannelManager`. If two or more locks on the same branch is held at the same time, locks should
// be taken in the order of the lowest to highest level in the tree.
//
// Lock order tree:
Contributor

Cool tree graph 😍

Contributor Author

Thanks, awesome to hear :)!

// │ │
// │ └──`per_peer_state`
// │ │
// │ │──`best_block`
Contributor

I think this order may be violated in send_payment_along_path, IIUC best_block is acquired after pending_outbound_payments. It's pre-existing so maybe it's fine to just remove this from the tree for now?

@ViktorTigerstrom (Contributor Author) Sep 16, 2022

Ah yes, you're absolutely correct. Thanks for the correction, and sorry for the errors with the lock orders! I was confused yesterday and thought that only write locks created lock order dependencies, not read locks as well.

I've updated the tree with best_block after pending_outbound_payments, since I've confirmed that that's the case. Also added the outbound_scid_aliases lock after per_peer_state in the tree, after confirming that lock order. Finally, I fixed some grammar mistakes and added the total_consistency_lock and forward_htlcs to make the illustration more correct. aae7b4e

In case it's helpful, I've created a simple lock order test for the ChannelManager that I think touches most of the lock orders. I've yet to confirm that that's the case, so it would need more work if we want something similar:

#[test]
fn lock_order_test() {
	let chanmon_cfgs = create_chanmon_cfgs(3);
	let node_cfgs = create_node_cfgs(3, &chanmon_cfgs);
	let node_chanmgrs = create_node_chanmgrs(3, &node_cfgs, &[None, None, None]);
	let nodes = create_network(3, &node_cfgs, &node_chanmgrs);
	let chan_1 = create_announced_chan_between_nodes(&nodes, 0, 1, channelmanager::provided_init_features(), channelmanager::provided_init_features());
	let chan_2 = create_announced_chan_between_nodes(&nodes, 1, 2, channelmanager::provided_init_features(), channelmanager::provided_init_features());
	send_payment(&nodes[0], &vec!(&nodes[1], &nodes[2])[..], 100000);
	let _closed_channel_1 = close_channel(&nodes[0], &nodes[1], &chan_1.2, chan_1.3, true);
	let _closed_channel_2 = close_channel(&nodes[1], &nodes[2], &chan_2.2, chan_2.3, true);
	for node in vec![&nodes[0], &nodes[1], &nodes[2]] {
		let _tc_lock = node.node.total_consistency_lock.write().unwrap();
		let _forward_htlcs = node.node.forward_htlcs.lock().unwrap(); // Has no lock order
		let _channel_state_lock = node.node.channel_state.lock().unwrap();
		let _id_to_peer = node.node.id_to_peer.lock().unwrap();
		//let _short_to_chan_info = node.node.short_to_chan_info.write().unwrap();
		let _per_peer_state = node.node.per_peer_state.write().unwrap();
		let _outbound_scid_aliases = node.node.outbound_scid_aliases.lock().unwrap();
		let _pending_inbound_payments = node.node.pending_inbound_payments.lock().unwrap();
		let _pending_outbound_payments = node.node.pending_outbound_payments.lock().unwrap();
		let _best_block = node.node.best_block.write().unwrap();
		let _pending_events = node.node.pending_events.lock().unwrap();
		let _pending_background_events = node.node.pending_background_events.lock().unwrap();
	}
	nodes[0].node.get_and_clear_pending_events();
	nodes[1].node.get_and_clear_pending_events();
	nodes[2].node.get_and_clear_pending_events();
}
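On the read-lock point above: a shared read() guard participates in lock ordering just like a write guard, because a writer elsewhere will block on it. A small hypothetical sketch of the kind of inversion this can cause (illustrative fields, not LDK code):

use std::sync::{Mutex, RwLock};

struct S {
    a: Mutex<u32>,
    b: RwLock<u32>,
}

// Path 1: `a`, then `b` taken only for reading.
fn path_one(s: &S) {
    let _a = s.a.lock().unwrap();
    let _b = s.b.read().unwrap();
}

// Path 2: `b` for writing, then `a`. Run concurrently with path_one this can
// deadlock: path_one holds `a` and waits for `b`, while path_two holds `b`
// and waits for `a` - even though path_one only ever reads `b`.
fn path_two(s: &S) {
    let _b = s.b.write().unwrap();
    let _a = s.a.lock().unwrap();
}

fn main() {
    let s = S { a: Mutex::new(0), b: RwLock::new(0) };
    // Called sequentially here (so no deadlock), just to exercise both orders.
    path_one(&s);
    path_two(&s);
}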

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-08-forward-htlcs-as-standalone-lock branch from 8263cf5 to aae7b4e Compare September 16, 2022 17:50
@valentinewallace (Contributor) left a comment

LGTM after squash and a second reviewer

@TheBlueMatt (Collaborator) left a comment

LGTM, feel free to squash, one note about comment clarity and one thing to do in a followup but otherwise this is ready, I think.

// Lock order tree:
//
// `total_consistency_lock`
// │
Collaborator

I mean, okay, so, like, we allow unicode in our source, cause people have names that need unicode, but, like, also unicode (and ASCII control chars) are scary, and quite some things (still, yes, really) don't show unicode correctly. So, like, in general, if it's trivial to do so, we're probably better off avoiding unicode (and in this case I think that's truly trivial), and maybe also skip the extra lines that just have a | on them.
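For illustration, the earlier quoted fragment redrawn with ASCII only might look something like this (one possible mapping; the actual replacement committed in d1e43f3 may use different characters or spacing):

// |  |
// |  |__`per_peer_state`
// |  |
// |  |--`best_block`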

Contributor Author

Replaced the unicode with ASCII with d1e43f3.

Also removed the extra lines with 5d5a813.
I did that one in a specific fixup just so you can compare the difference to see if you think it affects the readability of the tree. I personally think removing the extra lines makes the tree considerably less readable, but I can also see the point in removing them. Let me know which version you prefer!

//
// Lock order:
// The tree structure below illustrates the lock order requirements for the different locks of the
// `ChannelManager`. If two or more locks on the same branch are held at the same time, locks
Collaborator

I'm a bit confused what the text here is saying - are we allowed to take two locks that are on the same level in any order (which I guess really means we can't/won't ever take them at the same time?), and the ordering is just down a single branch? I wonder if we can make the "don't take these at the same time" checks compile-time (not that it matters, and it would be a followup anyway if we do, but I worry about the docs here going stale really quickly, at least the "don't take at the same time" ones).

Contributor

I would prefer compile-time checks too, couldn't find an obvious way though :/ I asked on the Rust discord: https://discord.com/channels/442252698964721669/448238009733742612/1020827841564266606 which didn't really bear fruit, and separately found this project: https://github.com/BurtonQin/lockbud which may be worth looking into 🤷‍♀️

@TheBlueMatt (Collaborator) Sep 18, 2022

Yea, I mean we already have our own lock order inversion detection, it's not like building extra features into it is hard, but of course doesn't need to happen now.

Contributor Author

Oh yeah, agree that the docs were very unclear in regards to taking locks across branches at the same time. I've attempted to make them more clear with:
0d43f16
Let me know if you still find them unclear.

Also, interesting find @valentinewallace. If you both think it's worth spending more time researching compile-time checks, it's of course something I could do :).

Contributor

Ah, no worries @ViktorTigerstrom, I forgot we had built the debug_sync::Mutex, which is literally designed for lock order inversion detection, like Matt said.

let (failure_code, onion_failure_data) =
- match channel_state.by_id.entry(channel_id) {
+ match self.channel_state.lock().unwrap().by_id.entry(channel_id) {
Collaborator

Lol, as a followup we should inline fail_holding_cell_htlcs (or something similar) so that we don't take a lock on every iteration just to call get_htlc_inbound_temp_fail_err_and_data redundantly with the same parameters. I'll open a followup issue and link this.
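A rough sketch of the followup idea (hypothetical shapes and values, not the real LDK helpers): take the lock and compute the shared failure data once, before the loop, rather than re-locking and recomputing for every HTLC.

use std::sync::Mutex;

fn fail_all(htlcs: &[u64], channel_state: &Mutex<u32>) {
    // Before: the lock + the failure-data computation sat inside the loop body,
    // repeated identically for every HTLC. After: do it once up front.
    let (failure_code, onion_failure_data) = {
        let chan = channel_state.lock().unwrap();
        (0x4000u16 | 8, vec![*chan as u8]) // placeholder "temporary failure" data
    };
    for htlc in htlcs {
        println!("failing HTLC {} with code {:#x}, data {:?}", htlc, failure_code, onion_failure_data);
    }
}

fn main() {
    fail_all(&[1, 2, 3], &Mutex::new(7));
}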

As we are eventually removing the `channel_state` lock, this commit
moves the `forward_htlcs` map out of the `channel_state` lock, to ease
that process.

The `forward_htlcs` lock was, prior to this commit, only held at the same
time as the `channel_state` lock during the write process of the
`ChannelManager`. This commit removes that lock order dependency, by
taking the `channel_state` lock only temporarily during the write process.
@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-08-forward-htlcs-as-standalone-lock branch from aae7b4e to 0d43f16 Compare September 18, 2022 21:48
@ViktorTigerstrom (Contributor Author)

Squashed and addressed the new feedback in new fixup commits.

@TheBlueMatt (Collaborator) left a comment

LGTM, I'm fine with it with or without spacing, up to you and @valentinewallace, I just thought it was a bit nicer to be able to have it open without taking up space while reading code. Feel free to squash once you and @valentinewallace figure out what you want.

@valentinewallace (Contributor) left a comment

LGTM after squash

@ViktorTigerstrom (Contributor Author)

Squashed! I dropped the commit that removed the empty rows now as I think the tree becomes less readable without them. But I'll of course remove them again if @valentinewallace also prefers the tree without the empty rows.

@valentinewallace (Contributor) left a comment

LGTM 🌟

No strong feelings on the spacing, I guess I slightly prefer more concise too, but it's fine as is.

@@ -3953,8 +4006,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana

let _persistence_guard = PersistenceNotifierGuard::notify_on_drop(&self.total_consistency_lock, &self.persistence_notifier);

- let mut channel_state = Some(self.channel_state.lock().unwrap());
- let removed_source = channel_state.as_mut().unwrap().claimable_htlcs.remove(&payment_hash);
+ let removed_source = self.channel_state.lock().unwrap().claimable_htlcs.remove(&payment_hash);
Collaborator

As a followup, can you avoid taking this lock twice in a row? This patch applies fine and avoids taking-immediately-releasing-and-taking-again the channel_state lock:

$ git diff
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 72f1f4d59..1aaeede8d 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -4006,7 +4006,8 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 
                let _persistence_guard = PersistenceNotifierGuard::notify_on_drop(&self.total_consistency_lock, &self.persistence_notifier);
 
-               let removed_source = self.channel_state.lock().unwrap().claimable_htlcs.remove(&payment_hash);
+               let mut channel_state = self.channel_state.lock().unwrap();
+               let removed_source = channel_state.claimable_htlcs.remove(&payment_hash);
                if let Some((payment_purpose, mut sources)) = removed_source {
                        assert!(!sources.is_empty());
 
@@ -4026,8 +4027,6 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
                        let mut valid_mpp = true;
                        let mut errs = Vec::new();
                        let mut claimed_any_htlcs = false;
-                       let mut channel_state_lock = self.channel_state.lock().unwrap();
-                       let channel_state = &mut *channel_state_lock;
                        for htlc in sources.iter() {
                                if let None = channel_state.short_to_chan_info.get(&htlc.prev_hop.short_channel_id) {
                                        valid_mpp = false;
@@ -4064,7 +4063,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
                        }
                        if valid_mpp {
                                for htlc in sources.drain(..) {
-                                       match self.claim_funds_from_hop(&mut channel_state_lock, htlc.prev_hop, payment_preimage) {
+                                       match self.claim_funds_from_hop(&mut channel_state, htlc.prev_hop, payment_preimage) {
                                                ClaimFundsFromHop::MonitorUpdateFail(pk, err, _) => {
                                                        if let msgs::ErrorAction::IgnoreError = err.err.action {
                                                                // We got a temporary failure updating monitor, but will claim the
@@ -4084,7 +4083,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
                                        }
                                }
                        }
-                       mem::drop(channel_state_lock);
+                       mem::drop(channel_state);
                        if !valid_mpp {
                                for htlc in sources.drain(..) {
                                        let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
