
Reduce common allocations across the codebase #2708


Merged

Conversation

TheBlueMatt
Collaborator

My node has been experiencing more and more memory fragmentation lately, and while it seems the majority of that is #2706 and #2707, there's still plenty of room for misc improvements all over the place. With this and fixes for the other two issues we should be in a pretty good place, with allocations dominated, by far, by block deserialization when syncing.

There are two commits here that could be performance regressions:

  • Pre-allocate the full required Vec prior to serializing into vecs, which runs through our serialization logic twice in many cases before writing (a rough sketch follows this list). I played around with a lower_bound Writeable method to optimize out some cases of having to run through the logic, but it doesn't really help in ChannelManager and ChannelMonitor or other deeply-nested structs, because we're calling write there, which hits our LengthCalculatingWriter instead of being able to use an optimized version. We could totally restructure the API to have Writeables call a magic method on the Writer which can short-circuit the write, but that's a lot of indirection and I'm lazy.
  • Avoid allocating when checking gossip message signatures. This probably isn't a huge regression, because hashers are buffered, in essence, anyway, but I didn't check.
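
A rough sketch of the two-pass approach from the first bullet above. The LengthCalculatingWriter name comes from the description; the Writeable trait and encode helper here are simplified stand-ins for illustration, not LDK's actual API.

use std::io::{self, Write};

// Counts bytes instead of storing them, so the real buffer can be sized up front.
struct LengthCalculatingWriter(usize);

impl Write for LengthCalculatingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0 += buf.len();
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
}

// Simplified stand-in for LDK's `Writeable`.
trait Writeable {
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()>;
}

fn encode<T: Writeable>(obj: &T) -> Vec<u8> {
    // Pass 1: run the serialization logic purely to measure the output length.
    let mut len = LengthCalculatingWriter(0);
    obj.write_to(&mut len).expect("in-memory writes cannot fail");
    // Pass 2: allocate exactly once, then serialize for real.
    let mut out = Vec::with_capacity(len.0);
    obj.write_to(&mut out).expect("in-memory writes cannot fail");
    out
}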

When we're reading a `NetworkGraph`, we know how many
nodes/channels we are reading, there's no reason not to
pre-allocate the `IndexedMap`'s inner `HashMap` and `Vec`, which we
do here.

This seems to reduce on-startup heap fragmentation with glibc by
something like 100MiB.
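
A minimal sketch of that pre-allocation, assuming a simplified IndexedMap (a HashMap for lookups plus a Vec of keys for ordered iteration); the field and method names are illustrative, not the actual lightning/src/util/indexed_map.rs layout.

use std::collections::HashMap;
use std::hash::Hash;

struct IndexedMap<K, V> {
    map: HashMap<K, V>,
    keys: Vec<K>,
}

impl<K: Hash + Eq + Clone, V> IndexedMap<K, V> {
    // When deserializing, the entry count has already been read from the stream, so
    // both backing collections can be sized once instead of growing by doubling.
    fn with_capacity(count: usize) -> Self {
        Self { map: HashMap::with_capacity(count), keys: Vec::with_capacity(count) }
    }

    fn insert(&mut self, key: K, value: V) {
        if self.map.insert(key.clone(), value).is_none() {
            self.keys.push(key);
        }
    }
}
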
It does the same thing and it's much simpler.
When forwarding gossip, rather than relying on Vec doubling,
pre-allocate the message encoding buffer.
...as LLVM will handle it just fine for us, in most cases.
TheBlueMatt added this to the 0.0.119 milestone on Nov 4, 2023

codecov-commenter commented Nov 4, 2023

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (281a0ae) 88.81% compared to head (7a951b1) 89.16%.
Report is 12 commits behind head on main.


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2708      +/-   ##
==========================================
+ Coverage   88.81%   89.16%   +0.34%     
==========================================
  Files         113      113              
  Lines       89116    91476    +2360     
  Branches    89116    91476    +2360     
==========================================
+ Hits        79152    81561    +2409     
+ Misses       7722     7709      -13     
+ Partials     2242     2206      -36     
Files Coverage Δ
lightning/src/blinded_path/utils.rs 96.36% <100.00%> (-0.13%) ⬇️
lightning/src/ln/channel.rs 88.68% <ø> (+0.03%) ⬆️
lightning/src/ln/script.rs 93.57% <ø> (-0.14%) ⬇️
lightning/src/routing/gossip.rs 86.45% <100.00%> (+0.12%) ⬆️
lightning/src/sign/type_resolver.rs 75.00% <ø> (ø)
lightning/src/util/indexed_map.rs 92.59% <100.00%> (+0.43%) ⬆️
lightning/src/util/ser.rs 76.74% <100.00%> (+0.23%) ⬆️
lightning-net-tokio/src/lib.rs 76.40% <94.11%> (+2.46%) ⬆️
lightning/src/util/chacha20poly1305rfc.rs 89.57% <75.00%> (-0.29%) ⬇️
lightning/src/ln/peer_channel_encryptor.rs 93.71% <95.23%> (+0.03%) ⬆️
... and 1 more

... and 11 files with indirect coverage changes


TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from 761aaad to f9ef511 on November 6, 2023
pub(super) fn decrypt_in_place(&mut self, input_output: &mut [u8]) {
pub fn decrypt_in_place(&mut self, input_output: &mut [u8], tag: &[u8]) -> Result<(), ()> {
self.just_decrypt_in_place(input_output);
if self.finish_and_check_tag(tag) { Ok(()) } else { Err(()) }
Contributor

Question: should the tag be checked before decrypting the ciphertext?
RFC: https://www.rfc-editor.org/rfc/rfc7539#appendix-A.5

Collaborator Author

Doesn't matter, as long as we take the same amount of time in both the valid and invalid cases, and aren't actually doing anything with the decoded bytes until we check the MAC. Theoretically it's faster, I guess, if we check the MAC first, but, like, it's not a common case lol.

@@ -1168,6 +1168,14 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, OM: Deref, L: Deref, CM
if peer.pending_outbound_buffer_first_msg_offset == next_buff.len() {
peer.pending_outbound_buffer_first_msg_offset = 0;
peer.pending_outbound_buffer.pop_front();
// Try to keep the buffer to no more than 170 elements
const VEC_SIZE: usize = ::core::mem::size_of::<Vec<u8>>();
let large_capacity = peer.pending_outbound_buffer.capacity() > 4096 / VEC_SIZE;
Contributor

What is the logic behind this?
"Why 170" might be more helpful than "it is 170" in the comment above.

Collaborator Author

Eh, I just dropped it. It wasn't saying anything the code wasn't already.
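
For context, a hedged sketch of the general pattern the excerpt above applies: once the outbound queue drains, hand excess capacity back to the allocator. The 4096-byte threshold mirrors the snippet; the drain condition and shrink_to_fit call are illustrative, not the exact merged code.

use std::collections::VecDeque;

fn maybe_shrink_outbound_buffer(buffer: &mut VecDeque<Vec<u8>>) {
    // Only bother once the queue's slot array alone exceeds roughly a page.
    const VEC_SIZE: usize = core::mem::size_of::<Vec<u8>>();
    let large_capacity = buffer.capacity() > 4096 / VEC_SIZE;
    // If the queue has mostly drained, give the spare capacity back.
    if large_capacity && buffer.len() < buffer.capacity() / 2 {
        buffer.shrink_to_fit();
    }
}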

@@ -412,11 +412,17 @@ macro_rules! get_pubkey_from_node_id {
}
}

fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
let mut engine = Sha256Hash::engine();
msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
Contributor

Shouldn't we be more specific here?
E.g. "Gossip msg should not fail to serialize".

Collaborator Author

Panic messages have file/line in them, that's more specific than any message we ever write :)

@@ -6972,14 +6972,6 @@ impl<SP: Deref> Writeable for Channel<SP> where SP::Target: SignerProvider {

self.context.latest_monitor_update_id.write(writer)?;

let mut key_data = VecWriter(Vec::new());
// TODO (taproot|arik): Introduce serialization distinction for non-ECDSA signers.
self.context.holder_signer.as_ecdsa().expect("Only ECDSA signers may be serialized").write(&mut key_data)?;
Contributor

Question: why did we write them previously?
So, nowadays we write channel_keys_id instead?

Collaborator Author

We used to write them because we didn't really have a fully-formed concept for how key derivation was supposed to work. Now we do and writing the signers is just redundant.

pub fn decrypt_in_place(&mut self, input_output: &mut [u8], tag: &[u8]) -> Result<(), ()> {
self.just_decrypt_in_place(input_output);
if self.finish_and_check_tag(tag) { Ok(()) } else { Err(()) }
}
Contributor

nit: encrypt_full_message_in_place can be changed to encrypt_in_place to match/align with this.

Collaborator Author

Went with check_decrypt_in_place since I think it's clearer and a bit more symmetric. Maybe we should rename the encryption side to mac_encrypt_in_place, but we can do that another time.

We end up generating a substantial number of allocations just
doubling `Vec`s when serializing to them, and our
`serialized_length` method is generally rather efficient, so we
just rely on it and allocate correctly up front.
TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from 3f6969e to 74887df on November 7, 2023
In the next commit we'll use this to avoid an allocation when
deserializing messages from the wire.
When decrypting P2P messages, we already have a read buffer that we
read the message into. There's no reason to allocate a new `Vec` to
store the decrypted message when we can just overwrite the read
buffer and call it a day.
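
A toy illustration of the in-place shape, assuming a stream cipher whose decryption is an XOR against the keystream (as ChaCha20's is); the real code reuses the existing read buffer rather than allocating a fresh Vec for the plaintext.

// `buf` holds ciphertext on entry and plaintext on return; nothing is allocated.
fn decrypt_in_place(buf: &mut [u8], keystream: impl Iterator<Item = u8>) {
    for (byte, key_byte) in buf.iter_mut().zip(keystream) {
        *byte ^= key_byte;
    }
}
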
When buffering outbound messages for peers, `LinkedList` adds
rather substantial allocation overhead, which we avoid here by
swapping for a `VecDeque`.
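
Roughly the shape of that swap, with the field names taken from the diff earlier in this thread; the methods are illustrative. The point is that a VecDeque keeps its slots in one contiguous allocation, while a LinkedList allocates a separate node per queued message.

use std::collections::VecDeque;

struct Peer {
    // Previously a LinkedList<Vec<u8>>: one heap allocation per queued message node.
    pending_outbound_buffer: VecDeque<Vec<u8>>,
    // How far into the front message we have already written to the socket.
    pending_outbound_buffer_first_msg_offset: usize,
}

impl Peer {
    fn enqueue_message(&mut self, encoded: Vec<u8>) {
        self.pending_outbound_buffer.push_back(encoded);
    }

    fn advance(&mut self, front_len: usize) {
        if self.pending_outbound_buffer_first_msg_offset == front_len {
            self.pending_outbound_buffer_first_msg_offset = 0;
            self.pending_outbound_buffer.pop_front();
        }
    }
}
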
TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from 74887df to a69dcc3 on November 7, 2023
TheBlueMatt
Collaborator Author

Squashed with Jeff's suggestion:

$ git diff-tree -U1 74887df8 a69dcc3a
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index 21792175a..fe7903d88 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -19,2 +19,3 @@ use bitcoin::secp256k1;
 use bitcoin::hashes::sha256::Hash as Sha256Hash;
+use bitcoin::hashes::sha256d::Hash as Sha256dHash;
 use bitcoin::hashes::Hash;
@@ -417,3 +418,3 @@ fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
 	msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
-	Sha256Hash::hash(&Sha256Hash::from_engine(engine)[..]).into_inner()
+	Sha256dHash::from_engine(engine).into_inner()
 }

@G8XSU (Contributor) left a comment

LGTM! (apart from the CI fix)

When we forward gossip messages, we store them in a separate buffer
before we encrypt them (and commit to the order in which they'll
appear on the wire). Rather than storing that buffer encoded with
no headroom, requiring re-allocating to add the message length and
two MAC blocks, we here add the headroom prior to pushing it into
the gossip buffer, avoiding an allocation.
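
A hedged sketch of the headroom idea. The sizes follow the BOLT 8 transport framing (a 2-byte length prefix plus a 16-byte MAC for the header, and a 16-byte MAC after the body); whether LDK lays the buffer out exactly this way is not shown in this thread, so treat the helper as illustrative.

// BOLT 8 framing overhead around each encoded message.
const MSG_HEADER_LEN: usize = 2 + 16; // encrypted length prefix plus its MAC
const MSG_MAC_LEN: usize = 16; // MAC appended after the encrypted body

fn alloc_gossip_forward_buffer(encoded_msg: &[u8]) -> Vec<u8> {
    // One allocation with all the framing headroom included.
    let mut buf = Vec::with_capacity(MSG_HEADER_LEN + encoded_msg.len() + MSG_MAC_LEN);
    // Zeroed header bytes, overwritten when the message is actually encrypted for a peer.
    buf.extend_from_slice(&[0u8; MSG_HEADER_LEN]);
    buf.extend_from_slice(encoded_msg);
    // The spare capacity (at least 16 bytes) lets the body MAC be appended later
    // without reallocating, as the encrypt_message_with_header_0s doc comment notes.
    buf
}
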
Whenever we go to send bytes to a peer, we need to construct a
waker for tokio to call back into if we need to finish sending
later. That waker needs some reference to the peer's read task to
wake it up, hidden behind a single `*const ()`. To do this, we'd
previously simply stored a `Box<tokio::mpsc::Sender>` in that
pointer, which requires a `clone` for each waker construction. This
leads to substantial malloc traffic.

Instead, here, we replace this box with an `Arc`, leaving a single
`tokio::mpsc::Sender` floating around and simply change the
refcounts whenever we construct a new waker, which we can do
without allocations.
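
A hedged sketch of the refcount-only waker idea, using std's Wake trait for brevity where the PR builds a RawWaker by hand; WriteAvail below is an illustrative stand-in for the tokio::mpsc::Sender mentioned above.

use std::sync::Arc;
use std::task::{Wake, Waker};

// Stand-in for whatever handle wakes the peer's read task.
struct WriteAvail;

impl Wake for WriteAvail {
    fn wake(self: Arc<Self>) {
        // Notify the read task that it can resume writing.
    }
}

// Constructing a waker just bumps the Arc's refcount; no Box, no allocation.
fn waker_for(shared: &Arc<WriteAvail>) -> Waker {
    Waker::from(Arc::clone(shared))
}
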
When we check gossip message signatures, there's no reason to
serialize out the full gossip message before hashing, and it
generates a lot of allocations during the initial startup when we
fetch the full gossip from peers.
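
To make the before/after concrete, a sketch using the bitcoin::hashes API (the final message_sha256d_hash is shown in the diffs in this thread): the old path serialized each message into a temporary Vec and hashed that, while the new path streams bytes into the double-SHA256 engine as they are produced. `chunks` here stands in for whatever the serializer emits.

use bitcoin::hashes::sha256d::Hash as Sha256dHash;
use bitcoin::hashes::{Hash, HashEngine};

// Old shape: one temporary allocation per verified message.
fn hash_via_temp_vec(encoded: Vec<u8>) -> Sha256dHash {
    Sha256dHash::hash(&encoded)
}

// New shape: feed bytes into the engine directly; no intermediate buffer.
fn hash_streaming(chunks: &[&[u8]]) -> Sha256dHash {
    let mut engine = Sha256dHash::engine();
    for chunk in chunks {
        engine.input(chunk);
    }
    Sha256dHash::from_engine(engine)
}
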
This breaks backwards compatibility with versions of LDK prior to
0.0.113 as they expect to always read signer data.

This also substantially reduces allocations during `ChannelManager`
serialization, as we currently don't pre-allocate the `Vec` that
the signer gets written into. We could alternatively pre-allocate
that `Vec`, but we've been set up to skip the write entirely for a
while, and 0.0.113 was released nearly a year ago. Users should not
be expected to downgrade to LDK 0.0.112 or earlier at this point.
TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from a69dcc3 to 7a951b1 on November 9, 2023

TheBlueMatt commented Nov 9, 2023

Should pass this time, sorry about that:

$ git diff-tree -U1 a69dcc3ab 7a951b1bf
diff --git a/lightning/src/ln/peer_channel_encryptor.rs b/lightning/src/ln/peer_channel_encryptor.rs
index 298ff39b9..8569fa60f 100644
--- a/lightning/src/ln/peer_channel_encryptor.rs
+++ b/lightning/src/ln/peer_channel_encryptor.rs
@@ -436,4 +436,3 @@ impl PeerChannelEncryptor {
 	/// For effeciency, the [`Vec::capacity`] should be at least 16 bytes larger than the
-	/// [`Vec::length`], to avoid reallocating for the message MAC, which will be appended to the
-	/// vec.
+	/// [`Vec::len`], to avoid reallocating for the message MAC, which will be appended to the vec.
 	fn encrypt_message_with_header_0s(&mut self, msgbuf: &mut Vec<u8>) {
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index fe7903d88..ff8b084b7 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -18,3 +18,2 @@ use bitcoin::secp256k1;

-use bitcoin::hashes::sha256::Hash as Sha256Hash;
 use bitcoin::hashes::sha256d::Hash as Sha256dHash;
@@ -415,6 +414,6 @@ macro_rules! get_pubkey_from_node_id {

-fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
-	let mut engine = Sha256Hash::engine();
+fn message_sha256d_hash<M: Writeable>(msg: &M) -> Sha256dHash {
+	let mut engine = Sha256dHash::engine();
 	msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
-	Sha256dHash::from_engine(engine).into_inner()
+	Sha256dHash::from_engine(engine)
 }

@G8XSU (Contributor) left a comment

LGTM!
I feel moderately confident about this change (mainly moderate because of 18dc7f2).

@tnull (Contributor) left a comment

LGTM, now tracking the serialization cleanup over at #2724


G8XSU commented Nov 10, 2023

On a separate note: I do wonder if the MAX_ALLOC_SIZE values spread across multiple places in the code, used while reading different structs, need revisiting.

TheBlueMatt merged commit 103180d into lightningdevkit:main on Nov 13, 2023