
Reduce common allocations across the codebase #2708


Merged

Conversation

TheBlueMatt
Collaborator

My node has been experiencing more and more memory fragmentation lately, and while it seems the majority of that is #2706 and #2707, there's still plenty of room for misc improvements all over the place. With this and fixes for the other two issues we should be in a pretty good place, with allocations dominated, by far, by block deserialization when syncing.

There are two commits here that could be performance regressions:

  • Pre-allocate the full required Vec prior to serializing into vecs, which runs through our serialization logic twice in many cases before writing (a rough sketch follows this list). I played around with a lower_bound Writeable method to optimize out some cases of having to run through the logic, but it doesn't really help in ChannelManager and ChannelMonitor or other deeply-nested structs, because we're calling write there, which hits our LengthCalculatingWriter instead of being able to use an optimized version. We could totally restructure the API to have Writeables call a magic method on the Writer which can short-circuit the write, but that's a lot of indirection and I'm lazy.
  • Avoid allocating when checking gossip message signatures. This probably isn't a huge regression, because hashers are buffered, in essence, anyway, but I didn't check.
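
A rough sketch of the two-pass approach from the first bullet above. The LengthCalculatingWriter name comes from the description; the Writeable trait and encode helper here are simplified stand-ins for illustration, not LDK's actual API.

use std::io::{self, Write};

// Counts bytes instead of storing them, so the real buffer can be sized up front.
struct LengthCalculatingWriter(usize);

impl Write for LengthCalculatingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0 += buf.len();
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
}

// Simplified stand-in for LDK's `Writeable`.
trait Writeable {
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()>;
}

fn encode<T: Writeable>(obj: &T) -> Vec<u8> {
    // Pass 1: run the serialization logic purely to measure the output length.
    let mut len = LengthCalculatingWriter(0);
    obj.write_to(&mut len).expect("in-memory writes cannot fail");
    // Pass 2: allocate exactly once, then serialize for real.
    let mut out = Vec::with_capacity(len.0);
    obj.write_to(&mut out).expect("in-memory writes cannot fail");
    out
}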

When we're reading a `NetworkGraph`, we know how many
nodes/channels we are reading, there's no reason not to
pre-allocate the `IndexedMap`'s inner `HashMap` and `Vec`, which we
do here.

This seems to reduce on-startup heap fragmentation with glibc by
something like 100MiB.
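
A minimal sketch of that pre-allocation, assuming a simplified IndexedMap (a HashMap for lookups plus a Vec of keys for ordered iteration); the field and method names are illustrative, not the actual lightning/src/util/indexed_map.rs layout.

use std::collections::HashMap;
use std::hash::Hash;

struct IndexedMap<K, V> {
    map: HashMap<K, V>,
    keys: Vec<K>,
}

impl<K: Hash + Eq + Clone, V> IndexedMap<K, V> {
    // When deserializing, the entry count has already been read from the stream, so
    // both backing collections can be sized once instead of growing by doubling.
    fn with_capacity(count: usize) -> Self {
        Self { map: HashMap::with_capacity(count), keys: Vec::with_capacity(count) }
    }

    fn insert(&mut self, key: K, value: V) {
        if self.map.insert(key.clone(), value).is_none() {
            self.keys.push(key);
        }
    }
}
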
It does the same thing and it's much simpler.
When forwarding gossip, rather than relying on Vec doubling,
pre-allocate the message encoding buffer.
...as LLVM will handle it just fine for us, in most cases.
TheBlueMatt added this to the 0.0.119 milestone on Nov 4, 2023

codecov-commenter commented Nov 4, 2023

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (281a0ae) 88.81% compared to head (7a951b1) 89.16%.
Report is 12 commits behind head on main.


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2708      +/-   ##
==========================================
+ Coverage   88.81%   89.16%   +0.34%     
==========================================
  Files         113      113              
  Lines       89116    91476    +2360     
  Branches    89116    91476    +2360     
==========================================
+ Hits        79152    81561    +2409     
+ Misses       7722     7709      -13     
+ Partials     2242     2206      -36     
Files Coverage Δ
lightning/src/blinded_path/utils.rs 96.36% <100.00%> (-0.13%) ⬇️
lightning/src/ln/channel.rs 88.68% <ø> (+0.03%) ⬆️
lightning/src/ln/script.rs 93.57% <ø> (-0.14%) ⬇️
lightning/src/routing/gossip.rs 86.45% <100.00%> (+0.12%) ⬆️
lightning/src/sign/type_resolver.rs 75.00% <ø> (ø)
lightning/src/util/indexed_map.rs 92.59% <100.00%> (+0.43%) ⬆️
lightning/src/util/ser.rs 76.74% <100.00%> (+0.23%) ⬆️
lightning-net-tokio/src/lib.rs 76.40% <94.11%> (+2.46%) ⬆️
lightning/src/util/chacha20poly1305rfc.rs 89.57% <75.00%> (-0.29%) ⬇️
lightning/src/ln/peer_channel_encryptor.rs 93.71% <95.23%> (+0.03%) ⬆️
... and 1 more

... and 11 files with indirect coverage changes


TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from 761aaad to f9ef511 on November 6, 2023
pub(super) fn decrypt_in_place(&mut self, input_output: &mut [u8]) {
pub fn decrypt_in_place(&mut self, input_output: &mut [u8], tag: &[u8]) -> Result<(), ()> {
self.just_decrypt_in_place(input_output);
if self.finish_and_check_tag(tag) { Ok(()) } else { Err(()) }
Contributor

Question: should the tag be checked before decrypting the ciphertext?
RFC: https://www.rfc-editor.org/rfc/rfc7539#appendix-A.5

Collaborator Author

Doesn't matter, as long as we take the same amount of time in both the valid and invalid cases, and aren't actually doing anything with the decoded bytes until we check the MAC. Theoretically it's faster, I guess, if we check the MAC first, but, like, it's not a common case lol.

@@ -1168,6 +1168,14 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, OM: Deref, L: Deref, CM
if peer.pending_outbound_buffer_first_msg_offset == next_buff.len() {
peer.pending_outbound_buffer_first_msg_offset = 0;
peer.pending_outbound_buffer.pop_front();
// Try to keep the buffer to no more than 170 elements
const VEC_SIZE: usize = ::core::mem::size_of::<Vec<u8>>();
let large_capacity = peer.pending_outbound_buffer.capacity() > 4096 / VEC_SIZE;
Contributor

What is the logic behind this?
"Why 170" might be more helpful than "it is 170" in the comment above.

Collaborator Author

Eh, I just dropped it. It wasn't saying anything the code wasn't already.
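
For context, a hedged sketch of the general pattern the excerpt above applies: once the outbound queue drains, hand excess capacity back to the allocator. The 4096-byte threshold mirrors the snippet; the drain condition and shrink_to_fit call are illustrative, not the exact merged code.

use std::collections::VecDeque;

fn maybe_shrink_outbound_buffer(buffer: &mut VecDeque<Vec<u8>>) {
    // Only bother once the queue's slot array alone exceeds roughly a page.
    const VEC_SIZE: usize = core::mem::size_of::<Vec<u8>>();
    let large_capacity = buffer.capacity() > 4096 / VEC_SIZE;
    // If the queue has mostly drained, give the spare capacity back.
    if large_capacity && buffer.len() < buffer.capacity() / 2 {
        buffer.shrink_to_fit();
    }
}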

@@ -412,11 +412,17 @@ macro_rules! get_pubkey_from_node_id {
}
}

fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
let mut engine = Sha256Hash::engine();
msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
Contributor

Shouldn't we be more specific here?
E.g. "Gossip msg should not fail to serialize".

Collaborator Author

Panic messages have file/line in them, that's more specific than any message we ever write :)

@@ -6972,14 +6972,6 @@ impl<SP: Deref> Writeable for Channel<SP> where SP::Target: SignerProvider {

self.context.latest_monitor_update_id.write(writer)?;

let mut key_data = VecWriter(Vec::new());
// TODO (taproot|arik): Introduce serialization distinction for non-ECDSA signers.
self.context.holder_signer.as_ecdsa().expect("Only ECDSA signers may be serialized").write(&mut key_data)?;
Contributor

Question: why did we write them previously?
So, nowadays we write channel_keys_id instead?

Collaborator Author

We used to write them because we didn't really have a fully-formed concept for how key derivation was supposed to work. Now we do and writing the signers is just redundant.

pub fn decrypt_in_place(&mut self, input_output: &mut [u8], tag: &[u8]) -> Result<(), ()> {
self.just_decrypt_in_place(input_output);
if self.finish_and_check_tag(tag) { Ok(()) } else { Err(()) }
}
Contributor

nit: encrypt_full_message_in_place can be changed to encrypt_in_place to match/align with this.

Collaborator Author

Went with check_decrypt_in_place since I think it's clearer and a bit more symmetric. Maybe we should rename the encryption side to mac_encrypt_in_place, but we can do that another time.

We end up generating a substantial number of allocations just
doubling `Vec`s when serializing to them, and our
`serialized_length` method is generally rather efficient, so we
just rely on it and allocate correctly up front.
TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from 3f6969e to 74887df on November 7, 2023
In the next commit we'll use this to avoid an allocation when
deserializing messages from the wire.
When decrypting P2P messages, we already have a read buffer that we
read the message into. There's no reason to allocate a new `Vec` to
store the decrypted message when we can just overwrite the read
buffer and call it a day.
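
A toy illustration of the in-place shape, assuming a stream cipher whose decryption is an XOR against the keystream (as ChaCha20's is); the real code reuses the existing read buffer rather than allocating a fresh Vec for the plaintext.

// `buf` holds ciphertext on entry and plaintext on return; nothing is allocated.
fn decrypt_in_place(buf: &mut [u8], keystream: impl Iterator<Item = u8>) {
    for (byte, key_byte) in buf.iter_mut().zip(keystream) {
        *byte ^= key_byte;
    }
}
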
When buffering outbound messages for peers, `LinkedList` adds
rather substantial allocation overhead, which we avoid here by
swapping for a `VecDeque`.
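
Roughly the shape of that swap, with the field names taken from the diff earlier in this thread; the methods are illustrative. The point is that a VecDeque keeps its slots in one contiguous allocation, while a LinkedList allocates a separate node per queued message.

use std::collections::VecDeque;

struct Peer {
    // Previously a LinkedList<Vec<u8>>: one heap allocation per queued message node.
    pending_outbound_buffer: VecDeque<Vec<u8>>,
    // How far into the front message we have already written to the socket.
    pending_outbound_buffer_first_msg_offset: usize,
}

impl Peer {
    fn enqueue_message(&mut self, encoded: Vec<u8>) {
        self.pending_outbound_buffer.push_back(encoded);
    }

    fn advance(&mut self, front_len: usize) {
        if self.pending_outbound_buffer_first_msg_offset == front_len {
            self.pending_outbound_buffer_first_msg_offset = 0;
            self.pending_outbound_buffer.pop_front();
        }
    }
}
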
TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from 74887df to a69dcc3 on November 7, 2023
TheBlueMatt
Collaborator Author

Squashed with Jeff's suggestion:

$ git diff-tree -U1 74887df8 a69dcc3a
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index 21792175a..fe7903d88 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -19,2 +19,3 @@ use bitcoin::secp256k1;
 use bitcoin::hashes::sha256::Hash as Sha256Hash;
+use bitcoin::hashes::sha256d::Hash as Sha256dHash;
 use bitcoin::hashes::Hash;
@@ -417,3 +418,3 @@ fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
 	msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
-	Sha256Hash::hash(&Sha256Hash::from_engine(engine)[..]).into_inner()
+	Sha256dHash::from_engine(engine).into_inner()
 }

@G8XSU (Contributor) left a comment

LGTM! (apart from the CI fix)

When we forward gossip messages, we store them in a separate buffer
before we encrypt them (and commit to the order in which they'll
appear on the wire). Rather than storing that buffer encoded with
no headroom, requiring re-allocating to add the message length and
two MAC blocks, we here add the headroom prior to pushing it into
the gossip buffer, avoiding an allocation.
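
A hedged sketch of the headroom idea. The sizes follow the BOLT 8 transport framing (a 2-byte length prefix plus a 16-byte MAC for the header, and a 16-byte MAC after the body); whether LDK lays the buffer out exactly this way is not shown in this thread, so treat the helper as illustrative.

// BOLT 8 framing overhead around each encoded message.
const MSG_HEADER_LEN: usize = 2 + 16; // encrypted length prefix plus its MAC
const MSG_MAC_LEN: usize = 16; // MAC appended after the encrypted body

fn alloc_gossip_forward_buffer(encoded_msg: &[u8]) -> Vec<u8> {
    // One allocation with all the framing headroom included.
    let mut buf = Vec::with_capacity(MSG_HEADER_LEN + encoded_msg.len() + MSG_MAC_LEN);
    // Zeroed header bytes, overwritten when the message is actually encrypted for a peer.
    buf.extend_from_slice(&[0u8; MSG_HEADER_LEN]);
    buf.extend_from_slice(encoded_msg);
    // The spare capacity (at least 16 bytes) lets the body MAC be appended later
    // without reallocating, as the encrypt_message_with_header_0s doc comment notes.
    buf
}
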
Whenever we go to send bytes to a peer, we need to construct a
waker for tokio to call back into if we need to finish sending
later. That waker needs some reference to the peer's read task to
wake it up, hidden behind a single `*const ()`. To do this, we'd
previously simply stored a `Box<tokio::mpsc::Sender>` in that
pointer, which requires a `clone` for each waker construction. This
leads to substantial malloc traffic.

Instead, here, we replace this box with an `Arc`, leaving a single
`tokio::mpsc::Sender` floating around and simply change the
refcounts whenever we construct a new waker, which we can do
without allocations.
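
A hedged sketch of the refcount-only waker idea, using std's Wake trait for brevity where the PR builds a RawWaker by hand; WriteAvail below is an illustrative stand-in for the tokio::mpsc::Sender mentioned above.

use std::sync::Arc;
use std::task::{Wake, Waker};

// Stand-in for whatever handle wakes the peer's read task.
struct WriteAvail;

impl Wake for WriteAvail {
    fn wake(self: Arc<Self>) {
        // Notify the read task that it can resume writing.
    }
}

// Constructing a waker just bumps the Arc's refcount; no Box, no allocation.
fn waker_for(shared: &Arc<WriteAvail>) -> Waker {
    Waker::from(Arc::clone(shared))
}
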
When we check gossip message signatures, there's no reason to
serialize out the full gossip message before hashing, and it
generates a lot of allocations during the initial startup when we
fetch the full gossip from peers.
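
To make the before/after concrete, a sketch using the bitcoin::hashes API (the final message_sha256d_hash is shown in the diffs in this thread): the old path serialized each message into a temporary Vec and hashed that, while the new path streams bytes into the double-SHA256 engine as they are produced. `chunks` here stands in for whatever the serializer emits.

use bitcoin::hashes::sha256d::Hash as Sha256dHash;
use bitcoin::hashes::{Hash, HashEngine};

// Old shape: one temporary allocation per verified message.
fn hash_via_temp_vec(encoded: Vec<u8>) -> Sha256dHash {
    Sha256dHash::hash(&encoded)
}

// New shape: feed bytes into the engine directly; no intermediate buffer.
fn hash_streaming(chunks: &[&[u8]]) -> Sha256dHash {
    let mut engine = Sha256dHash::engine();
    for chunk in chunks {
        engine.input(chunk);
    }
    Sha256dHash::from_engine(engine)
}
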
This breaks backwards compatibility with versions of LDK prior to
0.0.113 as they expect to always read signer data.

This also substantially reduces allocations during `ChannelManager`
serialization, as we currently don't pre-allocate the `Vec` that
the signer gets written into. We could alternatively pre-allocate
that `Vec`, but we've been set up to skip the write entirely for a
while, and 0.0.113 was released nearly a year ago. Users should not
be expected to downgrade to LDK 0.0.112 or earlier at this point.
TheBlueMatt force-pushed the 2023-11-less-graph-memory-frag branch from a69dcc3 to 7a951b1 on November 9, 2023

TheBlueMatt commented Nov 9, 2023

Should pass this time, sorry about that:

$ git diff-tree -U1 a69dcc3ab 7a951b1bf
diff --git a/lightning/src/ln/peer_channel_encryptor.rs b/lightning/src/ln/peer_channel_encryptor.rs
index 298ff39b9..8569fa60f 100644
--- a/lightning/src/ln/peer_channel_encryptor.rs
+++ b/lightning/src/ln/peer_channel_encryptor.rs
@@ -436,4 +436,3 @@ impl PeerChannelEncryptor {
 	/// For effeciency, the [`Vec::capacity`] should be at least 16 bytes larger than the
-	/// [`Vec::length`], to avoid reallocating for the message MAC, which will be appended to the
-	/// vec.
+	/// [`Vec::len`], to avoid reallocating for the message MAC, which will be appended to the vec.
 	fn encrypt_message_with_header_0s(&mut self, msgbuf: &mut Vec<u8>) {
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index fe7903d88..ff8b084b7 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -18,3 +18,2 @@ use bitcoin::secp256k1;

-use bitcoin::hashes::sha256::Hash as Sha256Hash;
 use bitcoin::hashes::sha256d::Hash as Sha256dHash;
@@ -415,6 +414,6 @@ macro_rules! get_pubkey_from_node_id {

-fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
-	let mut engine = Sha256Hash::engine();
+fn message_sha256d_hash<M: Writeable>(msg: &M) -> Sha256dHash {
+	let mut engine = Sha256dHash::engine();
 	msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
-	Sha256dHash::from_engine(engine).into_inner()
+	Sha256dHash::from_engine(engine)
 }

@G8XSU (Contributor) left a comment

LGTM!
I feel moderately confident about this change (mainly moderate because of 18dc7f2).

@tnull (Contributor) left a comment

LGTM, now tracking the serialization cleanup over at #2724


G8XSU commented Nov 10, 2023

On a separate note: I do wonder if the MAX_ALLOC_SIZE values spread across multiple places in the code, used while reading different structs, need revisiting.

TheBlueMatt merged commit 103180d into lightningdevkit:main on Nov 13, 2023