Reduce common allocations across the codebase #2708
Conversation
When we're reading a `NetworkGraph`, we know how many nodes/channels we are reading, there's no reason not to pre-allocate the `IndexedMap`'s inner `HashMap` and `Vec`, which we do here. This seems to reduce on-startup heap fragmentation with glibc by something like 100MiB.
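For illustration, a minimal sketch of the pre-allocation pattern with simplified types; `read_indexed_map` and `read_entry` are hypothetical stand-ins for LDK's actual `IndexedMap`/`NetworkGraph` deserialization:

```rust
use std::collections::HashMap;

// The serialized graph starts with an entry count, so size the map and the
// key Vec once up front instead of letting them grow by repeated doubling.
fn read_indexed_map<K: std::hash::Hash + Eq + Clone, V>(
	count: usize,
	mut read_entry: impl FnMut() -> (K, V),
) -> (HashMap<K, V>, Vec<K>) {
	let mut map = HashMap::with_capacity(count);
	let mut keys = Vec::with_capacity(count);
	for _ in 0..count {
		let (k, v) = read_entry();
		keys.push(k.clone());
		map.insert(k, v);
	}
	(map, keys)
}
```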
It does the same thing and it's much simpler.
When forwarding gossip, rather than relying on `Vec` doubling, pre-allocate the message encoding buffer.
...as LLVM will handle it just fine for us, in most cases.
Codecov Report

@@            Coverage Diff             @@
##             main    #2708      +/-   ##
==========================================
+ Coverage   88.81%   89.16%    +0.34%
==========================================
  Files         113      113
  Lines       89116    91476     +2360
  Branches    89116    91476     +2360
==========================================
+ Hits        79152    81561     +2409
+ Misses       7722     7709       -13
+ Partials     2242     2206       -36
Force-pushed from 3d8bfd0 to 761aaad
Force-pushed from 761aaad to f9ef511
pub(super) fn decrypt_in_place(&mut self, input_output: &mut [u8]) {
pub fn decrypt_in_place(&mut self, input_output: &mut [u8], tag: &[u8]) -> Result<(), ()> {
	self.just_decrypt_in_place(input_output);
	if self.finish_and_check_tag(tag) { Ok(()) } else { Err(()) }
Doubt: should tag be checked before decrypting cipher_text?
RFC: https://www.rfc-editor.org/rfc/rfc7539#appendix-A.5
Doesn't matter, as long as we take the same amount of time in both the valid and invalid cases, and aren't actually doing anything with the decoded bytes until we check the MAC. Theoretically it's faster, I guess, if we check the MAC first, but, like, it's not a common case lol.
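For illustration only: the property relied on here is that the tag comparison itself runs in constant time. A minimal sketch of such a comparison (LDK's real `finish_and_check_tag` computes the Poly1305 tag and compares it; production code should use a vetted constant-time primitive):

```rust
// Compare all 16 bytes unconditionally so timing doesn't leak how many
// leading bytes of the tag matched.
fn ct_tag_eq(expected: &[u8; 16], received: &[u8; 16]) -> bool {
	let mut diff = 0u8;
	for i in 0..16 {
		diff |= expected[i] ^ received[i];
	}
	diff == 0
}
```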
@@ -1168,6 +1168,14 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, OM: Deref, L: Deref, CM
	if peer.pending_outbound_buffer_first_msg_offset == next_buff.len() {
		peer.pending_outbound_buffer_first_msg_offset = 0;
		peer.pending_outbound_buffer.pop_front();
		// Try to keep the buffer to no more than 170 elements
		const VEC_SIZE: usize = ::core::mem::size_of::<Vec<u8>>();
		let large_capacity = peer.pending_outbound_buffer.capacity() > 4096 / VEC_SIZE;
what is the logic behind this?
"why 170" might be more helpful than "it is 170" in comment above.
Eh, I just dropped it. It wasn't saying anything the code wasn't already.
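For what it's worth, the arithmetic behind the constant works out as follows on a 64-bit target, where a `Vec<u8>` header is three words (pointer, length, capacity), so the check caps the queue's backing storage at roughly one 4 KiB page of `Vec` headers:

```rust
// On 64-bit: size_of::<Vec<u8>>() == 24 bytes (pointer + length + capacity),
// and 4096 / 24 == 170 — the "no more than 170 elements" in the code comment.
const VEC_SIZE: usize = core::mem::size_of::<Vec<u8>>();
const MAX_QUEUE_ELEMS: usize = 4096 / VEC_SIZE;
```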
@@ -412,11 +412,17 @@ macro_rules! get_pubkey_from_node_id {
	}
}

fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
	let mut engine = Sha256Hash::engine();
	msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
shouldn't we be specific?
"Gossip msg should not fail to serialize"
Panic messages have file/line in them, that's more specific than any message we ever write :)
@@ -6972,14 +6972,6 @@ impl<SP: Deref> Writeable for Channel<SP> where SP::Target: SignerProvider {

	self.context.latest_monitor_update_id.write(writer)?;

	let mut key_data = VecWriter(Vec::new());
	// TODO (taproot|arik): Introduce serialization distinction for non-ECDSA signers.
	self.context.holder_signer.as_ecdsa().expect("Only ECDSA signers may be serialized").write(&mut key_data)?;
Question: why did we write them in the first place? So, nowadays we write `channel_keys_id` instead?
We used to write them because we didn't really have a fully-formed concept for how key derivation was supposed to work. Now we do and writing the signers is just redundant.
pub fn decrypt_in_place(&mut self, input_output: &mut [u8], tag: &[u8]) -> Result<(), ()> {
	self.just_decrypt_in_place(input_output);
	if self.finish_and_check_tag(tag) { Ok(()) } else { Err(()) }
}
nit: `encrypt_full_message_in_place` can be changed to `encrypt_in_place` to match/align with this.
Went with `check_decrypt_in_place` since I think it's clearer and a bit more symmetric. Maybe we should rename the encryption side to `mac_encrypt_in_place`, but we can do that another time.
We end up generating a substantial number of allocations just doubling `Vec`s when serializing to them, and our `serialized_length` method is generally rather efficient, so we just rely on it and allocate correctly up front.
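A sketch of the two-pass pattern this relies on, using simplified stand-ins for LDK's `Writer`/`Writeable` traits (the real `serialized_length` and `LengthCalculatingWriter` live in `lightning::util::ser`):

```rust
trait Writer {
	fn write_all(&mut self, buf: &[u8]);
}

trait Writeable {
	fn write<W: Writer>(&self, w: &mut W);
	// Pass 1: count bytes with a no-op writer, so pass 2 can allocate exactly once.
	fn serialized_length(&self) -> usize {
		let mut len = LengthCalculatingWriter(0);
		self.write(&mut len);
		len.0
	}
}

// A "writer" that only tallies lengths; writing to it is nearly free.
struct LengthCalculatingWriter(usize);
impl Writer for LengthCalculatingWriter {
	fn write_all(&mut self, buf: &[u8]) {
		self.0 += buf.len();
	}
}

fn serialize_to_vec<T: Writeable>(t: &T) -> Vec<u8> {
	struct VecWriter(Vec<u8>);
	impl Writer for VecWriter {
		fn write_all(&mut self, buf: &[u8]) {
			self.0.extend_from_slice(buf);
		}
	}
	// Pass 2: the Vec never reallocates while `t` writes itself.
	let mut w = VecWriter(Vec::with_capacity(t.serialized_length()));
	t.write(&mut w);
	w.0
}
```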
Force-pushed from 3f6969e to 74887df
In the next commit we'll use this to avoid an allocation when deserializing messages from the wire.
When decrypting P2P messages, we already have a read buffer that we read the message into. There's no reason to allocate a new `Vec` to store the decrypted message when we can just overwrite the read buffer and call it a day.
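A sketch of the shape of this; the trait is a hypothetical stand-in for the PR's cipher wrapper and its `check_decrypt_in_place` method discussed above:

```rust
// Hypothetical stand-in for the ChaCha20Poly1305 wrapper in this PR.
trait AeadInPlace {
	fn check_decrypt_in_place(&mut self, payload: &mut [u8], tag: &[u8]) -> Result<(), ()>;
}

// `buf` holds the encrypted message followed by its 16-byte Poly1305 tag,
// exactly as it was read off the socket.
fn decrypt_message_in_read_buffer(
	cipher: &mut impl AeadInPlace,
	buf: &mut [u8],
) -> Result<usize, ()> {
	let payload_len = buf.len().checked_sub(16).ok_or(())?;
	let (payload, tag) = buf.split_at_mut(payload_len);
	// Decrypt over the read buffer itself; the MAC is verified before Ok.
	cipher.check_decrypt_in_place(payload, tag)?;
	Ok(payload_len) // plaintext now occupies buf[..payload_len]
}
```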
When buffering outbound messages for peers, `LinkedList` adds rather substantial allocation overhead, which we avoid here by swapping for a `VecDeque`.
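A sketch of the swap (field and method names hypothetical): a `VecDeque` keeps all its entries in one contiguous ring buffer, while a `LinkedList` pays a separate heap allocation per queued message:

```rust
use std::collections::VecDeque;

struct Peer {
	// Was LinkedList<Vec<u8>>: one node allocation per queued message.
	// A VecDeque stores the Vec headers in a single ring buffer instead.
	pending_outbound_buffer: VecDeque<Vec<u8>>,
}

impl Peer {
	fn enqueue_message(&mut self, encoded: Vec<u8>) {
		self.pending_outbound_buffer.push_back(encoded);
	}
	fn message_flushed(&mut self) {
		self.pending_outbound_buffer.pop_front();
		// Unlike LinkedList, a VecDeque retains capacity across pushes/pops,
		// which is what the ~170-element trim in the diff above guards against.
	}
}
```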
Force-pushed from 74887df to a69dcc3
Squashed with jeff's suggestion:
LGTM! (apart from CI fix)
When we forward gossip messages, we store them in a separate buffer before we encrypt them (and commit to the order in which they'll appear on the wire). Rather than storing that buffer encoded with no headroom, requiring re-allocating to add the message length and two MAC blocks, we here add the headroom prior to pushing it into the gossip buffer, avoiding an allocation.
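A sketch of the headroom layout, assuming the standard BOLT 8 framing (a two-byte length prefix and two 16-byte MAC blocks, as the commit message describes); the helper and its parameters are hypothetical:

```rust
const MSG_HEADER: usize = 2 + 16; // encrypted length prefix plus its MAC
const MSG_MAC: usize = 16;        // MAC appended over the message body

// `serialized_len` and `write_body` stand in for the gossip message's
// Writeable::serialized_length and Writeable::write.
fn encode_with_headroom(
	serialized_len: usize,
	write_body: impl FnOnce(&mut Vec<u8>),
) -> Vec<u8> {
	// One allocation covers header, body, and trailing MAC.
	let mut buf = Vec::with_capacity(MSG_HEADER + serialized_len + MSG_MAC);
	buf.resize(MSG_HEADER, 0); // zeroed header, overwritten at encryption time
	write_body(&mut buf);      // body encodes directly after the headroom
	buf // capacity already reserves the trailing MAC, so encrypting in place
	    // appends it without reallocating
}
```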
Whenever we go to send bytes to a peer, we need to construct a waker for tokio to call back into if we need to finish sending later. That waker needs some reference to the peer's read task to wake it up, hidden behind a single `*const ()`. To do this, we'd previously simply stored a `Box<tokio::mpsc::Sender>` in that pointer, which requires a `clone` for each waker construction. This leads to substantial malloc traffic. Instead, here, we replace this box with an `Arc`, leaving a single `tokio::mpsc::Sender` floating around and simply change the refcounts whenever we construct a new waker, which we can do without allocations.
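A sketch of the `Arc`-based waker trick (simplified; the `()` channel payload and the names here are hypothetical stand-ins for the real read-task handle). Cloning the waker just bumps the `Arc`'s refcount rather than cloning the `Sender`:

```rust
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};
use tokio::sync::mpsc;

unsafe fn waker_clone(ptr: *const ()) -> RawWaker {
	// Bump the refcount; no Sender clone, no allocation.
	Arc::increment_strong_count(ptr as *const mpsc::Sender<()>);
	RawWaker::new(ptr, &VTABLE)
}
unsafe fn waker_wake(ptr: *const ()) {
	// Take back ownership of this waker's reference and signal the read task.
	let sender = Arc::from_raw(ptr as *const mpsc::Sender<()>);
	let _ = sender.try_send(());
}
unsafe fn waker_wake_by_ref(ptr: *const ()) {
	let sender = &*(ptr as *const mpsc::Sender<()>);
	let _ = sender.try_send(());
}
unsafe fn waker_drop(ptr: *const ()) {
	Arc::decrement_strong_count(ptr as *const mpsc::Sender<()>);
}

static VTABLE: RawWakerVTable =
	RawWakerVTable::new(waker_clone, waker_wake, waker_wake_by_ref, waker_drop);

// One Sender lives behind the Arc; every waker construction is refcount-only.
fn waker_for(read_task: &Arc<mpsc::Sender<()>>) -> Waker {
	let ptr = Arc::into_raw(Arc::clone(read_task)) as *const ();
	unsafe { Waker::from_raw(RawWaker::new(ptr, &VTABLE)) }
}
```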
When we check gossip message signatures, there's no reason to serialize out the full gossip message before hashing, and it generates a lot of allocations during the initial startup when we fetch the full gossip from peers.
This breaks backwards compatibility with versions of LDK prior to 0.0.113, as they expect to always read signer data. This also substantially reduces allocations during `ChannelManager` serialization, as we currently don't pre-allocate the `Vec` that the signer gets written in to. We could alternatively pre-allocate that `Vec`, but we've been set up to skip the write entirely for a while, and 0.0.113 was released nearly a year ago; we should not expect users to downgrade to LDK 0.0.112 or earlier at this point.
Force-pushed from a69dcc3 to 7a951b1
Should pass this time, sorry about that:

$ git diff-tree -U1 a69dcc3ab 7a951b1bf
diff --git a/lightning/src/ln/peer_channel_encryptor.rs b/lightning/src/ln/peer_channel_encryptor.rs
index 298ff39b9..8569fa60f 100644
--- a/lightning/src/ln/peer_channel_encryptor.rs
+++ b/lightning/src/ln/peer_channel_encryptor.rs
@@ -436,4 +436,3 @@ impl PeerChannelEncryptor {
/// For effeciency, the [`Vec::capacity`] should be at least 16 bytes larger than the
- /// [`Vec::length`], to avoid reallocating for the message MAC, which will be appended to the
- /// vec.
+ /// [`Vec::len`], to avoid reallocating for the message MAC, which will be appended to the vec.
fn encrypt_message_with_header_0s(&mut self, msgbuf: &mut Vec<u8>) {
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index fe7903d88..ff8b084b7 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -18,3 +18,2 @@ use bitcoin::secp256k1;
-use bitcoin::hashes::sha256::Hash as Sha256Hash;
use bitcoin::hashes::sha256d::Hash as Sha256dHash;
@@ -415,6 +414,6 @@ macro_rules! get_pubkey_from_node_id {
-fn message_sha256d_hash<M: Writeable>(msg: &M) -> [u8; 32] {
- let mut engine = Sha256Hash::engine();
+fn message_sha256d_hash<M: Writeable>(msg: &M) -> Sha256dHash {
+ let mut engine = Sha256dHash::engine();
msg.write(&mut engine).expect("In-memory structs should not fail to serialize");
- Sha256dHash::from_engine(engine).into_inner()
+ Sha256dHash::from_engine(engine)
}
LGTM!
Feel moderately confident about this change.
(mainly moderate because of 18dc7f2)
LGTM, now tracking the serialization cleanup over at #2724
On a separate note: I do wonder if the `MAX_ALLOC_SIZE` constants, spread across multiple places in the code while reading different structs, need re-visiting.
My node has been experiencing more and more memory fragmentation lately, and while it seems the majority of that is #2706 and #2707, there's still plenty of room for misc improvements all over the place. With this and fixes for the other two issues we should be in a pretty good place, with allocations dominated by farrrr by block deserialization when syncing.
There are two commits here that could be performance regressions:

- "Pre-allocate the full required `Vec` prior to serializing into vecs", which runs through our serialization logic twice in many cases before writing. I played around with a `lower_bound` method on `Writeable` to optimize out some cases of having to run through the logic, but it doesn't really help in `ChannelManager` and `ChannelMonitor` or other deeply-nested structs because we're calling `write` there, which hits our `LengthCalculatingWriter` instead of being able to use an optimized version. We could totally restructure the API to have `Writeable`s call a magic method on the `Writer` which can short-circuit the write, but that's a lot of indirection and I'm lazy.
- "Avoid allocating when checking gossip message signatures" probably isn't a huge regression, cause hashers are buffered, in essence, anyway, but I didn't check.