```rust
let (accepted, ready) = self.chunker.accept(input, &self.block_buffer);
self.block_buffer.extend_from_slice(accepted);
let written = accepted.len();
```
Minor 10MB/s increase if a "less slow path" is added here for the `self.block_buffer.is_empty() && ready` case, as in the caller uses a `buffer.len() >= self.size_hint()` buffer. `BufReader` doesn't do this normally; with a `256 * 1024` buffer its `fill_buf` will only issue a single read, and the result will probably be the platform stdin buffer size or anything below it (I got multiples of 4096: 1 and 32).
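A minimal sketch of what that "less slow path" could look like, using a hypothetical fixed-size `Chunker` and a counter standing in for real block creation (names and shapes are illustrative, not the actual adder API): when the internal buffer is empty and the accepted slice already forms a complete block, the copy into `block_buffer` can be skipped entirely.

```rust
// Hypothetical sketch: fixed-size chunker + adder with a copy-free fast path.
struct Chunker {
    size: usize,
}

impl Chunker {
    // Returns the accepted prefix of `input` and whether a full block is ready.
    fn accept<'a>(&self, input: &'a [u8], buffered: &[u8]) -> (&'a [u8], bool) {
        let need = self.size - buffered.len();
        let take = need.min(input.len());
        (&input[..take], buffered.len() + take == self.size)
    }
}

struct Adder {
    chunker: Chunker,
    block_buffer: Vec<u8>,
    blocks: usize, // stand-in for actual block creation
}

impl Adder {
    fn push(&mut self, input: &[u8]) -> usize {
        let (accepted, ready) = self.chunker.accept(input, &self.block_buffer);
        if self.block_buffer.is_empty() && ready {
            // fast path: the whole block is already in `accepted`, no copy needed
            self.flush_block(accepted);
            return accepted.len();
        }
        // slow path: gather bytes until a full block has accumulated
        self.block_buffer.extend_from_slice(accepted);
        if ready {
            let block = std::mem::take(&mut self.block_buffer);
            self.flush_block(&block);
        }
        accepted.len()
    }

    fn flush_block(&mut self, _block: &[u8]) {
        self.blocks += 1;
    }
}
```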
Doing full `size_hint` reads with two `Vec`s on two threads produces similar speed to the single-threaded version (passing the buffers back and forth with two channels). Probably the bottleneck is the many allocations while writing.
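A sketch of the two-channel ping-pong being described, assuming a reader thread and a processing loop recycling a small pool of buffers (the byte-counting "work" is a stand-in for the real hashing/ingestion):

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical sketch: a reader thread fills Vec<u8> buffers and sends them
// over `full_tx`; the consumer returns emptied buffers over `empty_tx`, so
// in steady state no new allocations happen.
fn pingpong(input: Vec<u8>, buf_size: usize) -> usize {
    let (full_tx, full_rx) = channel::<Vec<u8>>();
    let (empty_tx, empty_rx) = channel::<Vec<u8>>();

    // seed the pool with N >= 2 reusable buffers
    for _ in 0..2 {
        empty_tx.send(Vec::with_capacity(buf_size)).unwrap();
    }

    let reader = thread::spawn(move || {
        let mut offset = 0;
        while offset < input.len() {
            let mut buf = empty_rx.recv().unwrap();
            buf.clear();
            let end = (offset + buf_size).min(input.len());
            buf.extend_from_slice(&input[offset..end]);
            offset = end;
            full_tx.send(buf).unwrap();
        }
        // dropping `full_tx` here ends the consumer loop below
    });

    let mut total = 0;
    while let Ok(buf) = full_rx.recv() {
        total += buf.len(); // stand-in for hashing/ingestion work
        let _ = empty_tx.send(buf); // recycle; the reader may already be gone
    }
    reader.join().unwrap();
    total
}
```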
A few flamegraphs later: the raw ingestion rate doesn't really depend on how clever the stdin reading is. It depends on sha2 (quite surprisingly).

I tried these stdin reading approaches:

- `std::io::BufReader` (which issues a single `read`)
- full `size_hint`-sized vecs (many `read`s)
- another thread with a `std::sync::mpsc::channel` for communication, with N `size_hint` vecs (`N >= 2`)
I won't complicate this PR with the above, but I will push that somewhere to bitrot and update this later with a link. Link: https://github.com/koivunej/rust-ipfs/commit/a167db89d6280dacb139dd8617aef0c7c72cf7f6

All of the stdin reading approaches (with this "less-slow-path" fix) produced a ~230MB/s ingestion rate for a 5GB test input. On my laptop I also have firefox running with a gazillion tabs and so on.

Upgrading sha2 from 0.8.1 to 0.9.1 for this experiment yields a ~1GB/s ingestion rate.
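For reference, that experiment amounts to a one-line dependency bump (version numbers as quoted above; the real manifest's feature flags may differ):

```diff
 [dependencies]
-sha2 = "0.8.1"
+sha2 = "0.9.1"
```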
This leads me to believe that this copying heavy but simple code is good enough (tm).
I think this is starting to be ready. Any strong feelings on splitting the file adder up? There will still be quite massive changes when we add a trickle collector/layout. Any ideas what to do with the zstd test files given by @ribasushi? I am not sure I am able to make test cases out of them, as there isn't a pure-rust zstd impl... I did add the failures as test cases (`174 * 256 * 1024 + 1` and empty file). I've been running these as a one-liner:
Should be fixable to create trees for files.
They are needed in the adder to create links.
no noticeable changes in benchmarks
Co-authored-by: ljedrz <[email protected]>
Rebased about half of the commits away (into the first). Seeing diminishing returns on this, while there is still repeated back and forth. I wonder if I should write a tool which would tell me all conflict-free orderings of my commits.. :) re: zstd tests, not introducing them. I'll introduce test cases manually when new bugs are found.
```diff
@@ -1,8 +1,10 @@
 # Next

+* Initial facilities for building File trees [#220]
```
Suggested change:

```diff
-* Initial facilities for building File trees [#220]
+* Initial utilities for building File trees [#220]
```
LGTM 👍
bors r+

Build succeeded:
Creates multiblock balanced trees. Is quite slow and uses quite a lot of heap.
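The shape of such a balanced tree can be sketched with a small node-count calculation. The parameters here are assumptions for illustration: 256 KiB chunks (the chunk size appearing in the test cases above) and a link width of 174, which the `174 * 256 * 1024 + 1` regression case hints at; the actual defaults live in the adder.

```rust
// Assumed layout parameters, not taken from the actual adder defaults.
const CHUNK: u64 = 256 * 1024; // leaf block payload size
const WIDTH: u64 = 174; // links per intermediate node

// An empty file is still one block.
fn leaf_count(size: u64) -> u64 {
    if size == 0 {
        1
    } else {
        (size + CHUNK - 1) / CHUNK
    }
}

// Total blocks in the balanced tree: leaves plus every intermediate level
// up to (and including) the single root.
fn tree_node_count(size: u64) -> u64 {
    let mut level = leaf_count(size);
    let mut total = level;
    while level > 1 {
        level = (level + WIDTH - 1) / WIDTH;
        total += level;
    }
    total
}
```

This also shows why `174 * 256 * 1024 + 1` is an interesting boundary: one byte past it, the 175th leaf forces a second level of intermediate nodes plus a new root.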