```rust
let (accepted, ready) = self.chunker.accept(input, &self.block_buffer);
self.block_buffer.extend_from_slice(accepted);
let written = accepted.len();
```
Minor 10MB/s increase if a "less slow path" is added here for the `self.block_buffer.is_empty() && ready` case, as in the caller uses a `buffer.len() >= self.size_hint()` buffer. `BufReader` doesn't do this normally; with a `256 * 1024` buffer its `fill_buf` will only issue a single read, and the result will probably be the platform stdin buffer size or anything below it (I got multiples of 4096: 1 and 32).
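A minimal sketch of what that "less slow path" could look like, using a hypothetical fixed-size `Chunker` and a counter standing in for real block creation (names and shapes are illustrative, not the actual adder API): when the internal buffer is empty and the accepted slice already forms a complete block, the copy into `block_buffer` can be skipped entirely.

```rust
// Hypothetical sketch: fixed-size chunker + adder with a copy-free fast path.
struct Chunker {
    size: usize,
}

impl Chunker {
    // Returns the accepted prefix of `input` and whether a full block is ready.
    fn accept<'a>(&self, input: &'a [u8], buffered: &[u8]) -> (&'a [u8], bool) {
        let need = self.size - buffered.len();
        let take = need.min(input.len());
        (&input[..take], buffered.len() + take == self.size)
    }
}

struct Adder {
    chunker: Chunker,
    block_buffer: Vec<u8>,
    blocks: usize, // stand-in for actual block creation
}

impl Adder {
    fn push(&mut self, input: &[u8]) -> usize {
        let (accepted, ready) = self.chunker.accept(input, &self.block_buffer);
        if self.block_buffer.is_empty() && ready {
            // fast path: the whole block is already in `accepted`, no copy needed
            self.flush_block(accepted);
            return accepted.len();
        }
        // slow path: gather bytes until a full block has accumulated
        self.block_buffer.extend_from_slice(accepted);
        if ready {
            let block = std::mem::take(&mut self.block_buffer);
            self.flush_block(&block);
        }
        accepted.len()
    }

    fn flush_block(&mut self, _block: &[u8]) {
        self.blocks += 1;
    }
}
```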
Doing full `size_hint` reads with two `Vec`s on two threads produces similar speed to the single-threaded version (passing the buffers back and forth with two channels). Probably the bottleneck is the many allocations while writing.
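A sketch of the two-channel ping-pong being described, assuming a reader thread and a processing loop recycling a small pool of buffers (the byte-counting "work" is a stand-in for the real hashing/ingestion):

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical sketch: a reader thread fills Vec<u8> buffers and sends them
// over `full_tx`; the consumer returns emptied buffers over `empty_tx`, so
// in steady state no new allocations happen.
fn pingpong(input: Vec<u8>, buf_size: usize) -> usize {
    let (full_tx, full_rx) = channel::<Vec<u8>>();
    let (empty_tx, empty_rx) = channel::<Vec<u8>>();

    // seed the pool with N >= 2 reusable buffers
    for _ in 0..2 {
        empty_tx.send(Vec::with_capacity(buf_size)).unwrap();
    }

    let reader = thread::spawn(move || {
        let mut offset = 0;
        while offset < input.len() {
            let mut buf = empty_rx.recv().unwrap();
            buf.clear();
            let end = (offset + buf_size).min(input.len());
            buf.extend_from_slice(&input[offset..end]);
            offset = end;
            full_tx.send(buf).unwrap();
        }
        // dropping `full_tx` here ends the consumer loop below
    });

    let mut total = 0;
    while let Ok(buf) = full_rx.recv() {
        total += buf.len(); // stand-in for hashing/ingestion work
        let _ = empty_tx.send(buf); // recycle; the reader may already be gone
    }
    reader.join().unwrap();
    total
}
```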
A few flamegraphs later: the raw ingestion rate doesn't really depend on how clever the stdin reading is. It depends on sha2 (quite surprisingly).

I tried these stdin reading approaches:

- `std::io::BufReader` (which issues a single `read`)
- full `size_hint`-sized vecs (many `read`s)
- another thread with a `std::sync::mpsc::channel` for communication, with N `size_hint` vecs (`N >= 2`)
I won't complicate this PR with the above, but I will push that somewhere to bitrot and update this later with a link. Link: https://github.com/koivunej/rust-ipfs/commit/a167db89d6280dacb139dd8617aef0c7c72cf7f6

All of the stdin reading approaches (with this "less-slow-path" fix) produced a ~230MB/s ingestion rate for a 5GB test input. On my laptop I also have firefox running with a gazillion tabs and so on.

Upgrading sha2 from 0.8.1 to 0.9.1 for this experiment yields a ~1GB/s ingestion rate.
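For reference, that experiment amounts to a one-line dependency bump (version numbers as quoted above; the real manifest's feature flags may differ):

```diff
 [dependencies]
-sha2 = "0.8.1"
+sha2 = "0.9.1"
```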
This leads me to believe that this copying heavy but simple code is good enough (tm).
I think this is starting to be ready. Any strong feelings on splitting the file adder up? There will still be quite massive changes when we add a trickle collector/layout. Any ideas what to do with the zstd test files given by @ribasushi? I am not sure I am able to make test cases out of them, as there isn't a pure-rust zstd impl... I did add the failures as test cases (`174 * 256 * 1024 + 1` and empty file). I've been running these as a one-liner:
Should be fixable to create trees for files.
They are needed in the adder to create links.
no noticeable changes in benchmarks
Co-authored-by: ljedrz <[email protected]>
Rebased about half of the commits away (into the first). Seeing diminishing returns on this, while there is still repeated back and forth. I wonder if I should write a tool which would tell me all conflict-free orderings of my commits.. :) re: zstd tests, not introducing them. I'll introduce test cases manually when new bugs are found.
```diff
@@ -1,8 +1,10 @@
 # Next

+* Initial facilities for building File trees [#220]
```
Suggested change:

```diff
-* Initial facilities for building File trees [#220]
+* Initial utilities for building File trees [#220]
```
LGTM 👍
bors r+

Build succeeded:
Creates multiblock balanced trees. Is quite slow and uses quite a lot of heap.
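The shape of such a balanced tree can be sketched with a small node-count calculation. The parameters here are assumptions for illustration: 256 KiB chunks (the chunk size appearing in the test cases above) and a link width of 174, which the `174 * 256 * 1024 + 1` regression case hints at; the actual defaults live in the adder.

```rust
// Assumed layout parameters, not taken from the actual adder defaults.
const CHUNK: u64 = 256 * 1024; // leaf block payload size
const WIDTH: u64 = 174; // links per intermediate node

// An empty file is still one block.
fn leaf_count(size: u64) -> u64 {
    if size == 0 {
        1
    } else {
        (size + CHUNK - 1) / CHUNK
    }
}

// Total blocks in the balanced tree: leaves plus every intermediate level
// up to (and including) the single root.
fn tree_node_count(size: u64) -> u64 {
    let mut level = leaf_count(size);
    let mut total = level;
    while level > 1 {
        level = (level + WIDTH - 1) / WIDTH;
        total += level;
    }
    total
}
```

This also shows why `174 * 256 * 1024 + 1` is an interesting boundary: one byte past it, the 175th leaf forces a second level of intermediate nodes plus a new root.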