Skip to content

Reduce memory usage when uploading files by a factor of 11 #661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 28, 2020

Conversation

jyn514
Copy link
Member

@jyn514 jyn514 commented Mar 28, 2020

Addresses #658. A future PR could reduce usage by a further factor of 2 by removing unnecessary clone()s in the upload code. Done.

This uses the Rust mime_guess package instead libmagic.
Most work was already done in #600.
This just sets mime_guess to build unconditionally instead of only on Windows.

This improves the memory usage of docs.rs by over a factor of 5 when
uploading files in some cases. This can be tested locally like so:

Run minio in the background:

docker run -p 9000:9000 -e MINIO_ACCESS_KEY=password -e MINIO_SECRET_KEY=password minio/minio server /data

Navigate to localhost:9000/ and add a new bucket called rust-docs-rs.

Now in another terminal, run docs.rs. This assumes you already have the
CRATESFYI_DATABASE_URL set up locally as described in
developing without docker-compose:

 # set up a fairly large file
mkdir -p ignored/tmp
yes | head -n $((1024 * 1024 * 50)) > ignored/tmp/100MB

git checkout master
cargo build --release
RUST_LOG=cratesfyi,info S3_ENDPOINT=http://localhost:9000 AWS_ACCESS_KEY_ID=password AWS_SECRET_ACCESS_KEY=password valgrind --tool=massif --massif-out-file=libmagic.massif.out target/release/cratesfyi database add-directory ignored/tmp/
 # this will show ~1.5 GB used, depending on your system
ms_print libmagic.massif.out | less

 # now try without libmagic
git checkout druid
cargo build --release
RUST_LOG=cratesfyi,info S3_ENDPOINT=http://localhost:9000 AWS_ACCESS_KEY_ID=password AWS_SECRET_ACCESS_KEY=password valgrind --tool=massif --massif-out-file=no_libmagic.massif.out target/release/cratesfyi database add-directory ignored/tmp/
 # this will show ~250 MB used, depending on your system
ms_print no_libmagic.massif.out | less

cc @Zexbe

r? @Mark-Simulacrum

This uses the Rust tree_magic package instead.

Most work was already done in
rust-lang#600.
This just sets tree_magic to build unconditionally instead of always on
Windows.

This improves the memory usage of docs.rs by over a factor of 5 when
uploading files in some cases. This can be tested locally like so:

Run minio in the background:
```
docker run -p 9000:9000 -e MINIO_ACCESS_KEY=password -e MINIO_SECRET_KEY=password minio/minio server /data
```

Navigate to localhost:9000/ and set both the access key and secret key
to 'password'. Add a new bucket called `rust-docs-rs`.

Now in another terminal, run docs.rs. This assumes you already have the
CRATESFYI_DATABASE_URL set up locally as described in
[developing without docker-compose](https://forge.rust-lang.org/docs-rs/no-docker-compose.html):

```
 # set up a fairly large file
mkdir -p ignored/tmp
yes | head -n $((1024 * 1024 * 50)) > ignored/tmp/100MB

git checkout master
cargo build --release
RUST_LOG=cratesfyi,info S3_ENDPOINT=http://localhost:9000 AWS_ACCESS_KEY_ID=password AWS_SECRET_ACCESS_KEY=password valgrind --tool=massif --massif-out-file=libmagic.massif.out target/release/cratesfyi database add-directory ignored/tmp/
 # this will show ~1.5 GB used, depending on your system
ms_print libmagic.massif.out | less

 # now try without libmagic
git checkout druid
cargo build --release
RUST_LOG=cratesfyi,info S3_ENDPOINT=http://localhost:9000 AWS_ACCESS_KEY_ID=password AWS_SECRET_ACCESS_KEY=password valgrind --tool=massif --massif-out-file=no_libmagic.massif.out target/release/cratesfyi database add-directory ignored/tmp/
 # this will show ~250 MB used, depending on your system
ms_print no_libmagic.massif.out | less
```
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to check if we can avoid allocating a String for the mime type - it looks like internally it's a &'static str in the library.

@jyn514
Copy link
Member Author

jyn514 commented Mar 28, 2020

Note that this now moves from setting the content-type based on the contents to based on the file extension. I don't foresee this impacting us, but I'll run a build with the new library and make sure that everything looks about right in the browser.

@jyn514
Copy link
Member Author

jyn514 commented Mar 28, 2020

Everything looks fine on a local build.

image

@jyn514
Copy link
Member Author

jyn514 commented Mar 28, 2020

After the last commit, this now only uses 129 MB locally, for a reduction of over 11x.

@jyn514 jyn514 changed the title Remove memory usage when uploading files by a factor of 5 Remove memory usage when uploading files by a factor of 11 Mar 28, 2020
@jyn514 jyn514 changed the title Remove memory usage when uploading files by a factor of 11 Reducie memory usage when uploading files by a factor of 11 Mar 28, 2020
@jyn514 jyn514 changed the title Reducie memory usage when uploading files by a factor of 11 Reduce memory usage when uploading files by a factor of 11 Mar 28, 2020
@Mark-Simulacrum
Copy link
Member

I'm happy to merge if you're ready, I think this is good to go.

@jyn514 jyn514 merged commit a21a1c0 into rust-lang:master Mar 28, 2020
@jyn514 jyn514 deleted the druid branch March 28, 2020 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants