Autodiff batching2 #139351
@oli-obk I'm almost done with features, but there are two paths forward here, so I'd appreciate some help with the design.

The *v variants (dupv, dualvonly) allow better vectorization by accepting larger shadow arguments. If you look at rustc_codegen_llvm, you'll see that I hardcoded byte_size_of(type) to 4, since my tests use floats. In my typetree work (the only part I have not upstreamed from my fork) I have a little bit of logic to handle them, but I haven't used it yet to figure out the byte size, so I'm not sure if that's legal: https://github.com/EnzymeAD/rust/blob/322f2226c1f672c9b5e934b15d255ae0d66bd0e2/compiler/rustc_middle/src/ty/typetree.rs#L196

If you say it's too hard for now, I could merge a workaround that analyzes the types in the AST frontend. It wouldn't support aliases or generics, but at least could handle

Also, there are reasons why a user might specify a larger stride than what I'd compute by default,
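To make the byte-size concern concrete, here is a minimal sketch in plain Rust, outside the compiler. The helper names `shadow_bytes` and `batched_shadow_bytes` are hypothetical and are not part of rustc or Enzyme; the sketch only shows that a hardcoded size of 4 matches `f32` but not other element types such as `f64`.

```rust
use std::mem::size_of;

/// Byte size of one shadow slice holding `len` elements of type `T`.
/// Hypothetical helper: it stands in for a real `byte_size_of(type)` query.
fn shadow_bytes<T>(len: usize) -> usize {
    len * size_of::<T>()
}

/// Byte size of a batched shadow that stores `width` such slices back to back.
fn batched_shadow_bytes<T>(len: usize, width: usize) -> usize {
    width * shadow_bytes::<T>(len)
}
```

With three elements, an `f32` shadow slice occupies 12 bytes while an `f64` one occupies 24, so a fixed element size of 4 would only be right for the first case.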
Generic code will require a lot of extra work anyway (we'll need to find the right trait bounds and such so that a generic function will never fail to monomorphize)
If my old trick still works (and I don't see why it wouldn't), it's trivial to add, as long as we don't move to the rustc_intrinsic approach we discussed. A dummy function calls the source function in a black-boxed call. It used to do this with the same generics, but I didn't reimplement that in the latest rework from proc-macros to builtin_macros. We could re-add it, though.
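A rough sketch of what such a dummy function could look like, written as ordinary Rust rather than as the macro's actual expansion. The names `square` and `square_dummy` are hypothetical, and this is only an assumption about the shape of the trick: the dummy carries the same generic parameters and bounds as the source function and calls it through `std::hint::black_box`, so the source function is instantiated for every type the dummy is used with.

```rust
use std::hint::black_box;
use std::ops::Mul;

/// Hypothetical generic source function.
fn square<T: Copy + Mul<Output = T>>(x: T) -> T {
    x * x
}

/// Hypothetical dummy with the same generics and bounds as `square`.
/// `black_box` hides the argument and the result from the optimizer,
/// which makes it hard to simplify the call away.
fn square_dummy<T: Copy + Mul<Output = T>>(x: T) -> T {
    black_box(square(black_box(x)))
}
```

Calling `square_dummy(3.0_f32)` or `square_dummy(4_i32)` forces a matching instance of `square` to exist for each type.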
@bors r+ rollup
…iaskrgr

Rollup of 8 pull requests

Successful merges:

- rust-lang#139351 (Autodiff batching2)
- rust-lang#139483 (f*::NAN: guarantee that this is a quiet NaN)
- rust-lang#139498 (Ignore zero-sized types in wasm future-compat warning)
- rust-lang#139967 (Introduce and use specialized `//@ ignore-auxiliary` for test support files instead of using `//@ ignore-test`)
- rust-lang#139969 (update libc)
- rust-lang#139971 (Make C string merging test work on MIPS)
- rust-lang#139974 (Change `InterpCx::instantiate*` function visibility to pub)
- rust-lang#139977 (Fix drop handling in `hint::select_unpredictable`)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#139351 - EnzymeAD:autodiff-batching2, r=oli-obk

Autodiff batching2

~I will rebase it once my first PR landed.~ done.

This autodiff batch mode is more similar to scalar autodiff, since it still only takes one shadow argument. However, that argument is supposed to be `width` times larger.

r? `@oli-obk`

Tracking:

- rust-lang#124509
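The "one shadow argument, `width` times larger" layout can be illustrated with a hand-written sketch that does not use the `autodiff` macro at all. The functions `f` and `d_f_batched` are hypothetical, and the back-to-back layout of the tangent vectors is an assumption made for illustration, not a statement about what the compiler or Enzyme generates.

```rust
/// Primal function: f(x) = sum of x_i^2.
fn f(x: &[f32]) -> f32 {
    x.iter().map(|v| v * v).sum()
}

/// Hand-written batched forward-mode derivative of `f`.
/// `shadow` is a single buffer holding `width` tangent vectors back to back,
/// so its length must be `width * x.len()`.
/// Returns the primal value and one directional derivative per tangent.
fn d_f_batched(x: &[f32], shadow: &[f32], width: usize) -> (f32, Vec<f32>) {
    let n = x.len();
    assert_eq!(shadow.len(), width * n, "shadow must be `width` times larger");
    let derivs: Vec<f32> = (0..width)
        .map(|b| {
            // Slice out the b-th tangent vector from the shared shadow buffer.
            let tangent = &shadow[b * n..(b + 1) * n];
            // d/dx_i (x_i^2) = 2 * x_i, contracted with the tangent.
            x.iter()
                .zip(tangent)
                .map(|(&xi, &ti)| 2.0 * xi * ti)
                .sum::<f32>()
        })
        .collect();
    (f(x), derivs)
}
```

For `x = [1.0, 2.0, 3.0]`, `width = 2`, and a shadow containing the first two unit vectors, the call yields the primal value 14 together with the partial derivatives 2 and 4.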