[AutoDiff] Bump-pointer allocate pullback structs in loops. #34886
@swift-ci please test
Build failed
What a nice holiday gift! Incredible.
Can we set up autodiff benchmarks to evaluate performance-impacting compiler changes?
Plugging into the Swift compiler benchmark suite
@marcrasi added autodiff tests to the benchmark suite. I think reviving and landing that (#31108) is a good start.
Dedicated Swift differentiation benchmark library
Personally, I like google/swift-benchmark as a benchmarking library. I found it gives more information than XCTest utilities (…). I used google/swift-benchmark for various autodiff benchmarking experiments but didn't know a good home for the code (an …).
It's a bit harder to test compiler changes with a SwiftPM benchmark suite, but it should be possible with a bit of work.
I'm interested in using the Swift compiler benchmark suite. Thanks for pointing to #31108; I'll revive it. In my local benchmarks I've seen a consistent 2x to 10x speedup on loops, but that is small relative to the remaining issues to be fixed later. A more significant outcome is that loops over 1 million iterations no longer segfault.
In derivatives of loops, no longer allocate boxes for indirect case payloads. Instead, use a custom pullback context in the runtime which contains a bump-pointer allocator.
@swift-ci please test
When a function contains a differentiated loop, the closure context is a Builtin.NativeObject, which contains a swift::AutoDiffLinearMapContext and a tail-allocated top-level linear map struct (which represents the linear map struct that was previously partial-applied directly into the pullback). In branching trace enums, the payloads of previously indirect cases are allocated by swift::AutoDiffLinearMapContext::allocate and stored as a Builtin.RawPointer.
The following entry points are added to the runtime:
This paves the way for a series of optimizations on linear map closure context allocations. For example, a pass could be run on all user-registered derivatives to allocate their closure contexts as subcontexts in a swift::AutoDiffLinearMapContext.
As a result of this change, differentiating loops over 1 million iterations no longer segfaults, and derivatives with loops get a consistent, small performance improvement. More work to be done later:
Replace llvm::BumpPtrAllocator with something like the task allocator in [concurrency] Implement the Task allocator as bump-pointer allocator. #34880, so that we can deallocate things in a stack discipline and tail-allocate the initial slab.