[AutoDiff] Bump-pointer allocate pullback structs in loops. #34886

Merged: rxwei merged 1 commit into swiftlang:main from ad-loop-optimization on Nov 30, 2020

@rxwei rxwei commented Nov 30, 2020

In derivatives of loops, no longer allocate boxes for indirect case payloads. Instead, use a custom pullback context in the runtime which contains a bump-pointer allocator.

When a function contains a differentiated loop, the closure context is a Builtin.NativeObject, which contains a swift::AutoDiffLinearMapContext and a tail-allocated top-level linear map struct (which represents the linear map struct that was previously directly partial-applied into the pullback). In branching trace enums, the payloads of previously indirect cases will be allocated by swift::AutoDiffLinearMapContext::allocate and stored as a Builtin.RawPointer.
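
To illustrate the idea behind the bump-pointer allocator mentioned above, here is a minimal sketch. It is not the code in this PR: `BumpAllocator`, `SlabSize`, and `slabCount` are hypothetical names, and the slab-growth policy is an assumption chosen for brevity. The point it shows is that each allocation only advances a cursor, and all memory is released at once when the allocator is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical, simplified bump-pointer allocator; not the Swift runtime's
// implementation. Memory is carved out of slabs and is only released when the
// whole allocator is destroyed.
class BumpAllocator {
  static constexpr std::size_t SlabSize = 4096;
  std::vector<void *> slabs;  // every slab obtained from malloc
  std::uintptr_t cursor = 0;  // next free address in the current slab
  std::uintptr_t end = 0;     // one past the end of the current slab

public:
  BumpAllocator() = default;
  BumpAllocator(const BumpAllocator &) = delete;
  BumpAllocator &operator=(const BumpAllocator &) = delete;

  ~BumpAllocator() {
    for (void *slab : slabs)
      std::free(slab);
  }

  // Returns `size` bytes aligned to `align`, which must be a power of two.
  void *allocate(std::size_t size,
                 std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    std::uintptr_t mask = std::uintptr_t(align) - 1;
    std::uintptr_t aligned = (cursor + mask) & ~mask;
    if (cursor == 0 || aligned + size > end) {
      // The current slab is exhausted (or there is none yet): start a new
      // slab that is large enough for this request.
      std::size_t slabBytes = size + align > SlabSize ? size + align : SlabSize;
      void *slab = std::malloc(slabBytes);
      if (!slab)
        return nullptr;
      slabs.push_back(slab);
      cursor = reinterpret_cast<std::uintptr_t>(slab);
      end = cursor + slabBytes;
      aligned = (cursor + mask) & ~mask;
    }
    cursor = aligned + size;  // "bump" the pointer past this allocation
    return reinterpret_cast<void *>(aligned);
  }

  std::size_t slabCount() const { return slabs.size(); }
};
```

Compared with allocating one box per loop iteration, this design makes each allocation a pointer increment in the common case and replaces per-object deallocation with a single teardown.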

The following entry points are added to the runtime:

/// Creates a linear map context with a tail-allocated top-level subcontext.
SWIFT_EXPORT_FROM(swift_Differentiation) SWIFT_CC(swift)
AutoDiffLinearMapContext *swift_autoDiffCreateLinearMapContext(
    size_t topLevelSubcontextSize);

/// Returns the address of the tail-allocated top-level subcontext.
SWIFT_EXPORT_FROM(swift_Differentiation) SWIFT_CC(swift)
void *swift_autoDiffProjectTopLevelSubcontext(AutoDiffLinearMapContext *);

/// Allocates memory for a new subcontext.
SWIFT_EXPORT_FROM(swift_Differentiation) SWIFT_CC(swift)
void *swift_autoDiffAllocateSubcontext(AutoDiffLinearMapContext *, size_t size);
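
As a rough sketch of how these three entry points fit together, the following self-contained mock mirrors their shape. Everything prefixed with `mock`/`Mock`, plus `IterationRecord`, `TopLevelRecord`, `squareRepeatedly`, and `pullback`, is hypothetical and not part of the runtime. The mock also makes two simplifying assumptions: it tracks individually `malloc`'d blocks instead of using a bump-pointer allocator, and it is destroyed explicitly rather than through reference counting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for swift::AutoDiffLinearMapContext.
struct MockLinearMapContext {
  std::vector<void *> subcontexts;
  ~MockLinearMapContext() {
    for (void *p : subcontexts)
      std::free(p);
  }
};

// Header size rounded up so the tail allocation is suitably aligned.
constexpr std::size_t alignedHeaderSize() {
  return (sizeof(MockLinearMapContext) + alignof(std::max_align_t) - 1) /
         alignof(std::max_align_t) * alignof(std::max_align_t);
}

// Mirrors the shape of swift_autoDiffCreateLinearMapContext.
MockLinearMapContext *mockCreateLinearMapContext(std::size_t topLevelSize) {
  void *raw = std::malloc(alignedHeaderSize() + topLevelSize);
  return raw ? new (raw) MockLinearMapContext() : nullptr;
}

// Mirrors the shape of swift_autoDiffProjectTopLevelSubcontext.
void *mockProjectTopLevelSubcontext(MockLinearMapContext *ctx) {
  return reinterpret_cast<char *>(ctx) + alignedHeaderSize();
}

// Mirrors the shape of swift_autoDiffAllocateSubcontext.
void *mockAllocateSubcontext(MockLinearMapContext *ctx, std::size_t size) {
  void *p = std::malloc(size);
  if (p)
    ctx->subcontexts.push_back(p);
  return p;
}

void mockDestroyLinearMapContext(MockLinearMapContext *ctx) {
  ctx->~MockLinearMapContext();
  std::free(ctx);
}

// Hypothetical per-iteration record, linked to its predecessor by raw pointer.
struct IterationRecord {
  double input;
  IterationRecord *previous;
};
struct TopLevelRecord {
  IterationRecord *last;
};

// Forward pass: squares x `n` times, recording each iteration's input.
double squareRepeatedly(double x, int n, MockLinearMapContext *ctx) {
  auto *top = static_cast<TopLevelRecord *>(mockProjectTopLevelSubcontext(ctx));
  top->last = nullptr;
  for (int i = 0; i < n; ++i) {
    auto *rec = static_cast<IterationRecord *>(
        mockAllocateSubcontext(ctx, sizeof(IterationRecord)));
    assert(rec && "allocation failed");
    rec->input = x;
    rec->previous = top->last;
    top->last = rec;
    x = x * x;
  }
  return x;
}

// Pullback: walks the records from last to first, applying d(x*x)/dx = 2x.
double pullback(double seed, MockLinearMapContext *ctx) {
  auto *top = static_cast<TopLevelRecord *>(mockProjectTopLevelSubcontext(ctx));
  for (IterationRecord *rec = top->last; rec; rec = rec->previous)
    seed *= 2 * rec->input;
  return seed;
}
```

A caller would create the context with `mockCreateLinearMapContext(sizeof(TopLevelRecord))`, run `squareRepeatedly`, then run `pullback`, and finally call `mockDestroyLinearMapContext`, which frees every per-iteration record in one pass.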

This paves the way for a series of optimizations on linear map closure context allocations. For example, a pass could be run on all user-registered derivatives to allocate their closure contexts as subcontexts in a swift::AutoDiffLinearMapContext.

As a result, differentiating loops with more than 1 million iterations no longer segfaults, and derivatives of loops see a small but consistent performance improvement. More work remains to be done later.

@rxwei rxwei force-pushed the ad-loop-optimization branch 2 times, most recently from a5e9082 to 4a19baa Compare November 30, 2020 07:03

rxwei commented Nov 30, 2020

@swift-ci please test

@swift-ci

Build failed
Swift Test OS X Platform
Git Sha - 4a19baaa4e66b1f34275d16f11acb225658cec4d

@dan-zheng dan-zheng left a comment

What a nice holiday gift! Incredible.

dan-zheng commented Nov 30, 2020

Can we start autodiff benchmarks to evaluate performance-impacting compiler changes?

Plugging into Swift compiler benchmark suite

@marcrasi added autodiff tests to the benchmark suite in #31108. I think reviving and landing that is a good start.

Dedicated Swift differentiation benchmark library

Personally, I like google/swift-benchmark as a benchmarking library. I found it gives more information than XCTest utilities (XCTestCase.measure) and the output is quite human-interpretable.

I used google/swift-benchmark for various autodiff benchmarking experiments but didn't know a good home for the code (an AutoDiffBenchmarks SwiftPM package). Maybe it could be a GitHub repository called swift-autodiff-benchmarking.

Testing compiler changes with a SwiftPM benchmark suite is somewhat harder, but it should be possible with some extra work.

rxwei commented Nov 30, 2020

I'm interested in using the Swift compiler benchmark suite. Thanks for pointing to #31108 — I'll revive this.

From my local benchmarks, I've seen a consistent 2x-10x speedup on loops, but it's small compared to the rest of the issues to be fixed later. A more significant outcome is that loops over 1 million iterations no longer segfault.

@rxwei rxwei force-pushed the ad-loop-optimization branch from 4a19baa to 7d81ad8 Compare November 30, 2020 20:11

rxwei commented Nov 30, 2020

@swift-ci please test

@rxwei rxwei merged commit de2dbe5 into swiftlang:main Nov 30, 2020
@rxwei rxwei deleted the ad-loop-optimization branch November 30, 2020 23:49
ainu-bot pushed a commit to google/swift that referenced this pull request Dec 1, 2020
* 'main' of github.com:apple/swift: (67 commits)
  [build-script] Allow to tune dsymutil parallelism (swiftlang#34795)
  [Testing] Add missing REQUIRES
  [concurrency] SILGen: emit @asyncHandler functions.
  [concurrency] SILGen: allow the Builtin.createAsyncTaskFuture to have a non-generic closure argument.
  [concurrency] stdlib: add a _runAsyncHandler compiler intrinsic.
  Mangling: add support for mangling the body-function of asyncHandlers
  Make sure ~AutoDiffLinearMapContext() is called.
  fix SourceLoc-related crasher and add tests
  [AutoDiff] Bump-pointer allocate pullback structs in loops. (swiftlang#34886)
  update differentiable programming manifesto
  [Async CC] Always add full type metadata to bindings.
  [cxx-interop] Fix assertion to allow variadic members.
  [ome] Remove bad pattern of having a global SILBuilder with a global SILBuilderWithContext and multiple local SILBuilderWithScope.
  [ome] Invoke simplifyInstruction after lowering ownership and use replaceAllSimplifiedUsesAndErase instead of a manual RAUW.
  Partially revert Float16 availability changes (swiftlang#34847)
  Add a field reflection function that constructs keypaths. (swiftlang#34815)
  Allow the creation of a shadow variable when the type is a refcounted pointer (swiftlang#34835)
  [CMake] Extend copy-legacy-layouts dependency to swiftmodules (swiftlang#34846)
  [sil] Remove usage from TypeLowering of SILBuilder::create*AndFold().
  [allocbox-to-stack] Fix an ossa bug in PromotedParamCloner.
  ...