[Docs] [AutoDiff] Move higher-order functions out of future directions. #28695

Merged
merged 1 commit into from
Dec 11, 2019
62 changes: 37 additions & 25 deletions docs/DifferentiableProgramming.md
@@ -72,6 +72,7 @@ Backticks were added manually.
* [Upcasting to non-`@differentiable` functions](#upcasting-to-non-differentiable-functions)
* [Implied generic constraints](#implied-generic-constraints)
* [Non-differentiable parameters](#non-differentiable-parameters)
* [Higher-order functions and currying](#higher-order-functions-and-currying)
* [Differential operators](#differential-operators)
* [Differential-producing differential operators](#differential-producing-differential-operators)
* [Pullback-producing differential operators](#pullback-producing-differential-operators)
@@ -88,7 +89,6 @@ Backticks were added manually.
* [Convolutional neural networks (CNN)](#convolutional-neural-networks-cnn)
* [Recurrent neural networks (RNN)](#recurrent-neural-networks-rnn)
* [Future directions](#future-directions)
* [Differentiation of higher-order functions](#differentiation-of-higher-order-functions)
* [Higher-order differentiation](#higher-order-differentiation)
* [Naming conventions for numerical computing](#naming-conventions-for-numerical-computing)
* [Source compatibility](#source-compatibility)
@@ -2002,6 +2002,42 @@ _ = f0 as @differentiable (@noDerivative Float, Float) -> Float
_ = f0 as @differentiable (@noDerivative Float, @noDerivative Float) -> Float
```

#### Higher-order functions and currying

As defined above, the `@differentiable` function type attribute requires all
non-`@noDerivative` arguments and results to conform to the `Differentiable`
protocol. However, there is one exception: when the type of an argument or
result is a function type, e.g.
`@differentiable (T) -> @differentiable (U) -> V`. This exception exists
because we need to differentiate higher-order functions.

Mathematically, the differentiability of `@differentiable (T, U) -> V` is
similar to that of `@differentiable (T) -> @differentiable (U) -> V` in that
differentiating either one will provide derivatives with respect to parameters
`T` and `U`. Here are some examples of first-order function types and their
corresponding curried function types:

| First-order function type | Curried function type |
| --- | --- |
| `@differentiable (T, U) -> V` | `@differentiable (T) -> @differentiable (U) -> V` |
| `@differentiable (T, @noDerivative U) -> V` | `@differentiable (T) -> (U) -> V` |
| `@differentiable (@noDerivative T, U) -> V` | `(T) -> @differentiable (U) -> V` |
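
To make the correspondence concrete, here is a minimal plain-Swift sketch. It
deliberately avoids `@differentiable` and any compiler support: the example
function `f`, its hand-written partial derivatives, and the
`numericalDerivative` helper are all assumptions introduced for illustration.
It only checks numerically that the curried form exposes the same derivatives
with respect to both parameters as the first-order form.

```swift
// Plain-Swift sketch: no `@differentiable` attribute is used here.
// `f`, `dfdx`, `dfdy`, and `numericalDerivative` are illustrative names.

// Currying over ordinary closures.
func curry<T, U, V>(_ f: @escaping (T, U) -> V) -> (T) -> (U) -> V {
    { x in { y in f(x, y) } }
}

// Example function f(x, y) = x * x * y with hand-written partial derivatives.
func f(_ x: Double, _ y: Double) -> Double { x * x * y }
func dfdx(_ x: Double, _ y: Double) -> Double { 2 * x * y }
func dfdy(_ x: Double, _ y: Double) -> Double { x * x }

// Central finite difference for a one-argument function.
func numericalDerivative(
    _ g: (Double) -> Double, at x: Double, h: Double = 1e-6
) -> Double {
    (g(x + h) - g(x - h)) / (2 * h)
}

let g = curry(f)
let x0 = 3.0
let y0 = 2.0

// Derivative with respect to the inner argument: differentiate the closure g(x0).
let innerDerivative = numericalDerivative(g(x0), at: y0)

// Derivative with respect to the outer argument, which the inner closure captures.
let outerDerivative = numericalDerivative({ x in g(x)(y0) }, at: x0)
```

Under these assumptions, `innerDerivative` should agree with `dfdy(x0, y0)` and
`outerDerivative` with `dfdx(x0, y0)` up to finite-difference error. The outer
derivative is the one that requires differentiating a closure with respect to a
value it captures.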

A curried differentiable function can be formed like any curried
non-differentiable function in Swift.

```swift
func curry<T, U, V>(
    _ f: @differentiable (T, U) -> V
) -> @differentiable (T) -> @differentiable (U) -> V {
    { x in { y in f(x, y) } }
}
```

The way this works is that the compiler internally assigns a tangent bundle to a
closure that captures variables. This tangent bundle is existentially typed,
because closure contexts are type-erased in Swift. The theory behind the typing
rules has been published as [The Differentiable
Curry](https://www.semanticscholar.org/paper/The-Differentiable-Curry-Plotkin-Brain/187078bfb159c78cc8c78c3bbe81a9176b3a6e02).
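
As a rough intuition for a type-erased tangent attached to a capturing closure,
consider the following plain-Swift sketch. It is not the compiler's actual
representation: `ClosureWithPullback`, `Pullback`, and `makeScaler` are
hypothetical names, and `Any` merely stands in for an existential tangent type.

```swift
// Illustrative only: these types are hypothetical and are not part of Swift's
// differentiation API or the compiler's internal representation.

// Maps an output cotangent to the tangent of the captured context (type-erased,
// since callers cannot see what was captured) and the tangent of the argument.
typealias Pullback = (Double) -> (context: Any, argument: Double)

// A closure that returns its value together with a pullback.
struct ClosureWithPullback {
    let apply: (Double) -> (value: Double, pullback: Pullback)
}

// Builds the closure { y in x * y }, which captures `x`.
func makeScaler(capturing x: Double) -> ClosureWithPullback {
    ClosureWithPullback(apply: { y in
        let pullback: Pullback = { v in
            // d(x * y)/dx = y for the captured value, d(x * y)/dy = x.
            (context: (v * y) as Any, argument: v * x)
        }
        return (value: x * y, pullback: pullback)
    })
}

let scaler = makeScaler(capturing: 3.0)
let result = scaler.apply(2.0)
let tangents = result.pullback(1.0)

// Code that knows what was captured can recover the concrete tangent.
let contextTangent = tangents.context as? Double
```

The point of the sketch is only that the tangent for captured values has to
travel with the closure in a form whose concrete type is hidden from the
caller, which is why an existential is needed.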

### Differential operators

The core differentiation APIs are the differential operators. Differential
@@ -2456,30 +2492,6 @@ typealias LSTM<Scalar: TensorFlowFloatingPoint> = RNN<LSTMCell<Scalar>>

## Future directions

### Differentiation of higher-order functions

Mathematically, the differentiability of `@differentiable (T, U) -> V` is
similar to that of `@differentiable (T) -> @differentiable (U) -> V` in that
differentiating either one will provide derivatives with respect to parameters
`T` and `U`.

To form a `@differentiable (T) -> @differentiable (U) -> V`, the most natural
thing to do is currying, which one might implement as:

```swift
func curry<T, U, V>(
    _ f: @differentiable (T, U) -> V
) -> @differentiable (T) -> @differentiable (U) -> V {
    { x in { y in f(x, y) } }
}
```

However, the compiler does not support currying today due to known
type-theoretical constraints and implementation complexity regarding
differentiating a closure with respect to the values it captures. Fortunately,
we have a formally proven solution in the works, but we would like to defer this
to a future proposal since it is purely additive to the existing semantics.

### Higher-order differentiation

Distinct from differentiation of higher-order functions, higher-order