
Commit e5935fb

[Docs] [AutoDiff] Move higher-order functions out of future directions.
Differentiating higher-order functions is no longer a future direction, as the theory paper ([The Differentiable Curry](https://www.semanticscholar.org/paper/The-Differentiable-Curry-Plotkin-Brain/187078bfb159c78cc8c78c3bbe81a9176b3a6e02)) is now public.
1 parent c8d2ec2 commit e5935fb


1 file changed, +37 -25 lines changed


docs/DifferentiableProgramming.md

@@ -72,6 +72,7 @@ Backticks were added manually.
 * [Upcasting to non-`@differentiable` functions](#upcasting-to-non-differentiable-functions)
 * [Implied generic constraints](#implied-generic-constraints)
 * [Non-differentiable parameters](#non-differentiable-parameters)
+* [Higher-order functions and currying](#higher-order-functions-and-currying)
 * [Differential operators](#differential-operators)
 * [Differential-producing differential operators](#differential-producing-differential-operators)
 * [Pullback-producing differential operators](#pullback-producing-differential-operators)
@@ -88,7 +89,6 @@ Backticks were added manually.
 * [Convolutional neural networks (CNN)](#convolutional-neural-networks-cnn)
 * [Recurrent neural networks (RNN)](#recurrent-neural-networks-rnn)
 * [Future directions](#future-directions)
-* [Differentiation of higher-order functions](#differentiation-of-higher-order-functions)
 * [Higher-order differentiation](#higher-order-differentiation)
 * [Naming conventions for numerical computing](#naming-conventions-for-numerical-computing)
 * [Source compatibility](#source-compatibility)
@@ -2002,6 +2002,42 @@ _ = f0 as @differentiable (@noDerivative Float, Float) -> Float
 _ = f0 as @differentiable (@noDerivative Float, @noDerivative Float) -> Float
 ```
 
+#### Higher-order functions and currying
+
+As defined above, the `@differentiable` function type attribute requires all
+non-`@noDerivative` arguments and results to conform to the `Differentiable`
+protocol. However, there is one exception: an argument or result may instead
+have a function type, e.g. `@differentiable (T) -> @differentiable (U) ->
+V`. This exception is what allows higher-order functions to be differentiated.
+
+Mathematically, the differentiability of `@differentiable (T, U) -> V` is
+similar to that of `@differentiable (T) -> @differentiable (U) -> V`: in both
+cases, differentiation yields derivatives with respect to the parameters of
+types `T` and `U`. Here are some examples of first-order function types and
+their corresponding curried function types:
+
+| First-order function type | Curried function type |
+| `@differentiable (T, U) -> V` | `@differentiable (T) -> @differentiable (U) -> V` |
+| `@differentiable (T, @noDerivative U) -> V` | `@differentiable (T) -> (U) -> V` |
+| `@differentiable (@noDerivative T, U) -> V` | `(T) -> @differentiable (U) -> V` |
+
+A curried differentiable function can be formed in the same way as any curried
+non-differentiable function in Swift.
+
+```swift
+func curry<T, U, V>(
+    _ f: @differentiable (T, U) -> V
+) -> @differentiable (T) -> @differentiable (U) -> V {
+    { x in { y in f(x, y) } }
+}
+```
+
+This works because the compiler internally assigns a tangent bundle to each
+closure that captures variables. This tangent bundle is existentially typed,
+because closure contexts are type-erased in Swift. The theory behind the typing
+rules has been published as [The Differentiable
+Curry](https://www.semanticscholar.org/paper/The-Differentiable-Curry-Plotkin-Brain/187078bfb159c78cc8c78c3bbe81a9176b3a6e02).
+
 ### Differential operators
 
 The core differentiation APIs are the differential operators. Differential
@@ -2456,30 +2492,6 @@ typealias LSTM<Scalar: TensorFlowFloatingPoint> = RNN<LSTMCell<Scalar>>
 
 ## Future directions
 
-### Differentiation of higher-order functions
-
-Mathematically, the differentiability of `@differentiable (T, U) -> V` is
-similar to that of `@differentiable (T) -> @differentiable (U) -> V` in that
-differentiating either one will provide derivatives with respect to parameters
-`T` and `U`.
-
-To form a `@differentiable (T) -> @differentiable (U) -> V`, the most natural
-thing to do is currying, which one might implement as:
-
-```swift
-func curry<T, U, V>(
-    _ f: @differentiable (T, U) -> V
-) -> @differentiable (T) -> @differentiable (U) -> V {
-    { x in { y in f(x, y) } }
-}
-```
-
-However, the compiler does not support currying today due to known
-type-theoretical constraints and implementation complexity regarding
-differentiating a closure with respect to the values it captures. Fortunately,
-we have a formally proven solution in the works, but we would like to defer this
-to a future proposal since it is purely additive to the existing semantics.
-
 ### Higher-order differentiation
 
 Distinct from differentiation of higher-order functions, higher-order
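
The new "Higher-order functions and currying" section states that differentiating `@differentiable (T, U) -> V` and its curried form yields the same derivatives, with the compiler giving a capturing closure its own tangent. The following is a minimal hand-written sketch of that idea in plain Swift. It does not use the compiler's `@differentiable` support. It substitutes concrete `Double` values for `T`, `U`, and `V`, and uses f(x, y) = x * x * y as the example function. The names `valueWithPullbackFirstOrder`, `CapturingClosure`, and `valueWithPullbackCurried` are hypothetical and are not the compiler's implementation.

```swift
// Hand-written sketch (no compiler differentiation support) for
// f(x, y) = x * x * y, where df/dx = 2 * x * y and df/dy = x * x.

// First-order form: one pullback returns cotangents for both parameters.
func valueWithPullbackFirstOrder(_ x: Double, _ y: Double)
    -> (value: Double, pullback: (Double) -> (dx: Double, dy: Double)) {
    return (value: x * x * y,
            pullback: { dv in (dx: dv * 2 * x * y, dy: dv * x * x) })
}

// Hypothetical model of the inner closure `{ y in f(x, y) }`. Its captured
// context (`x`) is stored explicitly, so the closure's pullback can return a
// cotangent for that context as well as for its argument `y`.
struct CapturingClosure {
    let capturedX: Double

    func valueWithPullback(_ y: Double)
        -> (value: Double, pullback: (Double) -> (dy: Double, dContext: Double)) {
        let x = capturedX
        return (value: x * x * y,
                pullback: { dv in (dy: dv * x * x, dContext: dv * 2 * x * y) })
    }
}

// Outer function `{ x in ... }`: forming the closure only captures `x`, so its
// pullback maps the context cotangent straight back to a cotangent for `x`.
func valueWithPullbackCurried(_ x: Double)
    -> (value: CapturingClosure, pullback: (Double) -> Double) {
    return (value: CapturingClosure(capturedX: x),
            pullback: { dContext in dContext })
}

// Differentiate both forms at x = 3, y = 2 with an output cotangent of 1.
let first = valueWithPullbackFirstOrder(3, 2)
let (dx1, dy1) = first.pullback(1)

let outer = valueWithPullbackCurried(3)
let inner = outer.value.valueWithPullback(2)
let (dy2, dContext) = inner.pullback(1)
let dx2 = outer.pullback(dContext)

print(dx1, dy1)  // 12.0 9.0
print(dx2, dy2)  // 12.0 9.0
```

In the curried form, the derivative with respect to `x` does not come from the inner closure's argument. It comes from that closure's captured context and flows back through the outer function's pullback. This is the role the section assigns to the closure's tangent bundle.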

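
The new section also says that a closure's tangent bundle is existentially typed because closure contexts are type-erased in Swift. The following is a minimal hand-written sketch of why, again without the compiler's differentiation support. Two closures of the same function type capture contexts of different types, a `Double` and a pair. Because of this, the function type alone cannot name the context's tangent type. The sketch boxes the context cotangent as `Any`. `ErasedDifferentiableFunction`, `makeScale`, and `makeAffine` are hypothetical names. A real tangent type would also need to support addition, which this sketch omits.

```swift
// Hypothetical type-erased differentiable function (Double) -> Double. The
// captured context is hidden, so its cotangent is boxed as `Any`.
struct ErasedDifferentiableFunction {
    // Returns the value and a pullback producing a cotangent for the argument
    // plus a type-erased cotangent for the captured context.
    let valueWithPullback:
        (Double) -> (value: Double, pullback: (Double) -> (dy: Double, dContext: Any))
}

// Captures a single `Double`: g(y) = scale * y.
func makeScale(_ scale: Double) -> ErasedDifferentiableFunction {
    return ErasedDifferentiableFunction(valueWithPullback: { y in
        (value: scale * y,
         pullback: { dv in (dy: dv * scale, dContext: (dv * y) as Any) })
    })
}

// Captures a pair `(a, b)`: h(y) = a * y + b.
func makeAffine(_ a: Double, _ b: Double) -> ErasedDifferentiableFunction {
    return ErasedDifferentiableFunction(valueWithPullback: { y in
        (value: a * y + b,
         pullback: { dv in (dy: dv * a, dContext: (dv * y, dv) as Any) })
    })
}

// Both closures share one type even though their captured contexts differ.
let functions = [makeScale(3), makeAffine(2, 5)]
let scaleResult = functions[0].valueWithPullback(4)
let affineResult = functions[1].valueWithPullback(4)
let scaleCotangent = scaleResult.pullback(1)
let affineCotangent = affineResult.pullback(1)

// Recovering the concrete context cotangent requires a dynamic cast.
let dScale = scaleCotangent.dContext as? Double
let dAffine = affineCotangent.dContext as? (Double, Double)
```

Boxing as `Any` keeps the sketch short. The point is only that the cotangent's concrete type is known inside each closure but not at the shared function type.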