@@ -72,6 +72,7 @@ Backticks were added manually.
72
72
* [Upcasting to non-`@differentiable` functions](#upcasting-to-non-differentiable-functions)
73
73
* [Implied generic constraints](#implied-generic-constraints)
74
74
* [Non-differentiable parameters](#non-differentiable-parameters)
75
+ * [Higher-order functions and currying](#higher-order-functions-and-currying)
75
76
* [Differential operators](#differential-operators)
76
77
* [Differential-producing differential operators](#differential-producing-differential-operators)
77
78
* [Pullback-producing differential operators](#pullback-producing-differential-operators)
@@ -88,7 +89,6 @@ Backticks were added manually.
88
89
* [Convolutional neural networks (CNN)](#convolutional-neural-networks-cnn)
89
90
* [Recurrent neural networks (RNN)](#recurrent-neural-networks-rnn)
90
91
* [Future directions](#future-directions)
91
- * [Differentiation of higher-order functions](#differentiation-of-higher-order-functions)
92
92
* [Higher-order differentiation](#higher-order-differentiation)
93
93
* [Naming conventions for numerical computing](#naming-conventions-for-numerical-computing)
94
94
* [Source compatibility](#source-compatibility)
@@ -2002,6 +2002,42 @@ _ = f0 as @differentiable (@noDerivative Float, Float) -> Float
2002
2002
_ = f0 as @differentiable (@noDerivative Float , @noDerivative Float ) -> Float
2003
2003
```
2004
2004
2005
+ #### Higher-order functions and currying
2006
+
2007
+ As defined above, the `@differentiable` function type attribute requires all
2008
+ non-`@noDerivative` arguments and results to conform to the `Differentiable`
2009
+ protocol. However, there is one exception: when the type of an argument or
2010
+ result is a function type, e.g. `@differentiable (T) -> @differentiable (U) ->
2011
+ V`. This is because we need to differentiate higher-order functions.
2012
+
2013
+ Mathematically, the differentiability of `@differentiable (T, U) -> V` is
2014
+ similar to that of `@differentiable (T) -> @differentiable (U) -> V` in that
2015
+ differentiating either one will provide derivatives with respect to parameters
2016
+ `T` and `U`. Here are some examples of first-order function types and their
2017
+ corresponding curried function types:
2018
+
2019
+ | First-order function type | Curried function type |
2020
+ | @differentiable (T, U) -> V | @differentiable (T) -> @differentiable (U) -> V |
2021
+ | @differentiable (T, @noDerivative U) -> V | @differentiable (T) -> (U) -> V |
2022
+ | @differentiable (@noDerivative T, U) -> V | (T) -> @differentiable (U) -> V |
2023
+
2024
+ A curried differentiable function can be formed like any curried
2025
+ non-differentiable function in Swift.
2026
+
2027
+ ```swift
2028
+ func curry<T, U, V>(
2029
+     _ f: @escaping @differentiable (T, U) -> V
2030
+ ) -> @differentiable (T) -> @differentiable (U) -> V {
2031
+     { x in { y in f(x, y) } }
2032
+ }
2033
+ ```
2034
+
2035
+ The way this works is that the compiler internally assigns a tangent bundle to a
2036
+ closure that captures variables. This tangent bundle is existentially typed,
2037
+ because closure contexts are type-erased in Swift. The theory behind the typing
2038
+ rules has been published as [The Differentiable
2039
+ Curry](https://www.semanticscholar.org/paper/The-Differentiable-Curry-Plotkin-Brain/187078bfb159c78cc8c78c3bbe81a9176b3a6e02).
2040
+
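The statement that a curried differentiable function is formed like any curried non-differentiable function can be illustrated with ordinary Swift. The sketch below is not part of the change itself: it deliberately omits `@differentiable` so that it runs on a standard toolchain, and the names `curryPlain` and `scaledSum` are hypothetical, introduced only for this illustration.

```swift
// Plain, non-differentiable analogue of `curry` (hypothetical name `curryPlain`).
// `@escaping` is needed because the returned closure captures `f`.
func curryPlain<T, U, V>(
    _ f: @escaping (T, U) -> V
) -> (T) -> (U) -> V {
    { x in { y in f(x, y) } }
}

// Hypothetical first-order function used only for this example.
func scaledSum(_ x: Float, _ y: Float) -> Float {
    2 * x + y
}

let curried = curryPlain(scaledSum)  // (Float) -> (Float) -> Float
let partial = curried(3)             // inner closure captures x = 3
print(partial(4))                    // prints 10.0
```

The inner closure captures `x`. In the differentiable version, that captured value is what the compiler would need a tangent for when differentiating with respect to the first parameter.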
2005
2041
### Differential operators
2006
2042
2007
2043
The core differentiation APIs are the differential operators. Differential
@@ -2456,30 +2492,6 @@ typealias LSTM<Scalar: TensorFlowFloatingPoint> = RNN<LSTMCell<Scalar>>
2456
2492
2457
2493
## Future directions
2458
2494
2459
- ### Differentiation of higher-order functions
2460
-
2461
- Mathematically, the differentiability of `@differentiable (T, U) -> V` is
2462
- similar to that of `@differentiable (T) -> @differentiable (U) -> V` in that
2463
- differentiating either one will provide derivatives with respect to parameters
2464
- `T` and `U`.
2465
-
2466
- To form a `@differentiable (T) -> @differentiable (U) -> V`, the most natural
2467
- thing to do is currying, which one might implement as:
2468
-
2469
- ```swift
2470
- func curry<T, U, V>(
2471
-     _ f: @differentiable (T, U) -> V
2472
- ) -> @differentiable (T) -> @differentiable (U) -> V {
2473
-     { x in { y in f(x, y) } }
2474
- }
2475
- ```
2476
-
2477
- However, the compiler does not support currying today due to known
2478
- type-theoretical constraints and implementation complexity regarding
2479
- differentiating a closure with respect to the values it captures. Fortunately,
2480
- we have a formally proven solution in the works, but we would like to defer this
2481
- to a future proposal since it is purely additive to the existing semantics.
2482
-
2483
2495
### Higher-order differentiation
2484
2496
2485
2497
Distinct from differentiation of higher-order functions, higher-order