[Type Refactor] Use Java type system instead of custom one for typing tensors #174

Merged (4 commits, Dec 30, 2020)

Conversation

karllessard
Collaborator

This PR is the last major one in the tensor type refactoring currently in progress, and it contains many backward-incompatible changes.

We drop the current custom DataType class to leverage the standard Java type system instead, using .class. Now each of the tensor types (i.e. subinterfaces of TType, like TFloat32) itself carries the information required for allocating and mapping tensors of this type, via the @TensorType annotation.

All references to the static variable DTYPE of the different tensor types can now be replaced by .class. For example:

// Before
Operand<TFloat32> x = tf.dtypes.cast(tf.constant(1), TFloat32.DTYPE);
assertSame(x.dataType(), TFloat32.DTYPE);
if (x.dataType() == TFloat32.DTYPE || x.dataType() == TFloat64.DTYPE) {
    // floating-point only computations...
}

// After
Operand<TFloat32> x = tf.dtypes.cast(tf.constant(1), TFloat32.class); 
assertSame(x.type(), TFloat32.class);
if (TFloating.class.isAssignableFrom(x.type())) {
    // floating-point only computations...
}

In most places where the custom DataType class has been dropped, it has been replaced with its protobuf equivalent of the same name, which is a simple enum. This provides low-level type information directly from the runtime library, and the API of the Op framework will naturally guide users to favor the tensor type classes (e.g. TFloat32.class) instead.

TFloat32 t = TFloat32.scalarOf(1.0f);
assertSame(t.type(), TFloat32.class);
assertSame(t.dataType(), DataType.DT_FLOAT);

In a nutshell, the PR consists of:

  • Dropping the custom DataType and DataTypes classes in favor of the Java Class
  • Adding an annotation and other utilities to register the tensor type classes (subinterfaces of TType)
  • Replacing all references to T*.DTYPE with T*.class
  • Reshuffling some functions of the framework a little to leverage the new type system
  • A lot of changes in generated files that can be skipped during the review (everything under tensorflow-core/tensorflow-core-api/src/gen)

It is important to note that after this PR has been merged, the ongoing work from @rnett (Kotlin) and @JimClarke5 (Framework) could be resumed.

CC @deansher, @Craigacp

@karllessard karllessard requested a review from Craigacp December 22, 2020 03:16
Collaborator

@Craigacp Craigacp left a comment


Would be nice to clean up the import generation, as that's induced a lot of noise in this PR, but it doesn't have to be done here. I think that the framework module could do with a pass to enforce stricter types now TFloating and TIntegral exist, but again that doesn't have to happen here.
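The stricter bounds mentioned above can be sketched with a minimal, self-contained example. The marker interfaces below are hypothetical stand-ins for the real TType/TNumber/TFloating hierarchy (which lives in the library's types packages), kept local so the sketch compiles on its own:

```java
// Hypothetical stand-ins for the real tensor type hierarchy, local to this sketch.
interface TType {}
interface TNumber extends TType {}
interface TFloating extends TNumber {}

final class TFloat32 implements TFloating {}
final class TInt32 implements TNumber {}

class Bounds {
    // The bound <T extends TFloating> is enforced at compile time, so no runtime
    // dtype check is needed inside the method.
    static <T extends TFloating> String floatOnly(Class<T> type) {
        return "ok: " + type.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(floatOnly(TFloat32.class)); // accepted
        // floatOnly(TInt32.class);                    // rejected by the compiler
    }
}
```

A framework pass could tighten method signatures this way wherever an op only makes sense for floating-point or integral tensors.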

@@ -46,16 +47,12 @@
* @return a new instance of AssertCardinalityDataset
*/
@Endpoint(describeByClass = true)
public static AssertCardinalityDataset create(Scope scope, Operand<?> inputDataset, Operand<TInt64> cardinality, List<DataType<?>> outputTypes, List<Shape> outputShapes) {
public static AssertCardinalityDataset create(Scope scope, Operand<?> inputDataset, Operand<TInt64> cardinality, List<Class<? extends TType>> outputTypes, List<Shape> outputShapes) {
Collaborator

These seem to be the same as the classes in org.tensorflow.op.data?

Collaborator Author

I don't see an AssertCardinalityDataset class under org.tensorflow.op.data, is that what you meant?

Collaborator

Oops, sorry, I meant AssertNextDataset. There are a bunch of classes that are in both org.tensorflow.op.data and org.tensorflow.op.data.experimental.

Collaborator Author

These are two different kernels in the TensorFlow runtime, like the other duplicates found in these packages as far as I can see. For example, the op under data uses AssertNextDataset while the one under experimental uses ExperimentalAssertNextDataset. So it is correct that they coexist; they probably achieve the same purpose but with different implementations. See here

@karllessard
Collaborator Author

Thanks a lot for the quick review @Craigacp !

@JimClarke5
Contributor

JimClarke5 commented Dec 22, 2020

In the TestSession classes, there is still a lot of checking on the types to call the right method for evaluating and printing. For example, compare a float value to TFloat32 data, a double value to TFloat64 data, etc. Is there now a better way to do this?

@karllessard
Collaborator Author

In the TestSession classes, there is still a lot of checking on the types to call the right method for evaluating and printing. For example, compare a float value to TFloat32 data, a double value to TFloat64 data, etc. Is there now a better way to do this?

I don't think this PR directly relates to that case, but here's an idea along those lines: since you know that the supported types are all numeric, you can probably just cast the boxed value returned by getObject() to a Number and then use the doubleValue() of that number?

For example, where expected is a double value,

o.asTensor().scalars().forEach(f -> assertEquals(expected, ((Number)f.getObject()).doubleValue()));
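A standalone version of that suggestion, with a plain Object standing in for what getObject() would return on a scalar element (the helper name and tolerance are illustrative, not from the library):

```java
// Sketch: treat the boxed scalar uniformly as a Number, whatever the tensor
// element type was (float, double, int, ...).
class NumberCompare {
    static boolean matches(Object boxedScalar, double expected, double epsilon) {
        double actual = ((Number) boxedScalar).doubleValue();
        return Math.abs(actual - expected) < epsilon;
    }

    public static void main(String[] args) {
        System.out.println(matches(1.0f, 1.0, 1e-6)); // float element
        System.out.println(matches(1.0d, 1.0, 1e-6)); // double element
        System.out.println(matches(1, 1.0, 1e-6));    // int element
    }
}
```

This removes the per-type branches entirely, at the cost of comparing everything in double precision.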

@JimClarke5
Contributor

Let me play with your suggestion. I am redoing the TestSession classes to add support for Placeholders and Feeds, so now would be a good time to clean it up.

@JimClarke5
Contributor

JimClarke5 commented Dec 24, 2020 via email

@karllessard
Collaborator Author

Ok, so I've pushed the last changes that, I think, cover the comments from @Craigacp and @JimClarke5.

@deansher, are you interested in taking a look as well, since you were a major instigator of this proposal? If you want to but don't have time right now, just let me know and I'll wait before merging, thanks!

@deansher
Contributor

deansher commented Dec 25, 2020 via email

@karllessard
Collaborator Author

@rnett , @JimClarke5 : I don't expect that there will be major changes after Dean's review so if you want to get unblocked faster, I invite you to rebase your work on this PR branch and you can even start producing new PRs that we could merge after this one.

@rnett
Contributor

rnett commented Dec 28, 2020

I'm working on the reified generation for the Kotlin API, and I'm noticing that lots of methods have unnecessary type parameters that make the reified usage much less nice (since you have to specify all type parameters if you specify one). The best example is probably cast, which has the signature:

<U extends TType, T extends TType> Cast<U> cast(Operand<T> x, Class<U> DstT, Cast.Options... options)

T is completely unnecessary and could be replaced with ? without issue, but it prevents cast<TInt32>(x) usage from Kotlin. This shows up in a number of ops, mostly as unnecessary type parameters on the inputs. It essentially needs an "is this type param only bounded by TType and only used on inputs" check.

It might be out of scope for this PR to change this now, since it is probably best done in the cc generation code (to change the op class create methods).
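The signature difference can be contrasted in a minimal sketch; Operand, CastResult, and the two cast methods below are hypothetical stand-ins, not the generated op classes:

```java
// Hypothetical stand-ins for Operand/Cast, just to contrast the two signatures.
interface TType {}
final class TFloat32 implements TType {}
final class TInt32 implements TType {}

class Operand<T extends TType> {}

class CastResult<U extends TType> {
    final Class<U> target;
    CastResult(Class<U> target) { this.target = target; }
}

class CastSigs {
    // Current shape: T appears only on the input, so a caller (or a Kotlin reified
    // wrapper) that names U explicitly must name T as well.
    static <U extends TType, T extends TType> CastResult<U> cast(Operand<T> x, Class<U> dstT) {
        return new CastResult<>(dstT);
    }

    // Proposed shape: the input-only parameter becomes a wildcard, leaving U as the
    // single type parameter.
    static <U extends TType> CastResult<U> cast2(Operand<?> x, Class<U> dstT) {
        return new CastResult<>(dstT);
    }

    public static void main(String[] args) {
        Operand<TFloat32> x = new Operand<>();
        // Only the destination type needs to be named now:
        CastResult<TInt32> r = CastSigs.<TInt32>cast2(x, TInt32.class);
        System.out.println(r.target.getSimpleName());
    }
}
```

From plain Java, inference works either way because dstT is a value argument; the pain described above is in Kotlin's reified wrappers, where naming U forces naming T too under the two-parameter shape.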

@karllessard
Collaborator Author

That's very interesting @rnett , and yes it should be addressed outside this PR, I've created a new issue so we can continue the discussion from there: #176

Contributor

@deansher deansher left a comment


I love how this came out! I did propose changes, but they are nits.

Thanks, @karllessard , for inviting me back into the loop on this. It's exciting to see it land! What a lot of work!

// Minimum requirements for datatypes of variable length cannot be verified in a relevant way so
// we only validate them for fixed length datatypes
if (!dtype.isVariableLength() && shape.size() * dtype.byteSize() > size) {
static RawTensor allocate(Class<? extends TType> type, Shape shape, long size) {
Contributor

Augment this method to handle a shape of UNKNOWN_SIZE? (The previous version accidentally handled it.)

Collaborator Author

In fact, passing a totally or partially unknown shape to this constructor should be forbidden, I'll add a check for this.

* @param shape shape of the tensor
* @param size size, in bytes, of the tensor
* @param size size in bytes of the tensor or -1 to compute the size from the shape
Contributor

Document the behavior when shape has UNKNOWN_SIZE and -1 is passed for size?

Collaborator Author

same as in #174 (comment)

}

/**
* Allocates a tensor of a given datatype, shape and size.
*
* <p>This method is identical to {@link #of(DataType, Shape, Consumer)}, except that the final
* <p>This method is identical to {@link #of(Class, Shape, Consumer)}, except that the final
* size for the tensor is explicitly set instead of being computed from the datatype and shape.
Contributor

Given the new support for size of -1, perhaps change to "can be explicitly set".

/**
* Returns the class of this tensor type
*/
public Class<T> typeClass() {
Contributor

Elsewhere (such as in Tensor), we simply call this type(). I do feel the emotional tug to be more explicit in this case, but I wonder if it will simply feel like inconsistency once we have lived with the new paradigm for a while.

Collaborator Author

I was hesitating on this one as well, so it sounds like we are two now, meaning that it should be type(). I'll rename it.

@@ -83,7 +79,7 @@
*/
@Endpoint(name = "flatten")
public static <T extends TType, U extends TNumber> Operand<T> flatten(
Scope scope, Operand<T> operand, DataType<U> dType) {
Scope scope, Operand<T> operand, Class<U> dType) {
Contributor

dType -> type throughout this file

return Cast.create(scope, result, logits.asOutput().dataType());
} else if(!logits.asOutput().dataType().equals(labels.asOutput().dataType())) {
return Cast.create(scope, result, logits.asOutput().type());
} else if(!logits.asOutput().type().equals(labels.asOutput().type())) {
Contributor

!= would be more consistent.

@@ -197,8 +192,8 @@
*/
private static <T extends TNumber, U extends TNumber> Operand<T> moveDimToEnd(
Scope scope, Operand<T> input, int dimIndex, Operand<U> rank) {
DataType<? extends TNumber> rankDType = rank.asOutput().dataType();
Operand one = Cast.create(scope, Constant.scalarOf(scope, 1), rankDType);
Class<U> rankDType = rank.asOutput().type();
Contributor

rankDType -> rankType

TFloat32 floatTensor = rawTensor.asTypedTensor();
assertSame(floatTensor.asRawTensor(), rawTensor);
try {
TInt32 intTensor = rawTensor.asTypedTensor();
Contributor

This test code throws the expected exception because of the assignment to intTensor, rather than in asTypedTensor(). Here is alternative test code that also passes, but demonstrates that asTypedTensor() doesn't have its documented behavior:

  @Test
  public void rawToTypedTensor() {
    RawTensor rawTensor = RawTensor.allocate(TFloat32.class, Shape.of(2, 2), -1);
    TFloat32 floatTensor = rawTensor.asTypedTensor();
    assertSame(floatTensor.asRawTensor(), rawTensor);
    Object objTensor = rawTensor.<TInt32>asTypedTensor();
    try {
      TInt32 intTensor = (TInt32) objTensor;
      fail();
    } catch (ClassCastException e) {
      // ok
    }
  }

Collaborator Author

Good point, I've never tried that before. Do you want me to update the doc or try to throw a ClassCastException in this case as well?

Note that asTypedTensor() is an internal package-private method and where it is currently being used, it is implicitly enforced that the returned type matches the type of the tensor for all cases.

Contributor

Updating the doc seems good to me for both of the reasons you give.

Collaborator Author

To be honest, I'm not too sure how to document this. It looks to me like this behavior is some kind of "defect" in the generics specification, since the explicit <TInt32> parameterization of the method invocation seems to be implicitly overridden by the type inferred from the target (Object in this case).

So basically, <TInt32> is ignored, and that is probably the case for Java type inference in general when dealing with type parameters inferred from a target; is it worth documenting it here then?

Collaborator

The type will be erased at runtime so the cast is erased to its bound (TType) which always succeeds. It might be better to have it explicitly return TType rather than have it promise something that can't be enforced by the type system. This is going to pollute the internals of our code with casts, but at least we'll remember to put them in rather than having it mysteriously blow up.
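The erasure behavior described above can be demonstrated in isolation; pretendCast below is a hypothetical method, not asTypedTensor(), but it has the same shape:

```java
// Self-contained demonstration of the erased cast: inside the generic method the
// (T) cast compiles to a no-op, so a wrong type argument only fails (if ever) at
// the caller's use site, where javac inserts the real checkcast.
class Erasure {
    @SuppressWarnings("unchecked")
    static <T> T pretendCast(Object o) {
        return (T) o; // erased: never throws here, whatever T is claimed to be
    }

    public static void main(String[] args) {
        // The explicit <Integer> argument is erased; assigning to Object succeeds.
        Object silently = Erasure.<Integer>pretendCast("not an int");
        System.out.println(silently);

        try {
            // Here the compiler inserts a checkcast to Integer, which fails.
            Integer boom = Erasure.<Integer>pretendCast("not an int");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at the use site");
        }
    }
}
```

This is why returning TType and making callers cast explicitly is safer: the cast (and its unchecked warning) then sits visibly at the call site.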

Collaborator Author

@karllessard karllessard Dec 30, 2020


Ok, sounds fair, let's do that; anyway, all these additional casts will happen internally only.

Update: it turned out that only one additional cast was required in the source code...

Contributor

On further reflection, I lean the same way as @Craigacp: have this method return TType.

Here's some analysis to support this choice: At runtime, the current behavior is "apply the mapper from typeInfo and return whatever TType it produces." Since the type parameter T is erased at runtime, the only way to guarantee a return of T would be to have the method take the tensor type class as a runtime parameter and explicitly verify that the mapper returns that type class. I might advocate this approach for a public method or even a widely used package-private method, but not for a rarely used package-private method.

But also, a review of the call points of this method raises another issue: whenever we treat the return value of asTypedTensor as simply a Tensor (rather than a TType), we "forget" at compile time the semantic upgrade that was presumably the point of the asTypedTensor call in the first place.

As an example, let's explore what happens underneath the following method:

class EagerOperation extends AbstractOperation {
  // ...
  /**
   * Returns the tensor of the {@code outputIdx}th output of this operation.
   *
   * <p>This is only supported in an eager execution environment.
   *
   * @param outputIdx index of the output of this operation
   * @return output tensor
   */
  @Override
  Tensor tensor(int outputIndex) {
    Tensor tensor = outputTensors.get(outputIndex);
    if (tensor == null) {
      tensor = resolveTensor(outputIndex);
    }
    return tensor;
  }
  // ...
}

(I copied the Javadoc above the @Override for easy reference.)

The caller should presumably be agnostic as to whether this returns a RawTensor or a TType. But looking deeper into the call path, here's how the returned Tensor is constructed:

  private static Tensor resolveTensorHandle(TFE_TensorHandle handle, EagerSession session) {
    requireTensorHandle(handle);
    try (PointerScope scope = new PointerScope()) {
      TF_Status status = TF_Status.newStatus();
      TF_Tensor tensor = TFE_TensorHandleResolve(handle, status).withDeallocator();
      status.throwExceptionIfNotOK();
      return RawTensor.fromHandle(tensor, session).asTypedTensor();
    }
  }

What's the point of asTypedTensor() on the last substantive line, above? It causes this method to return a special kind of Tensor -- a TType -- but that fact is immediately forgotten by the type system and is not even asserted in the Javadoc. Presumably, we should either drop the call to .asTypedTensor() or change the whole call path to return TType.

Contributor

:-) But not for this PR!

}

@Test
public void allocateTensorWithoutSize() {
Contributor

Add test to show intended behavior of shape with UNKNOWN_SIZE.

@karllessard
Collaborator Author

I've just pushed a last version that should now cover all the topics discussed during the review. I'll merge it as soon as I get a green light from you, and we will be done with this massive refactoring (hopefully the last one of this nature).

Contributor

@deansher deansher left a comment


The Eagle has landed.

Contributor

@JimClarke5 JimClarke5 left a comment


Good job @karllessard

Collaborator

@Craigacp Craigacp left a comment


The typing in TensorTypeRegistry should probably be relaxed as I've indicated, but that can wait till the next PR.

OperationBuilder opBuilder = scope.env().opBuilder("CollectiveBcastRecv", scope.makeOpName("BroadcastRecv"));
opBuilder = scope.apply(opBuilder);
opBuilder.setAttr("T", T);
opBuilder.setAttr("T", Operands.toDataType(T));
Collaborator

At some point we should fix the generator so it doesn't shadow the type name with a variable name, that's just confusing.

* @return type registered information
* @throws IllegalArgumentException if no tensor type for this data type has been registered
*/
public static <T extends TType> TensorTypeInfo<T> find(DataType dataType) {
Collaborator

This is a place where it will infer the type bound from the calling context, and if that's wrong then we'll get a weird class cast error when people use it. It might be better to return the wildcard as at least the user will get a warning when they make the cast.
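The hazard with an inferred return type can be sketched as follows; TensorTypeInfo, find(), and the registry contents here are hypothetical stand-ins for the real TensorTypeRegistry, not its actual code:

```java
// Hypothetical sketch of the two find() shapes discussed above.
interface TType {}
final class TFloat32 implements TType {}
final class TInt32 implements TType {}

class TensorTypeInfo<T extends TType> {
    final Class<T> type;
    TensorTypeInfo(Class<T> type) { this.type = type; }
}

class Registry {
    // Inferred-from-context version: the caller can silently pick a wrong T.
    @SuppressWarnings("unchecked")
    static <T extends TType> TensorTypeInfo<T> find() {
        return (TensorTypeInfo<T>) new TensorTypeInfo<>(TFloat32.class);
    }

    // Wildcard version: the caller must cast explicitly, and gets an unchecked
    // warning reminding them that the cast is their responsibility.
    static TensorTypeInfo<?> findWildcard() {
        return new TensorTypeInfo<>(TFloat32.class);
    }

    public static void main(String[] args) {
        // Compiles without any warning at the call site, yet the payload is TFloat32:
        TensorTypeInfo<TInt32> wrong = Registry.find();
        System.out.println(wrong.type.getSimpleName()); // prints TFloat32
    }
}
```

With the wildcard shape, the mistaken assignment above would not compile without an explicit (and warned-about) cast, which is exactly the compiler nudge being asked for.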

Collaborator Author

Again, note that TensorTypeRegistry is (for now) an internal class, so we do have some control over the context in which it is being called. But as you've suggested, let's review this later.

Collaborator

Yeah, I know. But I'm in favour of having the compiler warn me before I make a silly mistake, and 6 months from now I won't necessarily remember that this method doesn't quite live up to its contract.

@karllessard
Collaborator Author

All right, let's merge this before someone changes his mind :) Thank you all for your reviews and comments!

@karllessard karllessard merged commit f85623e into tensorflow:master Dec 30, 2020
@JimClarke5 JimClarke5 mentioned this pull request Jan 3, 2021