Skip to content

Remove ColumnType.RawKind usages Round 1. #2143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 17, 2019

Conversation

eerhardt
Copy link
Member

Remove all usages of RawKind that are outside of ML.Core and ML.Data assemblies. The next round will completely remove ColumnType.RawKind.

Part of the work necessary for #1860 and contributes to #1533.

rawType = type.RawType;

if (rawType == typeof(bool))
dataType = TensorProto.Types.DataType.Float;
Copy link
Member

@sfilipi sfilipi Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Float [](start = 54, length = 5)

Unrelated to your changes, I wander if this is correct. @wschin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering the same thing when I made the change...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably is caused by the limited types accepted by old ONNX operators implemented in RS4. As ONNXRuntime supports much more ops than dark-age implementations, we should revise our type mappings and create more tests. This is a reasonable setting but it can be improved in another PR.


In reply to: 247951442 [](ancestors = 247951442)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have an existing bug to make this better - #1198


default:
throw Contracts.ExceptParam(nameof(kind), $"Unrecognized type '{kind}'");
return null;
Copy link
Member

@sfilipi sfilipi Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return null; [](start = 20, length = 12)

curious, why change to return null, rather than throw? If an invalid type gets passed, would it be better to throw? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to preserve the first line's behavior:

case default(DataKind): return null;

When DataKind is default, it usually means it is a "custom" or "structured" type - like VectorType or ImageType. So I was preserving that behavior here - when you see an unrecognized ColumnType - return null.

I assume the existing throw condition was in place for handling invalid enum values - like 29 and -451.


In reply to: 247782795 [](ancestors = 247782795)

string msg = "Unsupported type: DataKind " + rawKind.ToString();
Contracts.Check(false, msg);
break;
string msg = "Unsupported type: " + rawType.ToString();
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawType.ToString() [](start = 52, length = 18)

It would be better to use the actual type ToString here, rather than the .NET type. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Fixed.


In reply to: 248032092 [](ancestors = 248032092)

rawKind = keyType.RawKind;
rawType = vectorType.ItemType.RawType;
else
rawType = type.RawType;
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RawType [](start = 31, length = 7)

I am not sure operating over RawType is an appropriate alternative to RawKind, in this case. I might prefer that we operate over the exact ColumnType implementations, since that is more descriptive of what we're trying to do I think.

To give a concrete example, I see the code below is conflating uint as being identical, even thought the same .NET type is used for both raw uints and actual key types... same .NET type, very different information being encoded. This might be a sign that the entire function is screwed up (I see @sfilipi found what appears to be a bug below with booleans, so that's well within the realm of possibility), but I'm a little concerned that this is the first thing I see. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure operating over RawType is an appropriate alternative to RawKind, in this case

I don't think I agree. In this case we are mapping from ML.NET's ColumnType to Onnx's TensorProto.Types.DataType. It doesn't really matter if we are using a KeyType or NumberType.U4. In either case, we need to get a TensorProto.Types.DataType.UInt32. Actually - what would we return for KeyType? We wouldn't want to hard-code it to just TensorProto.Types.DataType.UInt32, would we? What if it was a KeyType that was using UInt16?


In reply to: 248033212 [](ancestors = 248033212)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I see your point, Fair enough thanks @eerhardt.


In reply to: 248109614 [](ancestors = 248109614,248033212)

case BoolType boolType:
return columnType.RawType;
case TextType textType:
return typeof(string);
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this style of change you have here is more what I'd be more comfortable with in the the OnnxUtils case. (Now, in that other case, it would also probably benefit from some when statements to further refine.) #Resolved

@@ -478,7 +478,7 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema)
{
if (!inputSchema.TryFindColumn(colInfo.Input, out var col))
throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.Input);
if ((col.ItemType.ItemType.RawKind == default) || !(col.ItemType.IsVector || col.ItemType is PrimitiveType))
if (!(col.ItemType.IsVector || col.ItemType is PrimitiveType))
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ItemType [](start = 51, length = 8)

I think this is probably going to change anyway with your other change on vectors, but just one thing to keep in mind: While ItemType on ColumnType itself was a ColumnType (which is correct), ItemType on the vector type itself is and always has been a PrimitiveType anyway. So this test here seems at least odd, even in its refined form. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is a bug (or at least confusion) in our API in my opinion. col is of type SchemaShape.Column. It has a member on it - public readonly ColumnType ItemType;. So col.ItemType isn't actually the VectorType.ItemType. It could be any ColumnType (for example, it could be an ImageType).

We then split out the "vector"-ness to a different public readonly VectorKind Kind; property.


In reply to: 248034844 [](ancestors = 248034844)

rawKind = vectorType.ItemType.RawKind;
else if (type is KeyType keyType)
rawKind = keyType.RawKind;
rawType = vectorType.ItemType.RawType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawType = vectorType.ItemType.RawType; [](start = 20, length = 38)

Placeholder note, once your other PR goes in you could probably benefit from your new GetItemType extension.

Copy link
Member Author

@eerhardt eerhardt Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that will simplify this code. #Resolved

@@ -843,7 +843,7 @@ internal static bool IsColumnTypeValid(ColumnType type)
if (!(type.ItemType is KeyType itemKeyType))
return false;
// Can only accept key types that can be converted to U4.
if (itemKeyType.Count == 0 && type.ItemType.RawKind > DataKind.U4)
if (itemKeyType.Count == 0 && !NgramUtils.IsValidNgramRawType(itemKeyType.RawType))
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsValidNgramRawType(itemKeyType.RawType [](start = 54, length = 39)

If I read the comment, it says, Can only accept key types that can be converted to U4, in every place that this IsValidNgramRawType is being called. That basically means U1, U2, and U4 are acceptable, and only U8 is not. So I'd have expected, based on this comment, that this would be a test against whether the raw type is a ulong or not.

However, if I then go and read this IsValidNgramRawType, it is doing something else! It is checking against all sorts of types that have nothing to do with key-types at all, including sbyte, short, int... which makes no sense to me, given the comment, the fact that these are definitely meant to operate over keytypes, etc.

I would have expected just making sure it isn't a ulong would have been sufficient, but... #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just converting the existing logic: type.ItemType.RawKind > DataKind.U4.

I can change that method to assume that KeyTypes can only be byte, ushort, uint, or ulong, and only make it check for ulong.


In reply to: 248037058 [](ancestors = 248037058)

Copy link
Contributor

@TomFinley TomFinley Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks @eerhardt. I'm not sure why the original code was written this was -- it's certainly a very long winded way of writing "is U8." :) There is a competition, or used to be a competition, to see who could write the most obfuscated code possible, as I recall. (Not part of this codebase, to be clear, just in a university somewhere if I recall correctly, or a newsgroup or something?) Maybe the author was a participant.


In reply to: 248110407 [](ancestors = 248110407,248037058)

@@ -47,7 +47,7 @@ public void SequencePredictorSchemaTest()
Assert.True(scoreColumn.Type is KeyType);
Assert.Equal((scoreColumnType as KeyType).Min, (scoreColumn.Type as KeyType).Min);
Assert.Equal((scoreColumnType as KeyType).Count, (scoreColumn.Type as KeyType).Count);
Assert.Equal((scoreColumnType as KeyType).RawKind, (scoreColumn.Type as KeyType).RawKind);
Assert.Equal((scoreColumnType as KeyType).RawType, (scoreColumn.Type as KeyType).RawType);
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scoreColumn.Type as KeyType [](start = 64, length = 27)

This was not your code, and given that it's in a test code we might consider it lower priority, but this was so offensive that I have to comment on it. And you're touching it. :D :D

This code has a lot of nonsense like this:

Assert.Equal((scoreColumnType as KeyType).Min, (scoreColumn.Type as KeyType).Min);

That makes no sense. The only conceivable reason to use an as style cast is if you're anticipating that it might not be this case. But then the code turns around and directly accesses a property without a ?., so that if it was in fact not that type we'd throw with a null reference exception. Which is really silly. I don't know if you want to fix all this code so that it is less offensive, but maybe at least the things you're touching directly could be written to be a bit more sane.

Not blocking, if you don't feel like doing it, just, the code in its current form makes my skin crawl. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


In reply to: 248038416 [](ancestors = 248038416)

else if (rawType == typeof(int))
dataType = TensorProto.Types.DataType.Int32;
else if (rawType == typeof(uint))
dataType = TensorProto.Types.DataType.Int64;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wschin - this one looks incorrect as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears both BL and U4 mapped these values when this repo was created -

public static ModelArgs GetModelArgs(ColumnType type, string colName,
List<long> dims = null, List<bool> dimsParams = null)
{
Contracts.CheckValue(type, nameof(type));
Contracts.CheckNonEmpty(colName, nameof(colName));
TensorProto.Types.DataType dataType = TensorProto.Types.DataType.Undefined;
DataKind rawKind;
if (type.IsVector)
rawKind = type.AsVector.ItemType.RawKind;
else if (type.IsKey)
rawKind = type.AsKey.RawKind;
else
rawKind = type.RawKind;
switch (rawKind)
{
case DataKind.BL:
dataType = TensorProto.Types.DataType.Float;
break;
case DataKind.TX:
dataType = TensorProto.Types.DataType.String;
break;
case DataKind.U4:
dataType = TensorProto.Types.DataType.Int64;
break;
case DataKind.R4:
dataType = TensorProto.Types.DataType.Float;
break;
default:
Contracts.Assert(false, "Unknown type.");
break;
}


In reply to: 248108591 [](ancestors = 248108591)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also /cc @zeahmed and @jignparm - in case they understand both of these type mappings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we always translate uint to ONNX Int64, this should be fine.


In reply to: 248112875 [](ancestors = 248112875)

Remove all usages of RawKind that are outside of ML.Core and ML.Data assemblies. The next round will completely remove ColumnType.RawKind.

Part of the work necessary for dotnet#1860 and contributes to dotnet#1533.
Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @eerhardt !! Thanks for getting all this done. Hopefully we (by "we" which I mean other people that actually did this ONNX stuff) can figure out the ONNX thing.

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@eerhardt eerhardt merged commit 861c726 into dotnet:master Jan 17, 2019
@eerhardt eerhardt deleted the ColumnTypeRawKind branch January 17, 2019 21:42
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants