Skip to content

Remove ColumnType.RawKind usages Round 1. #2143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 17, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 31 additions & 47 deletions src/Microsoft.ML.Onnx/OnnxUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -295,56 +295,40 @@ public static ModelArgs GetModelArgs(ColumnType type, string colName,
Contracts.CheckNonEmpty(colName, nameof(colName));

TensorProto.Types.DataType dataType = TensorProto.Types.DataType.Undefined;
DataKind rawKind;
Type rawType;
if (type is VectorType vectorType)
rawKind = vectorType.ItemType.RawKind;
else if (type is KeyType keyType)
rawKind = keyType.RawKind;
rawType = vectorType.ItemType.RawType;
else
rawType = type.RawType;
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RawType [](start = 31, length = 7)

I am not sure operating over RawType is an appropriate alternative to RawKind, in this case. I might prefer that we operate over the exact ColumnType implementations, since that is more descriptive of what we're trying to do I think.

To give a concrete example, I see the code below is conflating uint as being identical, even thought the same .NET type is used for both raw uints and actual key types... same .NET type, very different information being encoded. This might be a sign that the entire function is screwed up (I see @sfilipi found what appears to be a bug below with booleans, so that's well within the realm of possibility), but I'm a little concerned that this is the first thing I see. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure operating over RawType is an appropriate alternative to RawKind, in this case

I don't think I agree. In this case we are mapping from ML.NET's ColumnType to Onnx's TensorProto.Types.DataType. It doesn't really matter if we are using a KeyType or NumberType.U4. In either case, we need to get a TensorProto.Types.DataType.UInt32. Actually - what would we return for KeyType? We wouldn't want to hard-code it to just TensorProto.Types.DataType.UInt32, would we? What if it was a KeyType that was using UInt16?


In reply to: 248033212 [](ancestors = 248033212)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I see your point, Fair enough thanks @eerhardt.


In reply to: 248109614 [](ancestors = 248109614,248033212)


if (rawType == typeof(bool))
dataType = TensorProto.Types.DataType.Float;
Copy link
Member

@sfilipi sfilipi Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Float [](start = 54, length = 5)

Unrelated to your changes, I wander if this is correct. @wschin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering the same thing when I made the change...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably is caused by the limited types accepted by old ONNX operators implemented in RS4. As ONNXRuntime supports much more ops than dark-age implementations, we should revise our type mappings and create more tests. This is a reasonable setting but it can be improved in another PR.


In reply to: 247951442 [](ancestors = 247951442)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have an existing bug to make this better - #1198

else if (rawType == typeof(ReadOnlyMemory<char>))
dataType = TensorProto.Types.DataType.String;
else if (rawType == typeof(sbyte))
dataType = TensorProto.Types.DataType.Int8;
else if (rawType == typeof(byte))
dataType = TensorProto.Types.DataType.Uint8;
else if (rawType == typeof(short))
dataType = TensorProto.Types.DataType.Int16;
else if (rawType == typeof(ushort))
dataType = TensorProto.Types.DataType.Uint16;
else if (rawType == typeof(int))
dataType = TensorProto.Types.DataType.Int32;
else if (rawType == typeof(uint))
dataType = TensorProto.Types.DataType.Int64;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wschin - this one looks incorrect as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears both BL and U4 mapped these values when this repo was created -

public static ModelArgs GetModelArgs(ColumnType type, string colName,
List<long> dims = null, List<bool> dimsParams = null)
{
Contracts.CheckValue(type, nameof(type));
Contracts.CheckNonEmpty(colName, nameof(colName));
TensorProto.Types.DataType dataType = TensorProto.Types.DataType.Undefined;
DataKind rawKind;
if (type.IsVector)
rawKind = type.AsVector.ItemType.RawKind;
else if (type.IsKey)
rawKind = type.AsKey.RawKind;
else
rawKind = type.RawKind;
switch (rawKind)
{
case DataKind.BL:
dataType = TensorProto.Types.DataType.Float;
break;
case DataKind.TX:
dataType = TensorProto.Types.DataType.String;
break;
case DataKind.U4:
dataType = TensorProto.Types.DataType.Int64;
break;
case DataKind.R4:
dataType = TensorProto.Types.DataType.Float;
break;
default:
Contracts.Assert(false, "Unknown type.");
break;
}


In reply to: 248108591 [](ancestors = 248108591)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also /cc @zeahmed and @jignparm - in case they understand both of these type mappings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we always translate uint to ONNX Int64, this should be fine.


In reply to: 248112875 [](ancestors = 248112875)

else if (rawType == typeof(long))
dataType = TensorProto.Types.DataType.Int64;
else if (rawType == typeof(ulong))
dataType = TensorProto.Types.DataType.Uint64;
else if (rawType == typeof(float))
dataType = TensorProto.Types.DataType.Float;
else if (rawType == typeof(double))
dataType = TensorProto.Types.DataType.Double;
else
rawKind = type.RawKind;

switch (rawKind)
{
case DataKind.BL:
dataType = TensorProto.Types.DataType.Float;
break;
case DataKind.TX:
dataType = TensorProto.Types.DataType.String;
break;
case DataKind.I1:
dataType = TensorProto.Types.DataType.Int8;
break;
case DataKind.U1:
dataType = TensorProto.Types.DataType.Uint8;
break;
case DataKind.I2:
dataType = TensorProto.Types.DataType.Int16;
break;
case DataKind.U2:
dataType = TensorProto.Types.DataType.Uint16;
break;
case DataKind.I4:
dataType = TensorProto.Types.DataType.Int32;
break;
case DataKind.U4:
dataType = TensorProto.Types.DataType.Int64;
break;
case DataKind.I8:
dataType = TensorProto.Types.DataType.Int64;
break;
case DataKind.U8:
dataType = TensorProto.Types.DataType.Uint64;
break;
case DataKind.R4:
dataType = TensorProto.Types.DataType.Float;
break;
case DataKind.R8:
dataType = TensorProto.Types.DataType.Double;
break;
default:
string msg = "Unsupported type: DataKind " + rawKind.ToString();
Contracts.Check(false, msg);
break;
string msg = "Unsupported type: " + type.ToString();
Contracts.Check(false, msg);
}

string name = colName;
Expand Down
4 changes: 2 additions & 2 deletions src/Microsoft.ML.Recommender/MatrixFactorizationPredictor.cs
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,8 @@ internal MatrixFactorizationPredictor(IHostEnvironment env, SafeTrainingAndModel
{
Contracts.CheckValue(env, nameof(env));
_host = env.Register(RegistrationName);
_host.Assert(matrixColumnIndexType.RawKind == DataKind.U4);
_host.Assert(matrixRowIndexType.RawKind == DataKind.U4);
_host.Assert(matrixColumnIndexType.RawType == typeof(uint));
_host.Assert(matrixRowIndexType.RawType == typeof(uint));
_host.CheckValue(buffer, nameof(buffer));
_host.CheckValue(matrixColumnIndexType, nameof(matrixColumnIndexType));
_host.CheckValue(matrixRowIndexType, nameof(matrixRowIndexType));
Expand Down
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Recommender/RecommenderUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ public static void CheckAndGetMatrixIndexColumns(RoleMappedData data, out Schema
private static bool TryMarshalGoodRowColumnType(ColumnType type, out KeyType keyType)
{
keyType = type as KeyType;
return keyType?.Count > 0 && type.RawKind == DataKind.U4;
return keyType?.Count > 0 && type.RawType == typeof(uint);
}

/// <summary>
Expand Down
8 changes: 4 additions & 4 deletions src/Microsoft.ML.StaticPipe/SchemaAssertionContext.cs
Original file line number Diff line number Diff line change
Expand Up @@ -68,13 +68,13 @@ public sealed class SchemaAssertionContext
/// <summary>Assertions over a column of <see cref="BoolType"/>.</summary>
public PrimitiveTypeAssertions<bool> Bool => default;

/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="DataKind.U1"/> <see cref="ColumnType.RawKind"/>.</summary>
/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="byte"/> <see cref="ColumnType.RawType"/>.</summary>
public KeyTypeSelectorAssertions<byte> KeyU1 => default;
/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="DataKind.U2"/> <see cref="ColumnType.RawKind"/>.</summary>
/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="ushort"/> <see cref="ColumnType.RawType"/>.</summary>
public KeyTypeSelectorAssertions<ushort> KeyU2 => default;
/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="DataKind.U4"/> <see cref="ColumnType.RawKind"/>.</summary>
/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="uint"/> <see cref="ColumnType.RawType"/>.</summary>
public KeyTypeSelectorAssertions<uint> KeyU4 => default;
/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="DataKind.U8"/> <see cref="ColumnType.RawKind"/>.</summary>
/// <summary>Assertions over a column of <see cref="KeyType"/> with <see cref="ulong"/> <see cref="ColumnType.RawType"/>.</summary>
public KeyTypeSelectorAssertions<ulong> KeyU8 => default;

internal static SchemaAssertionContext Inst = new SchemaAssertionContext();
Expand Down
44 changes: 16 additions & 28 deletions src/Microsoft.ML.StaticPipe/StaticSchemaShape.cs
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,7 @@ private static Type GetTypeOrNull(SchemaShape.Column col)

if (col.IsKey)
{
Type physType = StaticKind(col.ItemType.RawKind);
Type physType = GetPhysicalType(col.ItemType);
Contracts.Assert(physType == typeof(byte) || physType == typeof(ushort)
|| physType == typeof(uint) || physType == typeof(ulong));
var keyType = typeof(Key<>).MakeGenericType(physType);
Expand All @@ -158,7 +158,7 @@ private static Type GetTypeOrNull(SchemaShape.Column col)

if (col.ItemType is PrimitiveType pt)
{
Type physType = StaticKind(pt.RawKind);
Type physType = GetPhysicalType(pt);
// Though I am unaware of any existing instances, it is theoretically possible for a
// primitive type to exist, have the same data kind as one of the existing types, and yet
// not be one of the built in types. (For example, an outside analogy to the key types.) For this
Expand Down Expand Up @@ -266,7 +266,7 @@ private static Type GetTypeOrNull(Schema.Column col)

if (t is KeyType kt)
{
Type physType = StaticKind(kt.RawKind);
Type physType = GetPhysicalType(kt);
Contracts.Assert(physType == typeof(byte) || physType == typeof(ushort)
|| physType == typeof(uint) || physType == typeof(ulong));
var keyType = kt.Count > 0 ? typeof(Key<>) : typeof(VarKey<>);
Expand Down Expand Up @@ -302,7 +302,7 @@ private static Type GetTypeOrNull(Schema.Column col)

if (t is PrimitiveType pt)
{
Type physType = StaticKind(pt.RawKind);
Type physType = GetPhysicalType(pt);
// Though I am unaware of any existing instances, it is theoretically possible for a
// primitive type to exist, have the same data kind as one of the existing types, and yet
// not be one of the built in types. (For example, an outside analogy to the key types.) For this
Expand All @@ -327,34 +327,22 @@ private static Type GetTypeOrNull(Schema.Column col)
/// type for communicating text.
/// </summary>
/// <returns>The basic type used to represent an item type in the static pipeline</returns>
private static Type StaticKind(DataKind kind)
private static Type GetPhysicalType(ColumnType columnType)
{
switch (kind)
switch (columnType)
{
// The default kind is reserved for unknown types.
case default(DataKind): return null;
case DataKind.I1: return typeof(sbyte);
case DataKind.I2: return typeof(short);
case DataKind.I4: return typeof(int);
case DataKind.I8: return typeof(long);

case DataKind.U1: return typeof(byte);
case DataKind.U2: return typeof(ushort);
case DataKind.U4: return typeof(uint);
case DataKind.U8: return typeof(ulong);
case DataKind.U16: return typeof(RowId);

case DataKind.R4: return typeof(float);
case DataKind.R8: return typeof(double);
case DataKind.BL: return typeof(bool);

case DataKind.Text: return typeof(string);
case DataKind.TimeSpan: return typeof(TimeSpan);
case DataKind.DateTime: return typeof(DateTime);
case DataKind.DateTimeZone: return typeof(DateTimeOffset);
case NumberType numberType:
case KeyType keyType:
case TimeSpanType timeSpanType:
case DateTimeType dateTimeType:
case DateTimeOffsetType dateTimeOffsetType:
case BoolType boolType:
return columnType.RawType;
case TextType textType:
return typeof(string);
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this style of change you have here is more what I'd be more comfortable with in the the OnnxUtils case. (Now, in that other case, it would also probably benefit from some when statements to further refine.) #Resolved


default:
throw Contracts.ExceptParam(nameof(kind), $"Unrecognized type '{kind}'");
return null;
Copy link
Member

@sfilipi sfilipi Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return null; [](start = 20, length = 12)

curious, why change to return null, rather than throw? If an invalid type gets passed, would it be better to throw? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to preserve the first line's behavior:

case default(DataKind): return null;

When DataKind is default, it usually means it is a "custom" or "structured" type - like VectorType or ImageType. So I was preserving that behavior here - when you see an unrecognized ColumnType - return null.

I assume the existing throw condition was in place for handling invalid enum values - like 29 and -451.


In reply to: 247782795 [](ancestors = 247782795)

}
}
}
Expand Down
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Transforms/KeyToVectorMapping.cs
Original file line number Diff line number Diff line change
Expand Up @@ -480,7 +480,7 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema)
{
if (!inputSchema.TryFindColumn(colInfo.Input, out var col))
throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.Input);
if ((col.ItemType.GetItemType().RawKind == default) || !(col.ItemType is VectorType || col.ItemType is PrimitiveType))
if (!(col.ItemType is VectorType || col.ItemType is PrimitiveType))
throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.Input);

var metadata = new List<SchemaShape.Column>();
Expand Down
10 changes: 4 additions & 6 deletions src/Microsoft.ML.Transforms/MissingValueReplacing.cs
Original file line number Diff line number Diff line change
Expand Up @@ -901,16 +901,14 @@ public void SaveAsOnnx(OnnxContext ctx)

private bool SaveAsOnnxCore(OnnxContext ctx, int iinfo, ColInfo info, string srcVariableName, string dstVariableName)
{
DataKind rawKind;
Type rawType;
var type = _infos[iinfo].TypeSrc;
if (type is VectorType vectorType)
rawKind = vectorType.ItemType.RawKind;
else if (type is KeyType keyType)
rawKind = keyType.RawKind;
rawType = vectorType.ItemType.RawType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawType = vectorType.ItemType.RawType; [](start = 20, length = 38)

Placeholder note, once your other PR goes in you could probably benefit from your new GetItemType extension.

Copy link
Member Author

@eerhardt eerhardt Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that will simplify this code. #Resolved

else
rawKind = type.RawKind;
rawType = type.RawType;

if (rawKind != DataKind.R4)
if (rawType != typeof(float))
return false;

string opType = "Imputer";
Expand Down
54 changes: 12 additions & 42 deletions src/Microsoft.ML.Transforms/MissingValueReplacingUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -21,27 +21,17 @@ private static StatAggregator CreateStatAggregator(IChannel ch, ColumnType type,
// The type is a scalar.
if (kind == ReplacementKind.Mean)
{
switch (type.RawKind)
{
case DataKind.R4:
if (type.RawType == typeof(float))
return new R4.MeanAggregatorOne(ch, cursor, col);
case DataKind.R8:
else if (type.RawType == typeof(double))
return new R8.MeanAggregatorOne(ch, cursor, col);
default:
break;
}
}
if (kind == ReplacementKind.Min || kind == ReplacementKind.Max)
{
switch (type.RawKind)
{
case DataKind.R4:
if (type.RawType == typeof(float))
return new R4.MinMaxAggregatorOne(ch, cursor, col, kind == ReplacementKind.Max);
case DataKind.R8:
else if (type.RawType == typeof(double))
return new R8.MinMaxAggregatorOne(ch, cursor, col, kind == ReplacementKind.Max);
default:
break;
}
}
}
else if (bySlot)
Expand All @@ -53,55 +43,35 @@ private static StatAggregator CreateStatAggregator(IChannel ch, ColumnType type,

if (kind == ReplacementKind.Mean)
{
switch (vectorType.ItemType.RawKind)
{
case DataKind.R4:
if (vectorType.ItemType.RawType == typeof(float))
return new R4.MeanAggregatorBySlot(ch, vectorType, cursor, col);
case DataKind.R8:
else if (vectorType.ItemType.RawType == typeof(double))
return new R8.MeanAggregatorBySlot(ch, vectorType, cursor, col);
default:
break;
}
}
else if (kind == ReplacementKind.Min || kind == ReplacementKind.Max)
{
switch (vectorType.ItemType.RawKind)
{
case DataKind.R4:
if (vectorType.ItemType.RawType == typeof(float))
return new R4.MinMaxAggregatorBySlot(ch, vectorType, cursor, col, kind == ReplacementKind.Max);
case DataKind.R8:
else if (vectorType.ItemType.RawType == typeof(double))
return new R8.MinMaxAggregatorBySlot(ch, vectorType, cursor, col, kind == ReplacementKind.Max);
default:
break;
}
}
}
else
{
// Imputation across slots.
if (kind == ReplacementKind.Mean)
{
switch (vectorType.ItemType.RawKind)
{
case DataKind.R4:
if (vectorType.ItemType.RawType == typeof(float))
return new R4.MeanAggregatorAcrossSlots(ch, cursor, col);
case DataKind.R8:
else if (vectorType.ItemType.RawType == typeof(double))
return new R8.MeanAggregatorAcrossSlots(ch, cursor, col);
default:
break;
}
}
else if (kind == ReplacementKind.Min || kind == ReplacementKind.Max)
{
switch (vectorType.ItemType.RawKind)
{
case DataKind.R4:
if (vectorType.ItemType.RawType == typeof(float))
return new R4.MinMaxAggregatorAcrossSlots(ch, cursor, col, kind == ReplacementKind.Max);
case DataKind.R8:
else if (vectorType.ItemType.RawType == typeof(double))
return new R8.MinMaxAggregatorAcrossSlots(ch, cursor, col, kind == ReplacementKind.Max);
default:
break;
}
}
}
ch.Assert(false);
Expand Down
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Transforms/RandomFourierFeaturizing.cs
Original file line number Diff line number Diff line change
Expand Up @@ -680,7 +680,7 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema)
{
if (!inputSchema.TryFindColumn(colInfo.Input, out var col))
throw _host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.Input);
if (col.ItemType.RawKind != DataKind.R4 || col.Kind != SchemaShape.Column.VectorKind.Vector)
if (col.ItemType.RawType != typeof(float) || col.Kind != SchemaShape.Column.VectorKind.Vector)
throw _host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.Input);

result[colInfo.Output] = new SchemaShape.Column(colInfo.Output, SchemaShape.Column.VectorKind.Vector, NumberType.R4, false);
Expand Down
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Transforms/Text/LdaTransform.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1159,7 +1159,7 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema)
{
if (!inputSchema.TryFindColumn(colInfo.Input, out var col))
throw _host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.Input);
if (col.ItemType.RawKind != DataKind.R4 || col.Kind == SchemaShape.Column.VectorKind.Scalar)
if (col.ItemType.RawType != typeof(float) || col.Kind == SchemaShape.Column.VectorKind.Scalar)
throw _host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.Input, "a vector of floats", col.GetTypeString());

result[colInfo.Output] = new SchemaShape.Column(colInfo.Output, SchemaShape.Column.VectorKind.Vector, NumberType.R4, false);
Expand Down
4 changes: 2 additions & 2 deletions src/Microsoft.ML.Transforms/Text/NgramHashingTransformer.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1177,7 +1177,7 @@ internal static bool IsColumnTypeValid(ColumnType type)
if (!(vectorType.ItemType is KeyType itemKeyType))
return false;
// Can only accept key types that can be converted to U4.
if (itemKeyType.Count == 0 && itemKeyType.RawKind > DataKind.U4)
if (itemKeyType.Count == 0 && !NgramUtils.IsValidNgramRawType(itemKeyType.RawType))
return false;
return true;
}
Expand All @@ -1189,7 +1189,7 @@ internal static bool IsSchemaColumnValid(SchemaShape.Column col)
if (!col.IsKey)
return false;
// Can only accept key types that can be converted to U4.
if (col.ItemType.RawKind > DataKind.U4)
if (!NgramUtils.IsValidNgramRawType(col.ItemType.RawType))
return false;
return true;
}
Expand Down
4 changes: 2 additions & 2 deletions src/Microsoft.ML.Transforms/Text/NgramTransform.cs
Original file line number Diff line number Diff line change
Expand Up @@ -843,7 +843,7 @@ internal static bool IsColumnTypeValid(ColumnType type)
if (!(vectorType.ItemType is KeyType itemKeyType))
return false;
// Can only accept key types that can be converted to U4.
if (itemKeyType.Count == 0 && itemKeyType.RawKind > DataKind.U4)
if (itemKeyType.Count == 0 && !NgramUtils.IsValidNgramRawType(itemKeyType.RawType))
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsValidNgramRawType(itemKeyType.RawType [](start = 54, length = 39)

If I read the comment, it says, Can only accept key types that can be converted to U4, in every place that this IsValidNgramRawType is being called. That basically means U1, U2, and U4 are acceptable, and only U8 is not. So I'd have expected, based on this comment, that this would be a test against whether the raw type is a ulong or not.

However, if I then go and read this IsValidNgramRawType, it is doing something else! It is checking against all sorts of types that have nothing to do with key-types at all, including sbyte, short, int... which makes no sense to me, given the comment, the fact that these are definitely meant to operate over keytypes, etc.

I would have expected just making sure it isn't a ulong would have been sufficient, but... #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just converting the existing logic: type.ItemType.RawKind > DataKind.U4.

I can change that method to assume that KeyTypes can only be byte, ushort, uint, or ulong, and only make it check for ulong.


In reply to: 248037058 [](ancestors = 248037058)

Copy link
Contributor

@TomFinley TomFinley Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks @eerhardt. I'm not sure why the original code was written this was -- it's certainly a very long winded way of writing "is U8." :) There is a competition, or used to be a competition, to see who could write the most obfuscated code possible, as I recall. (Not part of this codebase, to be clear, just in a university somewhere if I recall correctly, or a newsgroup or something?) Maybe the author was a participant.


In reply to: 248110407 [](ancestors = 248110407,248037058)

return false;
return true;
}
Expand All @@ -855,7 +855,7 @@ internal static bool IsSchemaColumnValid(SchemaShape.Column col)
if (!col.IsKey)
return false;
// Can only accept key types that can be converted to U4.
if (col.ItemType.RawKind > DataKind.U4)
if (!NgramUtils.IsValidNgramRawType(col.ItemType.RawType))
return false;
return true;
}
Expand Down
Loading