Skip to content

Replace ColumnInfo usage with Schema.Column, remove ColumnInfo #1924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 20, 2018

Conversation

TomFinley
Copy link
Contributor

Fixes #1923. As usual commits are structured in such a way as to make the code approachable. Since the first usage was against RoleMappedSchema and that was used in many, many places, I'm sorry to say that is probably the biggest change.

@TomFinley TomFinley added the API Issues pertaining the friendly API label Dec 19, 2018
@TomFinley TomFinley added this to the 1218 milestone Dec 19, 2018
@TomFinley TomFinley self-assigned this Dec 19, 2018
var labelNames = default(VBuffer<ReadOnlyMemory<char>>);
if (schema.Label.Type.IsKey &&
(type = schema.Schema[schema.Label.Index].Metadata.Schema.GetColumnOrNull(MetadataUtils.Kinds.KeyValues)?.Type) != null &&
type.ItemType.IsKnownSizeVector && type.ItemType.IsText)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While most of the changes in the PR are intended to be over identical functionality, this was an exception since it was clearly a bug... type.ItemType.IsKnownSizeVector must always be false, since an item type could never be a vector type of any sort, known size or not!

@@ -239,23 +239,23 @@ private IDataView WrapPerInstance(RoleMappedData perInst)
colsToKeep.Add(MetricKinds.ColumnNames.FoldIndex);

// Maml always outputs a name column, if it doesn't exist add a GenerateNumberTransform.
if (perInst.Schema.Name == null)
if (perInst.Schema.Name?.Name is string nameName)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of a fun application of pattern matching. If that evaluated to null, then the pattern would not match. Consider this small program:

private static void TestString(string s)
{
    if (s is string ss)
        Console.WriteLine($"Yup {ss}");
    else
        Console.WriteLine("Nope");
}

private static void Main(string[] args)
{
    TestString(null);
    TestString("hi");
}

The output is:

Nope
Yup hi

Very good. Yet, let's imagine I change to this:

    if (s is var ss)

Now, that ss is still of string type, but now the outcome becomes this:

Yup
Yup hi

I don't fully appreciate in what way this var pattern matching matches the type, yet evinces such very different behavior! No doubt one of the donut people knows more about this.

@Zruty0
Copy link
Contributor

Zruty0 commented Dec 19, 2018

    private protected override void CheckScoreAndLabelTypes(RoleMappedSchema schema)

the comment for ExceptSchemaMismatch applies for all of these overrides. I can see this is not the reason for the PR, so I won't insist on you doing it now, but, I assume, you touch it you fix it :)


Refers to: src/Microsoft.ML.Data/Evaluators/RegressionEvaluator.cs:55 in d2f1818. [](commit_id = d2f1818, deletion_comment = False)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

if (t != NumberType.Float && t.KeyCount != 2)
throw Host.Except("Label column '{0}' has type '{1}' but must be R4 or a 2-value key", schema.Label.Name, t).MarkSensitive(MessageSensitivity.Schema);
throw Host.Except("Label column '{0}' has type '{1}' but must be R4 or a 2-value key", schema.Label.Value.Name, t).MarkSensitive(MessageSensitivity.Schema);
Copy link
Member

@sfilipi sfilipi Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R4 [](start = 81, length = 3)

sidenote, does it make sense to start using 'float' etc. in the user errors, instead of the internal types? #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I'll log an issue and we can discuss there, rather than comments on this PR.


In reply to: 243381184 [](ancestors = 243381184)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may. In fact I think we've discussed this as a desirable outcome, but if we do it we should do it everywhere as part of a deliberate policy. So your plan to open an issue seems good to me, @sfilipi.

continue;
if (schema.TryGetMetadata(TextType.Instance, MetadataUtils.Kinds.ScoreValueKind, col, ref tmp) &&
ReadOnlyMemoryUtils.EqualsStr(valueKind, tmp))
if (col.Metadata.Schema.GetColumnOrNull(MetadataUtils.Kinds.ScoreValueKind)?.Type == TextType.Instance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

)?.Type == TextType.Instance [](start = 90, length = 28)

Just checking: is this check here necessary?

Copy link
Contributor Author

@TomFinley TomFinley Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It absolutely is. I think you're suggesting that we don't really need to check for the type, because we "know" that the metadata is text type, is that so?

IDataView is an open system, and metadata is optional -- but the flipside of being optional and open is that we must accomodate ourselves (without error) that the type of metadata might not quite be what we expect. So, even though within this codebase the only thing producing this kind of metadata will have the "right" kind, nothing prevents the metadata from having a type we do not expect. And that is not an error condition -- that's just part of our contract with metadata.

I've written about this in the past, particularly relevant is the first paragraph of this section of my IDataView implementation document, titled "Metadata."

#pragma warning disable MSML_ContractsNameUsesNameof
if (!ColumnInfo.TryCreateFromName(schema, name, out info))
var col = schema.GetColumnOrNull(name);
if (!col.HasValue)
Copy link
Member

@sfilipi sfilipi Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col [](start = 21, length = 3)

null check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure what's is being asked here @sfilipi , could you clarify? schema is checked for null, the name has been checked for null, and as you see below the nullable struct is being checked for a value and throwing an appropriate exception. As far as I can tell no other entities are involved in the line you're commenting on?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean col can be null, since it comes from GetColumnOrNull.

Shall it check for null before accessing members on it:

Next line: col?.HasValue


In reply to: 243391027 [](ancestors = 243391027)

if (t != NumberType.R4)
throw Host.Except("Label column '{0}' has type '{1}' but must be R4", schema.Label.Name, t);
throw Host.ExceptSchemaMismatch(nameof(schema), "score", schema.Label.Value.Name, "R4", t.ToString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

score [](start = 65, length = 5)

label

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good one thanks @sfilipi !

@@ -70,27 +70,6 @@ internal interface IColumnFunction : ICanSaveModel
NormalizingTransformer.NormalizerModelParametersBase GetNormalizerModelParams();
}

internal static class NormalizeUtils
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal static class NormalizeUtils [](start = 4, length = 36)

double-checking that removing this is intentional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. It is an artifact of a time back in summer, back when we were first formulating ideas of what the public API would look like, and in fact the ideas around this were fairly ill formed. (This particular code was introduced in #433.) Clearly our thinking has moved on since then. This particular method (which as you see used to operate over ColumnInfo) is now directly taken care of by the bool IsNormalized(this Schema.Column column) extension method. And, because RoleMappedSchema now directly returns Schema.Column instead of ColumnInfo (which is the point of this PR, after all), there was no longer any reason to have this method. But thanks for checking @sfilipi.


In reply to: 243412114 [](ancestors = 243412114)

ch.Assert(examplesToFeedTrain.Schema.Label.HasValue);
ch.Assert(examplesToFeedTrain.Schema.Feature.HasValue);
if (examples.Schema.Weight.HasValue)
ch.Assert(examplesToFeedTrain.Schema.Weight.HasValue);
Copy link
Member

@sfilipi sfilipi Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to check on the Value :

ch.CheckValue(examplesToFeedTrain.Schema.Weight.Value);

There is more of those in the other files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely did not mean to check on the value. First of all, I cannot, since Nullable<T> is a struct and Schema.Column is a struct, we cannot call AssertValue on it. Also, even if hypothetically I could check on Value, if HasValue were false then accessing that Value property would itself throw an exception, which is not what I want (see documentation of `Nullable here).

If you're asking if these should be Check and not Assert, it may not be obvious but examplesToFeedTrain is derived from examples (see the forematter of this, all that stuff about shuffling and whatnot), and the roles of examples were already checked or otherwise confirmed to be present (see, for example, lines 102 for label, 103 for float, and line 125 in this block for weights). So asserts are the appropriate choice here, since the intention of this code is to communicate to the reader what we expect to be true, versus input validation, which would be more properly handled via checks or conditions (as in fact it is).

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley TomFinley merged commit dea6c02 into dotnet:master Dec 20, 2018
@TomFinley TomFinley deleted the tfinley/ColumnInfo branch March 6, 2019 16:31
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants