-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Replace ColumnInfo usage with Schema.Column, remove ColumnInfo #1924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
var labelNames = default(VBuffer<ReadOnlyMemory<char>>); | ||
if (schema.Label.Type.IsKey && | ||
(type = schema.Schema[schema.Label.Index].Metadata.Schema.GetColumnOrNull(MetadataUtils.Kinds.KeyValues)?.Type) != null && | ||
type.ItemType.IsKnownSizeVector && type.ItemType.IsText) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While most of the changes in the PR are intended to be over identical functionality, this was an exception since it was clearly a bug... type.ItemType.IsKnownSizeVector
must always be false, since an item type could never be a vector type of any sort, known size or not!
@@ -239,23 +239,23 @@ private IDataView WrapPerInstance(RoleMappedData perInst) | |||
colsToKeep.Add(MetricKinds.ColumnNames.FoldIndex); | |||
|
|||
// Maml always outputs a name column, if it doesn't exist add a GenerateNumberTransform. | |||
if (perInst.Schema.Name == null) | |||
if (perInst.Schema.Name?.Name is string nameName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sort of a fun application of pattern matching. If that evaluated to null
, then the pattern would not match. Consider this small program:
private static void TestString(string s)
{
if (s is string ss)
Console.WriteLine($"Yup {ss}");
else
Console.WriteLine("Nope");
}
private static void Main(string[] args)
{
TestString(null);
TestString("hi");
}
The output is:
Nope
Yup hi
Very good. Yet, let's imagine I change to this:
if (s is var ss)
Now, that ss
is still of string
type, but now the outcome becomes this:
Yup
Yup hi
I don't fully appreciate in what way this var
pattern matching matches the type, yet evinces such very different behavior! No doubt one of the donut people knows more about this.
d2f1818
to
7af3c16
Compare
src/Microsoft.ML.Data/Evaluators/QuantileRegressionEvaluator.cs
Outdated
Show resolved
Hide resolved
the comment for Refers to: src/Microsoft.ML.Data/Evaluators/RegressionEvaluator.cs:55 in d2f1818. [](commit_id = d2f1818, deletion_comment = False) |
src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineTrainer.cs
Show resolved
Hide resolved
src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7af3c16
to
ac287d8
Compare
if (t != NumberType.Float && t.KeyCount != 2) | ||
throw Host.Except("Label column '{0}' has type '{1}' but must be R4 or a 2-value key", schema.Label.Name, t).MarkSensitive(MessageSensitivity.Schema); | ||
throw Host.Except("Label column '{0}' has type '{1}' but must be R4 or a 2-value key", schema.Label.Value.Name, t).MarkSensitive(MessageSensitivity.Schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R4 [](start = 81, length = 3)
sidenote, does it make sense to start using 'float' etc. in the user errors, instead of the internal types? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, I'll log an issue and we can discuss there, rather than comments on this PR.
In reply to: 243381184 [](ancestors = 243381184)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may. In fact I think we've discussed this as a desirable outcome, but if we do it we should do it everywhere as part of a deliberate policy. So your plan to open an issue seems good to me, @sfilipi.
continue; | ||
if (schema.TryGetMetadata(TextType.Instance, MetadataUtils.Kinds.ScoreValueKind, col, ref tmp) && | ||
ReadOnlyMemoryUtils.EqualsStr(valueKind, tmp)) | ||
if (col.Metadata.Schema.GetColumnOrNull(MetadataUtils.Kinds.ScoreValueKind)?.Type == TextType.Instance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
)?.Type == TextType.Instance [](start = 90, length = 28)
Just checking: is this check here necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It absolutely is. I think you're suggesting that we don't really need to check for the type, because we "know" that the metadata is text type, is that so?
IDataView
is an open system, and metadata is optional -- but the flipside of being optional and open is that we must accomodate ourselves (without error) that the type of metadata might not quite be what we expect. So, even though within this codebase the only thing producing this kind of metadata will have the "right" kind, nothing prevents the metadata from having a type we do not expect. And that is not an error condition -- that's just part of our contract with metadata.
I've written about this in the past, particularly relevant is the first paragraph of this section of my IDataView
implementation document, titled "Metadata."
#pragma warning disable MSML_ContractsNameUsesNameof | ||
if (!ColumnInfo.TryCreateFromName(schema, name, out info)) | ||
var col = schema.GetColumnOrNull(name); | ||
if (!col.HasValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col [](start = 21, length = 3)
null check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not sure what's is being asked here @sfilipi , could you clarify? schema
is checked for null, the name
has been checked for null, and as you see below the nullable struct is being checked for a value and throwing an appropriate exception. As far as I can tell no other entities are involved in the line you're commenting on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean col can be null, since it comes from GetColumnOrNull.
Shall it check for null before accessing members on it:
Next line: col?.HasValue
In reply to: 243391027 [](ancestors = 243391027)
if (t != NumberType.R4) | ||
throw Host.Except("Label column '{0}' has type '{1}' but must be R4", schema.Label.Name, t); | ||
throw Host.ExceptSchemaMismatch(nameof(schema), "score", schema.Label.Value.Name, "R4", t.ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
score [](start = 65, length = 5)
label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good one thanks @sfilipi !
ac287d8
to
e0b2463
Compare
@@ -70,27 +70,6 @@ internal interface IColumnFunction : ICanSaveModel | |||
NormalizingTransformer.NormalizerModelParametersBase GetNormalizerModelParams(); | |||
} | |||
|
|||
internal static class NormalizeUtils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal static class NormalizeUtils [](start = 4, length = 36)
double-checking that removing this is intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is. It is an artifact of a time back in summer, back when we were first formulating ideas of what the public API would look like, and in fact the ideas around this were fairly ill formed. (This particular code was introduced in #433.) Clearly our thinking has moved on since then. This particular method (which as you see used to operate over ColumnInfo
) is now directly taken care of by the bool IsNormalized(this Schema.Column column)
extension method. And, because RoleMappedSchema
now directly returns Schema.Column
instead of ColumnInfo
(which is the point of this PR, after all), there was no longer any reason to have this method. But thanks for checking @sfilipi.
In reply to: 243412114 [](ancestors = 243412114)
ch.Assert(examplesToFeedTrain.Schema.Label.HasValue); | ||
ch.Assert(examplesToFeedTrain.Schema.Feature.HasValue); | ||
if (examples.Schema.Weight.HasValue) | ||
ch.Assert(examplesToFeedTrain.Schema.Weight.HasValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to check on the Value :
ch.CheckValue(examplesToFeedTrain.Schema.Weight.Value);
There is more of those in the other files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I absolutely did not mean to check on the value. First of all, I cannot, since Nullable<T>
is a struct and Schema.Column
is a struct, we cannot call AssertValue
on it. Also, even if hypothetically I could check on Value
, if HasValue
were false then accessing that Value
property would itself throw an exception, which is not what I want (see documentation of `Nullable here).
If you're asking if these should be Check
and not Assert
, it may not be obvious but examplesToFeedTrain
is derived from examples
(see the forematter of this, all that stuff about shuffling and whatnot), and the roles of examples
were already checked or otherwise confirmed to be present (see, for example, lines 102 for label, 103 for float, and line 125 in this block for weights). So asserts are the appropriate choice here, since the intention of this code is to communicate to the reader what we expect to be true, versus input validation, which would be more properly handled via checks or conditions (as in fact it is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #1923. As usual commits are structured in such a way as to make the code approachable. Since the first usage was against
RoleMappedSchema
and that was used in many, many places, I'm sorry to say that is probably the biggest change.