-
Notifications
You must be signed in to change notification settings - Fork 1.9k
KeyType Simplification #2146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KeyType Simplification #2146
Conversation
Need to update the documentation on the datatypes https://github.com/dotnet/machinelearning/blob/master/docs/code/IDataViewTypeSystem.md#key-types. #Resolved |
docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs
Outdated
Show resolved
Hide resolved
|
||
Contiguous = contiguous; | ||
Min = min; | ||
Contracts.CheckParam(0 < count && count <= kind.ToMaxInt(), nameof(count), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<= [](start = 52, length = 2)
Shouldn't this inequality be strict? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it should not be strict. What Count
expresses is the largest possible value for a key. The bottleneck is the key underlying type
. At most a key can take the largest representable value in that type, which is kind.ToMaxInt().
In reply to: 249202365 [](ancestors = 249202365)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so, if I have a U1
, values 1 through 255 would represent the range of possible keys, and correspond to a count of 255, which equals I think kind.ToMaxInt
. So I think as written it is correct. Though I agree it is slightly confusing and you have to think about it a while before realizing it is correct -- something about it somehow just tickles my brain every time I see it, not quite sure what.
In reply to: 249229341 [](ancestors = 249229341,249202365)
Co-Authored-By: artidoro <[email protected]>
Co-Authored-By: artidoro <[email protected]>
Co-Authored-By: artidoro <[email protected]>
Co-Authored-By: artidoro <[email protected]>
Co-Authored-By: artidoro <[email protected]>
@@ -905,6 +906,7 @@ private Schema ComputeOutputSchema() | |||
public const string LoaderSignature = "TextLoader"; | |||
|
|||
private const uint VerForceVectorSupported = 0x0001000A; | |||
private const uint VersionNoMinCount = 0x00010004; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0x00010004 [](start = 47, length = 10)
Why it's 10004 and not 1000B ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right!
Fixes #1540.
In this PR I remove the Min and Contiguous fields of KeyType. The reason for doing so is elaborated in the issue #1540.
Before this PR, a valid range of values for a key could be 1000 to 4000. This was represented by a key with Min 1000 and Count 4001. Having a key that starts at a value that is not 1 only very rarely happened in practice.
Now by default a valid range will start at 1, up to Count, with 0 being used for missing values.
As part of this PR I also removed the Min and Contiguous field of KeyRange (KeyRange serves the purpose of representing a valid range of values for a KeyType). It is useful to note that the max of a KeyRange will be equal to Count - 1 of the associated KeyType.
Also fixed the comments and the documentation on the type system to make sure they reflect the change.