Skip to content

KeyType Simplification #2146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jan 25, 2019
Merged

KeyType Simplification #2146

merged 28 commits into from
Jan 25, 2019

Conversation

artidoro
Copy link
Contributor

@artidoro artidoro commented Jan 15, 2019

Fixes #1540.

In this PR I remove the Min and Contiguous fields of KeyType. The reason for doing so is elaborated in the issue #1540.

Before this PR, a valid range of values for a key could be 1000 to 4000. This was represented by a key with Min 1000 and Count 4001. Having a key that starts at a value that is not 1 only very rarely happened in practice.

Now by default a valid range will start at 1, up to Count, with 0 being used for missing values.

As part of this PR I also removed the Min and Contiguous field of KeyRange (KeyRange serves the purpose of representing a valid range of values for a KeyType). It is useful to note that the max of a KeyRange will be equal to Count - 1 of the associated KeyType.

Also fixed the comments and the documentation on the type system to make sure they reflect the change.

@artidoro artidoro added the API Issues pertaining the friendly API label Jan 15, 2019
@artidoro artidoro self-assigned this Jan 15, 2019
@artidoro
Copy link
Contributor Author

artidoro commented Jan 15, 2019

Need to update the documentation on the datatypes https://github.com/dotnet/machinelearning/blob/master/docs/code/IDataViewTypeSystem.md#key-types. #Resolved

@artidoro artidoro changed the title WIP: KeyType Simplification KeyType Simplification Jan 15, 2019

Contiguous = contiguous;
Min = min;
Contracts.CheckParam(0 < count && count <= kind.ToMaxInt(), nameof(count),
Copy link

@yaeldekel yaeldekel Jan 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<= [](start = 52, length = 2)

Shouldn't this inequality be strict? #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it should not be strict. What Count expresses is the largest possible value for a key. The bottleneck is the key underlying type. At most a key can take the largest representable value in that type, which is kind.ToMaxInt().


In reply to: 249202365 [](ancestors = 249202365)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so, if I have a U1, values 1 through 255 would represent the range of possible keys, and correspond to a count of 255, which equals I think kind.ToMaxInt. So I think as written it is correct. Though I agree it is slightly confusing and you have to think about it a while before realizing it is correct -- something about it somehow just tickles my brain every time I see it, not quite sure what.


In reply to: 249229341 [](ancestors = 249229341,249202365)

@artidoro artidoro merged commit 9b4e069 into dotnet:master Jan 25, 2019
@@ -905,6 +906,7 @@ private Schema ComputeOutputSchema()
public const string LoaderSignature = "TextLoader";

private const uint VerForceVectorSupported = 0x0001000A;
private const uint VersionNoMinCount = 0x00010004;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x00010004 [](start = 47, length = 10)

Why it's 10004 and not 1000B ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right!

@artidoro artidoro deleted the keytype branch March 13, 2019 17:56
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IDataView Cleanup: KeyType Simplification
7 participants