Skip to content

SampleDatasetUtils is in disarray #2420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rogancarr opened this issue Feb 5, 2019 · 9 comments
Closed

SampleDatasetUtils is in disarray #2420

rogancarr opened this issue Feb 5, 2019 · 9 comments
Assignees
Labels
API Issues pertaining the friendly API documentation Related to documentation of ML.NET

Comments

@rogancarr
Copy link
Contributor

There is a project called SamplesUtils sitting inside the core ML.NET codebase. I believe this was created before the [BestFriend] concept came around.

This project contains one namespace, SampleDatasetUtils that has one static class, DatasetUtils. Here are the following issues:

  1. Everything is public and nothing is sealed.
  2. This was built for Samples, but now Tests have taken a dependence on a couple of methods.
  3. The file itself has no organization, everything is everywhere.
@rogancarr
Copy link
Contributor Author

I suggest the following fixes:

  1. Mark everything as internal and all non-static classes as sealed.
  2. Move this library class into Samples and out of the core codebase.
  3. Mark everything appropriate as BestFriend and let Tests have access to them. I do think these will also be useful to Tests, and we do explicitly test some of our Sample code.

@sfilipi @TomFinley , what do you think about this?

@rogancarr rogancarr self-assigned this Feb 5, 2019
@rogancarr rogancarr added API Issues pertaining the friendly API documentation Related to documentation of ML.NET labels Feb 5, 2019
@sfilipi
Copy link
Member

sfilipi commented Feb 5, 2019

This class was created and is shipping inside the main Microsoft.ML package, so that the users can copy/paste the API references samples and they will just run.

#2277 has some similar questions.

We should definitely seal the classes, but leave public visibility to the methods that download the datasets, and other utilities, because they need to be referenced from outside our own assemblies.

We don't want to use the [BestFriend] annotation from the samples project.

We could organize the SamplesUtils file, as it grows.

@rogancarr
Copy link
Contributor Author

Thanks Senja!

I'll start with the move to sealing, but I do share @TomFinley 's concerns from #2277. If we ship this in the main NuGet with public APIs, then people, including us as evidenced by our Tests usage, will begin taking a dependence on these methods and classes, and then we'll have to support them forever.

@sfilipi
Copy link
Member

sfilipi commented Feb 5, 2019

The alternative would be to have a separate package for the datasets. We have discussed that here: #2393 1137 and the decision was to not do it. Reconsider?

Scikit has a package for dataset, so does R.

@TomFinley
Copy link
Contributor

TomFinley commented Feb 5, 2019

As already expressed in #2277 and, as @rogancarr has already noted, I feel fairly strongly that this class and the assembly containing it should be part of a separate nuget. It is clearly not core functionality of ML.NET, and it's perfectly fine for our samples to rely on multiple nugets (as, of course, they already do). Yet it is in Microsoft.ML nuget. So weird.

[Did you mean to link #1137, instead of #2393? (The latter appears to be irrelevant.) In #1137, it was established (correctly) that we should not bundle the datasets in the nuget but download them in some other way. It said nothing about the merits of making that samples class part of the main nuget.

@sfilipi
Copy link
Member

sfilipi commented Feb 5, 2019

Seems like we're in agreement, to summarize this issue can be closed after making the SamplesUtils a separate package.

@eerhardt
Copy link
Member

eerhardt commented Mar 2, 2019

We are not shipping SamplesUtils as a stable package. The SamplesUtils assembly is now in a separate package - Microsoft.ML.SampleUtils - as of #2543.

Moving this out of Project 13, since it is not a stable package.

@sfilipi
Copy link
Member

sfilipi commented Apr 4, 2019

After further discussion, we'll try to retire SamplesUtils, and have all utilities in the samples (IDataView prints, and dataset generation.)

@luisquintanilla @JRAlexander @natke for FYI or if you want to comment about.

@shmoradims
Copy link

We discontinued using SamplesUtils for samples. Closing this issue. Create #3584 for deleting it altogether.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API documentation Related to documentation of ML.NET
Projects
None yet
Development

No branches or pull requests

7 participants