-
Notifications
You must be signed in to change notification settings - Fork 1.9k
SampleDatasetUtils is in disarray #2420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I suggest the following fixes:
@sfilipi @TomFinley , what do you think about this? |
This class was created and is shipping inside the main Microsoft.ML package, so that the users can copy/paste the API references samples and they will just run. #2277 has some similar questions. We should definitely seal the classes, but leave public visibility to the methods that download the datasets, and other utilities, because they need to be referenced from outside our own assemblies. We don't want to use the [BestFriend] annotation from the samples project. We could organize the SamplesUtils file, as it grows. |
Thanks Senja! I'll start with the move to sealing, but I do share @TomFinley 's concerns from #2277. If we ship this in the main NuGet with public APIs, then people, including us as evidenced by our Tests usage, will begin taking a dependence on these methods and classes, and then we'll have to support them forever. |
The alternative would be to have a separate package for the datasets. We have discussed that here: #2393 1137 and the decision was to not do it. Reconsider? Scikit has a package for dataset, so does R. |
As already expressed in #2277 and, as @rogancarr has already noted, I feel fairly strongly that this class and the assembly containing it should be part of a separate nuget. It is clearly not core functionality of ML.NET, and it's perfectly fine for our samples to rely on multiple nugets (as, of course, they already do). Yet it is in [Did you mean to link #1137, instead of #2393? (The latter appears to be irrelevant.) In #1137, it was established (correctly) that we should not bundle the datasets in the nuget but download them in some other way. It said nothing about the merits of making that samples class part of the main nuget. |
Seems like we're in agreement, to summarize this issue can be closed after making the SamplesUtils a separate package. |
We are not shipping Moving this out of Project 13, since it is not a stable package. |
After further discussion, we'll try to retire SamplesUtils, and have all utilities in the samples (IDataView prints, and dataset generation.) @luisquintanilla @JRAlexander @natke for FYI or if you want to comment about. |
We discontinued using SamplesUtils for samples. Closing this issue. Create #3584 for deleting it altogether. |
There is a project called
SamplesUtils
sitting inside the core ML.NET codebase. I believe this was created before the[BestFriend]
concept came around.This project contains one
namespace
,SampleDatasetUtils
that has onestatic
class
,DatasetUtils
. Here are the following issues:The text was updated successfully, but these errors were encountered: