Documentation: Create ML.NET Component Guide #2943

Closed
glebuk opened this issue Mar 13, 2019 · 5 comments
Labels
documentation Related to documentation of ML.NET P1 Priority of the issue for triage purpose: Needs to be fixed soon.

Comments

@glebuk
Contributor

glebuk commented Mar 13, 2019

The ML.NET Guide section of the documentation is missing key summary information.

Specifically, we need:

  1. A summary of components that guides users to the components they need
  2. Guide on how to make pipelines performant
  3. Which items are exportable to ONNX
  4. Which trainers need caching/normalization?
  5. Which trainers and transformers to try first?

We can start with the existing structure and fill in the missing information or add pages.
Ideally, we can make a searchable list with checkboxes where users can filter components by criteria. For example, I want to find components that are "linear, regression" or "streaming, exportable to ONNX".
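
As a rough illustration of the filtering idea above, here is a minimal Python sketch of a tag-based catalog. The component names and tags are hypothetical placeholders, not statements about real ML.NET components, and `filter_components` is an invented helper rather than any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    tags: frozenset


# Hypothetical catalog entries: names and tags are placeholders for illustration only.
CATALOG = [
    Component("ExampleLinearRegressor",
              frozenset({"trainer", "linear", "regression", "streaming"})),
    Component("ExampleTreeRegressor",
              frozenset({"trainer", "tree", "regression"})),
    Component("ExampleTextTransform",
              frozenset({"transformer", "streaming", "exportable to ONNX"})),
]


def filter_components(catalog, query):
    """Return the components that carry every comma-separated tag in the query."""
    wanted = {tag.strip().lower() for tag in query.split(",") if tag.strip()}
    return [c for c in catalog if wanted <= {t.lower() for t in c.tags}]


if __name__ == "__main__":
    for q in ("linear, regression", "streaming, exportable to ONNX"):
        print(q, "->", [c.name for c in filter_components(CATALOG, q)])
```

Matching is a case-insensitive "all tags must be present" check, so an empty query returns the whole catalog. A real page would more likely derive the tags from the dimensions listed below.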

Trainers

Dimension
Trainer Name
Short Description
ML Tasks Supported with API doc links to each function
Common useful applications
Category/Algorithm
In which NuGet?
Supports export to ONNX?
Single or multi pass (need caching?)
Require data to be normalized?
Calibration needed?
Types of input supported? (one or many columns?)
Kind of output produced
Scalability in terms of features and examples

Transformers

This information can be presented as a list, graphically, as a table, as a searchable table, or in some similar form.

Column
Transformer Name
Short Description
Common useful applications
Trainable or not? (estimator/transformer) - good to list both types and links to API
In which NuGet?
Supports export to ONNX?
Category/Algorithm
Single or multi pass (need caching?)
Types of input supported? (1-1, 1-many, etc.)
Types of input supported? (only floats, or others? one or many columns?)
What does the output look like?
Scalability in terms of features and examples

Loaders

Column
Loader Name
Short Description
Data file type
Category/Algorithm
Common useful applications
Scalability in terms of speed and column count
@shmoradims shmoradims added the documentation Related to documentation of ML.NET label Mar 14, 2019
@tauheedul
Contributor

tauheedul commented Apr 11, 2019

The main ML.NET documentation or the ML.NET Cookbook should also include sections for:

  • "How to use Python bindings with NimbusML and ML.NET"
  • "How to use Infer.NET with ML.NET"
    Few developers are aware that NimbusML and Infer.NET exist, because the main ML.NET documentation doesn't mention them. You have to go directly to the NimbusML and Infer.NET documentation to find out. But people won't know to do that if the main repo never mentions that these capabilities are available.

https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md

@shmoradims shmoradims added the P1 Priority of the issue for triage purpose: Needs to be fixed soon. label May 21, 2019
@shmoradims

As explained in #3218, the following properties are addressed in the 1.0 API reference under IO columns, trainer characteristics, and estimator characteristics.

  • Machine learning task
  • Expected label type: bool, etc.
  • Output columns: "Score", "PredictedLabel", etc., with a description of what each does
  • Is normalization required? Yes/No
  • Is caching required? Yes/No
  • Additional NuGet

We decided to add ONNX conversion information later, on a single page about ONNX that lists all the components that are convertible to ONNX.

The following properties were too complex or subjective to be added as part of 1.0:

  • Trainer Category (requires some taxonomy to be created)
  • When to use this trainer?
  • Supported number of features?
  • Supported number of examples?

@natke, did you and Gleb figure out any of the complex properties above?

@justinormont
Contributor

For caching, it seems all the docs say: "Is caching required? | No" (example)

We should probably say "Is caching helpful?" instead, as caching is never technically required.

@natke
Contributor

natke commented Jun 4, 2019

@shmoradims We added the information we had to https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-choose-an-ml-net-algorithm. We can talk about what else is required here. Are we getting a lot of user questions on this?

@shmoradims

We don't have any particular feedback on this yet. I'm just doing regular backlog cleanup. The page you mentioned is good enough to close this issue, so I'm closing it.

@justinormont, could you please open an issue for the wording of 'is caching required?' above, with a proposal? More discussion needs to happen, because technically 'is caching helpful?' can also always be true, and hence not be particularly informative. Currently, trainers that require multiple passes through the data (like pairwise coupling) have this property set to 'yes'. Maybe we need to change the question altogether.
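
To make the multi-pass point concrete, here is a minimal Python sketch that counts how often a data source is read with and without a cache. `CountingSource`, `train`, and `count_reads` are hypothetical stand-ins invented for this illustration. They do not model ML.NET's actual caching mechanism or any real trainer.

```python
class CountingSource:
    """Hypothetical data source that counts how many full reads it serves."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.reads = 0

    def __iter__(self):
        # Each new iteration stands for re-reading the underlying data.
        self.reads += 1
        return iter(self._rows)


def train(data, passes):
    """Stand-in for a trainer: walks the data `passes` times and sums the values."""
    total = 0
    for _ in range(passes):
        for row in data:
            total += row
    return total


def count_reads(passes, use_cache):
    """Return how many times the source is read for a given number of passes."""
    source = CountingSource([1, 2, 3])
    # With a cache, the source is read once and every pass runs over the in-memory copy.
    data = list(source) if use_cache else source
    train(data, passes)
    return source.reads


if __name__ == "__main__":
    for passes in (1, 3):
        print(passes, "pass(es):",
              "uncached reads =", count_reads(passes, use_cache=False),
              "| cached reads =", count_reads(passes, use_cache=True))
```

In this toy setup the cache only reduces source reads when the trainer makes more than one pass, which is roughly the distinction the current 'yes'/'no' values seem to capture.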

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022

5 participants