
Add ML.NET Roadmap #30

Merged: 5 commits merged on May 5, 2018
32 changes: 32 additions & 0 deletions Microsoft.ML.sln
@@ -56,6 +56,35 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.Parquet", "src
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.Sweeper", "src\Microsoft.ML.Sweeper\Microsoft.ML.Sweeper.csproj", "{55C8122D-79EA-48AB-85D0-EB551FC1C427}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "docs", "docs", "{E20AF96D-3F66-4065-8A89-BEE479D74536}"
ProjectSection(SolutionItems) = preProject
Documentation\README.md = Documentation\README.md
EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "project-docs", "project-docs", "{52794B40-AB8A-41AF-9EF7-799C80D6E0BC}"
ProjectSection(SolutionItems) = preProject
Documentation\project-docs\contributing.md = Documentation\project-docs\contributing.md
Documentation\project-docs\developer-guide.md = Documentation\project-docs\developer-guide.md
EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{76F579E4-B9D2-4A0C-A511-EEFA4B2B829F}"
ProjectSection(SolutionItems) = preProject
CONTRIBUTING.md = CONTRIBUTING.md
README.md = README.md
ROADMAP.md = ROADMAP.md
EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "building", "building", "{DB751004-5D49-4B88-B78F-29CA9887087D}"
ProjectSection(SolutionItems) = preProject
Documentation\building\unix-instructions.md = Documentation\building\unix-instructions.md
Documentation\building\windows-instructions.md = Documentation\building\windows-instructions.md
EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "specs", "specs", "{2DEFC784-F2B5-44EA-ABBB-0DCF3E689DAC}"
ProjectSection(SolutionItems) = preProject
Documentation\specs\mvp.md = Documentation\specs\mvp.md
EndProjectSection
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -178,6 +207,9 @@ Global
{B7B593C5-FB8C-4ADA-A638-5B53B47D087E} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
{16BB1454-2108-40E5-B3A6-594654005303} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
{55C8122D-79EA-48AB-85D0-EB551FC1C427} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
{52794B40-AB8A-41AF-9EF7-799C80D6E0BC} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
{DB751004-5D49-4B88-B78F-29CA9887087D} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
{2DEFC784-F2B5-44EA-ABBB-0DCF3E689DAC} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {41165AF1-35BB-4832-A189-73060F82B01D}
2 changes: 1 addition & 1 deletion README.md
@@ -53,7 +53,7 @@ For more information, see the [.NET Foundation Code of Conduct](https://dotnetfo

## License

ML.NET is licensed under the [MIT license](LICENSE.TXT).
ML.NET is licensed under the [MIT license](LICENSE).

## .NET Foundation

95 changes: 95 additions & 0 deletions ROADMAP.md
@@ -0,0 +1,95 @@
# The ML.NET Roadmap

The goal of the ML.NET project is to provide an easy-to-use, .NET-friendly ML platform. This document describes the tentative plan for the project in the short and long term.

ML.NET is a community effort, and we welcome community feedback on our plans. The best way to give feedback is to open an issue in this repo. It's always a good idea to have a discussion before embarking on a large code change, to make sure there is no duplicated effort.
Many of the features listed on the roadmap already exist in the internal version of the codebase; they are marked with (*). We plan to release more of these internal features to GitHub over time.

In the meantime, we are looking for contributions. An easy place to start is the _up-for-grabs_ issues on [GitHub](https://github.com/dotnet/machinelearning/issues?q=is%3Aopen+is%3Aissue+label%3Aup-for-grabs).

## Short Term
### Training Improvements
* Improved public API for training and inference
* Enhanced tests and scenarios
* Additional Learners
  * [LibSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) for anomaly detection (*)
  * [LightGBM](https://github.com/Microsoft/LightGBM) - a high-performance boosted decision tree learner (*)
* Additional Learning Tasks (*)
  * _Ranking_ - the problem of automatically sorting (ranking) instances within a group, based on ranked examples in the training data
  * _Anomaly Detection_ - also known as _outlier detection_; the task of identifying items, events, or observations that do not conform to an expected pattern in the dataset
  * _Quantile Regression_ - a type of regression analysis. Whereas ordinary least-squares regression estimates the conditional mean of the response variable given certain values of the predictor variables, quantile regression estimates the conditional median or other quantiles of the response variable
* Additional Data source support (*)
  * Apache Parquet
  * Native high-performance binary format

### Featurization Improvements
* Text (*)
  * Natural language text preprocessing such as tokenization, part-of-speech tagging, and sentence breaking
  * Pre-trained text models that can be used for extracting semantic or sentiment features from text
* Image (*)
  * Image preprocessing such as loading, resizing, and normalization of images

Review comment (Member): Typo: "if" => "of"

  * Image featurization, including industry-standard pre-trained ImageNet neural models, such as ResNet and AlexNet

### Trained Model Management
* Export models to [ONNX](https://github.com/onnx/models) (*)

### GUI
* Release the Model Builder tool to ease model development (*)
* Design improvements so that the tool adheres better to Fluent Design principles
* Add a view for easier comparison of multiple experiments
* Ability to select the best-performing pipeline by sweeping transforms, the same way learners are swept

## Longer Term

### Training Improvements
* Add more learners, possibly including: (*)
  * Generalized Additive Models
  * [SymSGD](https://arxiv.org/pdf/1705.08030.pdf) - a fast linear SGD learner
  * Factorization Machines
  * [ProtoNN and Bonsai](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/) for compact and efficient models
* Integration with other ML packages
  * Accord.NET
  * etc.
* Deep Learning Support
  * Integrate with leading DNN package(s)
  * Support for transfer learning
  * Hybrid training of pipelines containing both DNN and non-DNN predictors
* Additional ML tasks (*)
  * _Recommendation_ - a problem that can be phrased as: "For a given user, predict the ratings this user would give to the items they have not explicitly rated yet"
  * _Anomaly Detection_ - also known as _outlier detection_; the task of identifying items, events, or observations that do not conform to an expected pattern in the dataset. Typical examples include detecting credit card fraud, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions
  * _Sequence Classification_ - learning from a series of examples in a sequence, where each item is assigned a distinct label, akin to a multiclass classification task
* Additional Data source support
  * Data from SQL databases, such as SQL Server
  * Data located in the cloud
* Distributed Training
* Easily train models in the cloud
* Whole-pipeline optimizations for both training and inference
* Automation of more data science tasks
* Additional Trainers
* Additional tasks

### Featurization Improvements
* Improved data wrangling support
* Add auto-suggestion of training pipelines. The technology will provide intelligent `LearningPipeline` suggestions based on training data attributes (*)
* Additional natural language text preprocessing
* Time series and forecasting
* Support for video, audio, and other data types

### Trained Model Management
* Model operationalization in the cloud
* Model deployment on mobile platforms
* Ability to run [ONNX](https://github.com/onnx/models) models in the `LearningPipeline`
* Support for the next version of ONNX
* Model deployment to IoT devices

### GUI Improvements
* Usability improvements
* Support for additional ML.NET features
* Improved code generation for training and inference
* Run the pipelines rather than just suggesting them, and present the user with the pipelines and the metrics generated by running them
* Distributed runs, rather than sequential ones

### Other
* Support for additional languages
* Published reproducible benchmarks against industry-leading ML toolkits on a variety of tasks and datasets