
How to create a NN model for NEAT in ML.NET? #4605


Closed
sps014 opened this issue Dec 28, 2019 · 17 comments
Labels
enhancement New feature or request

sps014 commented Dec 28, 2019

Happy Christmas Guys...

I want to create a neural net model in ML.NET and also manipulate its weights and biases for NEAT (NeuroEvolution of Augmenting Topologies). Is it possible to do this in native C# only?
Is there any way to create neural nets? If not, when will we get the ability to create NNs in C#, or is there any plan to provide an API for NNs in the future? I am really sick of TensorFlow.

Enjoy your holiday guys...
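[Editor's note: since NEAT mostly needs direct, mutable access to weights and topology rather than a full framework, a genome can be sketched in plain C#. The `ConnectionGene` and `Genome` types below are hypothetical illustrations for this question, not ML.NET APIs, and the evaluation covers only acyclic genomes.]

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal NEAT-style genome: input/output nodes plus a list
// of weighted connection genes that can be mutated directly.
public record ConnectionGene(int From, int To, double Weight, bool Enabled);

public class Genome
{
    public int Inputs, Outputs;
    public List<ConnectionGene> Connections = new();

    public Genome(int inputs, int outputs) { Inputs = inputs; Outputs = outputs; }

    // Feed-forward evaluation for acyclic genomes, processing connections
    // in list order. Hidden sums are left linear for brevity; only the
    // outputs (node ids Inputs..Inputs+Outputs-1) are squashed with tanh.
    public double[] Evaluate(double[] input)
    {
        var act = new Dictionary<int, double>();
        for (int i = 0; i < Inputs; i++) act[i] = input[i];
        foreach (var c in Connections.Where(c => c.Enabled))
        {
            act.TryGetValue(c.From, out var src);
            act[c.To] = act.GetValueOrDefault(c.To) + src * c.Weight;
        }
        return Enumerable.Range(Inputs, Outputs)
                         .Select(i => Math.Tanh(act.GetValueOrDefault(i)))
                         .ToArray();
    }

    // Weight perturbation: the kind of direct manipulation the question
    // asks about, which is what NEAT's mutation operator needs.
    public void MutateWeights(Random rng, double scale = 0.1)
    {
        for (int i = 0; i < Connections.Count; i++)
            Connections[i] = Connections[i] with
            { Weight = Connections[i].Weight + (rng.NextDouble() * 2 - 1) * scale };
    }
}
```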

codemzs (Member) commented Dec 28, 2019

Why are you sick of TensorFlow?

sps014 (Author) commented Dec 28, 2019

> Why are you sick of TensorFlow?

On .NET there is no official TensorFlow binding; SciSharp does a great job, but its performance issues are quite noticeable.

sps014 (Author) commented Dec 28, 2019

The main point is: will there be NN support in ML.NET?

sps014 (Author) commented Dec 30, 2019

Could anyone from the team please comment on this, so we know whether it is on the roadmap or not?

@najeeb-kazmi najeeb-kazmi added the enhancement New feature or request label Dec 30, 2019
najeeb-kazmi (Member) commented

Hi @sps014 - thanks for reaching out to us. Right now, ML.NET only supports scoring an already trained TensorFlow model (examples). ML.NET used to support retraining a pre-defined DNN model, but this was hidden in #4362 due to the feature not being fully tested. See #4520 for more context.

@gvashishtha could you comment on the roadmap for re-enabling this feature?
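[Editor's note: for readers who land here, the scoring path mentioned above looks roughly like the sketch below, using the `Microsoft.ML.TensorFlow` NuGet package. `"model_dir"`, `"Input"`, and `"Output"` are placeholders for your own exported graph's location and tensor names; you still need an already-trained model on disk.]

```csharp
using Microsoft.ML;

// Score a pre-trained TensorFlow model with ML.NET.
var mlContext = new MLContext();

var pipeline = mlContext.Model
    .LoadTensorFlowModel("model_dir")          // placeholder: path to frozen graph / SavedModel
    .ScoreTensorFlowModel(
        outputColumnNames: new[] { "Output" }, // placeholder: graph output tensor name(s)
        inputColumnNames:  new[] { "Input" }); // placeholder: graph input tensor name(s)

// Fit the estimator against an IDataView matching your input schema,
// then score rows with the resulting transformer or a PredictionEngine.
```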

codemzs (Member) commented Dec 31, 2019

@najeeb-kazmi I think the ask here is to create a neural network in ML.NET; we never had this functionality to begin with. What you are referring to as hidden is the retrain API, which allows some level of retraining of an existing TensorFlow meta graph. That is not the same as creating a NN from scratch, as you see people do in PyTorch or TensorFlow.

@sps014 We do not have any short-term plans to provide the facility to create NNs in ML.NET. Your best bet is TensorFlow.NET, though I agree it has the limitations you pointed out.

@codemzs codemzs closed this as completed Dec 31, 2019
lostmsu commented Mar 19, 2020

@sps014 what kind of performance issues do you experience with the SciSharp stack? Curious, because I am working on a competing product that exposes the full TensorFlow Python API to C#: Gradient. With TensorFlow, most compute is done inside the core framework, and the language or framework you use to construct the TensorFlow computation graph hardly matters.

codemzs (Member) commented Mar 19, 2020

@lostmsu Your product takes a dependency on the Python runtime and interops into the TF core via its Python API. For a C# developer, that means taking a dependency on both the .NET runtime and the Python runtime. TF.NET, on the other hand, DOES NOT depend on the Python runtime. Do you have benchmarks proving Gradient runs faster than TF.NET?

sps014 (Author) commented Mar 19, 2020

@lostmsu can I use Gradient on the web, i.e. in Blazor? Curious to know the size of a simple bundled app, since the Python runtime is embedded.

lostmsu commented Mar 19, 2020

@codemzs I never claimed it does 🤔. In fact, I said the choice of binding to TensorFlow hardly matters for performance. Just curious what kind of performance problems people experience. Marshaling is the only area that might have any effect here, and I can hardly see what could beat giving the user a writable Span<T> for Tensor contents. Especially given that TensorFlow itself does not support taking a user-allocated CUDA array as input (as of 1.13, at least, they were not planning to).

As for the Python dependency, a raw TensorFlow binary with GPU support is several times larger than a Python runtime plus all other required dependencies.

@sps014 you would not be able to use it in client-side Blazor (the WASM version), because TensorFlow does not officially support that yet. Server-side, I don't see why not. The tech preview might leak memory, but that is going to be solved before RC0, sometime in May.
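[Editor's note: the Span<T> point above can be illustrated with a plain C# sketch. The `Tensor` type below is a hypothetical stand-in, not the actual Gradient or TF.NET API; it shows why a span over the tensor buffer leaves little marshaling overhead to optimize away.]

```csharp
using System;

// Hypothetical tensor exposing its contents as a writable Span<double>.
// Filling it is a single bulk memory copy, with no per-element interop.
public sealed class Tensor
{
    private readonly double[] _buffer;
    public Tensor(int length) => _buffer = new double[length];
    public Span<double> Contents => _buffer;   // zero-copy view of the buffer
}

public static class Demo
{
    public static double Fill()
    {
        var batch = new double[] { 0.1, 0.2, 0.3, 0.4 };
        var tensor = new Tensor(batch.Length);
        batch.AsSpan().CopyTo(tensor.Contents); // bulk copy, no marshaling
        double sum = 0;
        foreach (var v in tensor.Contents) sum += v;
        return sum; // approximately 1.0
    }
}
```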

codemzs (Member) commented Mar 19, 2020

@lostmsu Ok, I'll be specific: please provide benchmarks for the claim that the "choice of binding to TensorFlow hardly matters for performance".

lostmsu commented Mar 19, 2020

@codemzs of course it is an "educated guess". I would say it should be the null hypothesis, given how TensorFlow works. After you build your tf.keras.Model in about 30 seconds (most of which is TensorFlow internal initialization, CUDA memory preallocation, etc.), on any sufficiently interesting model like ResNet or GPT-2, Model.fit will run for hours without ever leaving TensorFlow core code; the binding does not participate at all, except perhaps once a minute to print training statistics.

sps014 (Author) commented Mar 19, 2020

@lostmsu I did the same thing: I created an ML5 NN with C# in Blazor via interop, and printing the epoch, or anything else, slowed it down significantly.

codemzs (Member) commented Mar 19, 2020

@lostmsu We need benchmarks even for such "educated guesses". Taking a Python runtime dependency is no small thing from a performance standpoint. The founder of TensorFlowSharp has himself endorsed TF.NET for a reason: https://twitter.com/migueldeicaza/status/1157385979071778817?s=20

lostmsu commented Mar 19, 2020

@codemzs maybe for the startup time. I'd be more worried about unofficial builds of the TF core C library, because they might have been compiled with suboptimal compiler options. Either way, this seems like an important question, so I will try to put together a trivial benchmark comparison in the coming days.

As for the endorsement, I would also prefer TensorFlow.NET to succeed over Gradient's approach.

First, I too am not a fan of Python, even as an intermediate layer. But the reality is that higher-level algorithms only have Python implementations in both TensorFlow and PyTorch, so TensorFlow.NET has to play a catch-up game with TensorFlow. Given that the latter is a project probably funded at several million dollars a year (if not tens of millions), and the former is kept up by a few enthusiasts doing the tedious work of translating Python code to C# (hopefully with some tool assistance), it is clear who is winning the race.

Second, unlike Gradient, TF.NET is an open-source project. I too would prefer an open-source solution, but I also want to be well compensated for the work, and would rather focus on project features than on selling consulting on top of it.

lostmsu commented Mar 19, 2020

@codemzs actually, I could see both approaches working together nicely: a pure C# core like TF.NET doing inference, primitive graph building, and other trivial stuff, and a binding through Python, as Gradient does currently, working with the pure C# types from that core. But, oh boy, how much work is yet to be done for this to happen.

lostmsu commented Mar 20, 2020

@codemzs basically, the results are exactly what I expected:

TensorFlow .NET

tensorflow/core/platform/cpu_feature_guard.cc:142 Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Training epoch: 1
iter 000: Loss=2.3022, Training Accuracy=14.00% 221ms
iter 100: Loss=0.5189, Training Accuracy=88.00% 3041ms
iter 200: Loss=0.1782, Training Accuracy=95.00% 3021ms
iter 300: Loss=0.1938, Training Accuracy=91.00% 3016ms
iter 400: Loss=0.0924, Training Accuracy=96.00% 3037ms
iter 500: Loss=0.1010, Training Accuracy=98.00% 3017ms
---------------------------------------------------------
Epoch: 1, validation loss: 0.1122, validation accuracy: 96.74%
---------------------------------------------------------
Training epoch: 2
iter 000: Loss=0.1888, Training Accuracy=95.00% 1777ms
iter 100: Loss=0.0528, Training Accuracy=99.00% 3010ms
iter 200: Loss=0.0763, Training Accuracy=98.00% 3045ms
iter 300: Loss=0.0360, Training Accuracy=99.00% 3040ms

Gradient

WARNING:tensorflow:From C:\Users\lost\.conda\envs\tf-1.x-gpu\lib\site-packages\tensorflow_core\python\util\deprecation.py:503: calling argmax (from tensorflow.python.ops.math_ops) with dimension is deprecated and will be removed in a future version.
Instructions for updating:
Use the `axis` argument instead
2020-03-19 18:32:46.419558: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-03-19 18:32:46.444918: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-03-19 18:32:46.448919: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: lost-pc
2020-03-19 18:32:46.452414: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: lost-pc
2020-03-19 18:32:46.455298: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Training epoch: 1
iter 000: Loss=2.3023, Training Accuracy=14.00% 271ms
iter 100: Loss=0.4832, Training Accuracy=89.00% 2479ms
iter 200: Loss=0.3046, Training Accuracy=95.00% 2482ms
iter 300: Loss=0.1041, Training Accuracy=95.00% 2464ms
iter 400: Loss=0.0930, Training Accuracy=96.00% 2471ms
iter 500: Loss=0.1119, Training Accuracy=96.00% 2483ms
---------------------------------------------------------
Epoch: 1, validation loss: 0.1042, validation accuracy: 96.76%
---------------------------------------------------------
Training epoch: 2
iter 000: Loss=0.1296, Training Accuracy=96.00% 1659ms
iter 100: Loss=0.1223, Training Accuracy=98.00% 2546ms
iter 200: Loss=0.0770, Training Accuracy=96.00% 2469ms
iter 300: Loss=0.0419, Training Accuracy=100.00% 2517ms

Source code

Repository has git submodule: https://github.com/losttech/Gradient-Perf

Remarks

This uses SciSharp's own sample code, in a version that works with their latest NuGet packages. The Gradient adaptation is in src\Bench.Gradient\DigitRecognitionCNN.cs. It is basically a copy-paste with a few shims and Gradient-specific edits (see the file history).

I used a TF 1.15.0 build in Gradient. Again, I'm not surprised TF.NET lags slightly behind, because their official build probably does not set all the compiler flags that Google's official build uses. IMHO, nothing that can't be fixed in a week of work at most, especially if Google cooperates.

Instructions to run (tested on Windows only)

  • Clone with --recursive
  • Create a Conda environment tf-1.x-cpu with Python 3.7
  • In the Conda environment, install tensorflow-cpu==1.15.0
  • Open the solution and switch the configuration to Release
  • In the Debug section of the Bench.Gradient project properties, add the environment variable GRADIENT_PYTHON_ENVIRONMENT = conda:tf-1.x-cpu
  • Launch Bench.Gradient or Bench.TF.NET without debugging (e.g. Ctrl+F5)

@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022