Easier way to create dynamic DataViews #5895
Comments
@luisquintanilla @briacht I have seen this come up several times now, so it's obviously something people want. Not sure how it will fit with our priorities, but it's definitely something we should look into more. @vgb1993 could you give an example of what you would like to see or what you are thinking?
Yes, I've seen this request quite a few times now too! I think this would be good to investigate.
Here's some raw brainstorming:
Any thoughts? Any preferences? Any drawbacks? I'm not familiar with the internal implementation of ML.NET, so perhaps someone has better ideas. At the end of the day, what we want is to create and run ML models at runtime. If we can define a dynamic model, we can build software around it, which ultimately makes ML.NET more accessible.
Yes! I ran into this problem today. Regarding @vgb1993's points above: 1, yes; the others, maybe/sure.
Currently the easiest way to get a dynamic DataView is to use Microsoft.Data.Analysis.DataFrame, because it can load a text file dynamically, create the schema automatically, and then be used directly in ML.NET. Take a look at this for an example. That being said, some of the other approaches mentioned above are things we are considering, but we don't have a timeline for them yet.
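A minimal sketch of the DataFrame route described above, assuming the Microsoft.ML and Microsoft.Data.Analysis NuGet packages are referenced. The class name `DataFrameSketch` and the method `LoadAsDataView` are hypothetical names chosen for illustration; the CSV file is assumed to have a header row.

```csharp
using System;
using Microsoft.Data.Analysis;   // NuGet: Microsoft.Data.Analysis
using Microsoft.ML;              // NuGet: Microsoft.ML

public static class DataFrameSketch
{
    // Loads a CSV whose columns are not known at compile time.
    // No input class with attributes is declared anywhere.
    public static IDataView LoadAsDataView(string csvPath)
    {
        // LoadCsv reads column names from the header row and
        // guesses each column's type from the data.
        DataFrame frame = DataFrame.LoadCsv(csvPath);

        // DataFrame implements IDataView, so it can be passed to
        // ML.NET APIs that accept an IDataView.
        return frame;
    }

    // Prints the schema that was inferred at run time.
    public static void PrintSchema(IDataView view)
    {
        foreach (DataViewSchema.Column column in view.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }
    }
}
```

The returned view can then be handed to an estimator's `Fit` call like any other IDataView; which columns serve as label and features still has to be decided by the caller at run time.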
@michaelgsharp, I do agree that a Are you aware of any partitioning work on dataframes such as the Python dask library for pandas?
Personally I am not, but I haven't really looked into it much. @luisquintanilla are you aware of anything? Yeah, the memory thing can be an issue. I'm not aware of a workaround for now, but this is something we are keeping in mind for future work.
@michaelgsharp we have the same issue with dynamic data loading (mostly from a SQL database) and with dynamically creating models for each labeling/value prediction scenario (I guess this is the main problem: we need a separate model for each scenario). Here are the ideas so far:
I guess all of the above should work, but all of them are ridiculous...
I was hoping this would have been added in 2.0.0. I'm currently using one huge class that contains all my potential features, loading my data from stored procedures, and then using ML.Transforms.DropColumns to remove the fields that were not found before training. It works, but it's far from ideal and has become stifling. Has anyone found a better workaround?
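The workaround described above can be sketched roughly as follows. This is not the commenter's actual code: `DropUnusedColumnsSketch`, `FindUnusedColumns`, and `DropUnused` are hypothetical names, and the sketch assumes the caller already knows which column names a given scenario uses.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;   // NuGet: Microsoft.ML

public static class DropUnusedColumnsSketch
{
    // Returns the names of schema columns that are not in columnsInUse.
    public static string[] FindUnusedColumns(IDataView data, IEnumerable<string> columnsInUse)
    {
        var inUse = new HashSet<string>(columnsInUse);
        return data.Schema
            .Select(column => column.Name)
            .Where(name => !inUse.Contains(name))
            .ToArray();
    }

    // Drops every column that the current scenario does not use.
    public static IDataView DropUnused(MLContext ml, IDataView data, IEnumerable<string> columnsInUse)
    {
        string[] unused = FindUnusedColumns(data, columnsInUse);

        // Assumption: DropColumns expects at least one column name,
        // so the transform is skipped when there is nothing to drop.
        if (unused.Length == 0)
        {
            return data;
        }

        return ml.Transforms.DropColumns(unused).Fit(data).Transform(data);
    }
}
```

This still requires the one wide class to load the data in the first place, so it only trims the schema after the fact; it does not remove the need for a compile-time type.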
I am using only Microsoft.AutoML, and I was able to work around this by loading the data dynamically from SQL and using the input column names to designate the inputs and labels. The IDataView is loaded from a normal SQL query (mapping the SQL columns to ML.NET data types for the IDataView columns). For prediction, the data types just need to match (a fake one-row SQL SELECT statement is enough). This works for both training and prediction with the AutoML API.
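One possible shape for loading an IDataView from a SQL query with columns decided at run time is sketched below. It is not the commenter's code. It assumes ML.NET's `DatabaseLoader` with its `DatabaseLoader.Column(name, dbType, index)` constructor and `DatabaseSource`; the names `RuntimeSqlLoaderSketch`, `BuildColumns`, and `Load` are hypothetical, and the caller is assumed to supply the column names and `DbType` values in the same order as the SELECT list.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;
using Microsoft.ML;        // NuGet: Microsoft.ML
using Microsoft.ML.Data;

public static class RuntimeSqlLoaderSketch
{
    // Builds loader columns from (name, type) pairs known only at run time.
    // The position in the list is used as the column index in the query result.
    public static DatabaseLoader.Column[] BuildColumns(
        IReadOnlyList<(string Name, DbType Type)> shape)
    {
        return shape
            .Select((col, index) => new DatabaseLoader.Column(col.Name, col.Type, index))
            .ToArray();
    }

    // Creates a lazily evaluated IDataView over the query result.
    public static IDataView Load(
        MLContext ml,
        DbProviderFactory factory,      // e.g. the provider factory of your SQL client
        string connectionString,
        string query,
        IReadOnlyList<(string Name, DbType Type)> shape)
    {
        DatabaseLoader loader = ml.Data.CreateDatabaseLoader(BuildColumns(shape));
        var source = new DatabaseSource(factory, connectionString, query);
        return loader.Load(source);
    }
}
```

The shape list could itself be derived from the query's result metadata, which would keep the whole path free of compile-time classes; that step is left out here.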
Not sure how many people have struggled with this, but I managed to solve one of the runtime type problems without having to use JSON, CSV, or text files as my DataView source: a runtime dynamic DataView over a live IEnumerable of objects. A similar approach could probably be taken to create objects from .csv files etc. and apply schema properties dynamically, but I think ML.NET has that built in now.
Is your feature request related to a problem? Please describe.
In my company we want to add ML blocks to our arsenal (made with Blockly) with which you could train and run models. I've read in the docs and some issues that the model must be defined beforehand by declaring a class with some attributes, and apparently it's not easy to create a dynamic model. Our idea is to feed SQL DataSets to the trainer.
Describe the solution you'd like
As a user I would like to define the model based on the shape of the input data. For example, a SQL DataSet, a CSV etc.
After that, each column's metadata could be added programmatically.
Describe alternatives you've considered
Both seem overcomplicated to me:
https://stackoverflow.com/questions/56761728/add-custom-column-to-idataview-in-ml-net
https://stackoverflow.com/questions/66893993/ml-net-create-prediction-engine-using-dynamic-class/66913705#66913705
Additional context
Experienced C# developer, new to ML.NET