This is fine, but I wonder if we can go even a step further and make something generally applicable to many things rather than just something very, very specific to text.
So: data pipelines must be serializable, and prior to ML.NET being open sourced filters were part of data pipelines. Pursuant to #933, it is our position that filters should not be part of data pipelines any longer, hence why we see the old functionality of filters not being exposed as IEstimator/ITransformer, but instead just straight functions on IDataView, e.g.:
The implication is that filters no longer need to be serializable, so we could probably simplify a lot of code by deleting practically all of the existing filters and replacing them with some sort of bool-evaluating delegate akin to a LINQ .Where. (We can't serialize delegates, but since we no longer care about that, that's fine.)
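To make the delegate idea concrete, here is a minimal sketch of what a predicate-based filter could look like. It uses plain LINQ over in-memory objects, not IDataView, and the `Row` type and `FilterRows` helper are hypothetical names invented for this illustration; nothing here is an existing ML.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type, used only for this illustration.
public sealed class Row
{
    public string Category { get; set; } = "";
    public float Value { get; set; }
}

public static class RowFilters
{
    // Hypothetical helper: keeps the rows for which the predicate returns true.
    // It operates on in-memory objects, not on IDataView.
    public static IEnumerable<Row> FilterRows(IEnumerable<Row> rows, Func<Row, bool> keep)
        => rows.Where(keep);
}

public static class Program
{
    public static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Category = "a", Value = 1f },
            new Row { Category = "", Value = 2f },           // missing text value
            new Row { Category = "some text", Value = 3f },  // unwanted category
            new Row { Category = "b", Value = float.NaN },   // missing numeric value
        };

        // One delegate covers text and numeric conditions alike.
        var kept = RowFilters.FilterRows(rows, r =>
            !string.IsNullOrEmpty(r.Category)
            && r.Category != "some text"
            && !float.IsNaN(r.Value)).ToList();

        Console.WriteLine(kept.Count);
    }
}
```

The point of the sketch is that a single delegate parameter would subsume both the numeric range filters and the text conditions asked for in this issue, at the cost of not being serializable.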
AFAIK, the new filter APIs can only target numeric columns, not text-based columns.
As of v0.8 we have:
1)
This is a very convenient filter, but only for NUMERIC values. I'm currently using it in the sample code snippet.
FilterByKeyColumnFraction()
Good for hashed values.
But what if we want to filter by text-based columns? Say I want to remove the rows where a text-based column has no value (this might be doable by first transforming to numeric values, so that a missing value becomes NaN and gets filtered, but that requires an additional, not straightforward step), or to remove specific rows where a categorical value equals "some text". I think we cannot do that yet.