Skip to content

Trivial estimators should also be ITransformer #2354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
artidoro opened this issue Jan 31, 2019 · 1 comment
Open

Trivial estimators should also be ITransformer #2354

artidoro opened this issue Jan 31, 2019 · 1 comment
Assignees
Labels
API Issues pertaining the friendly API enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@artidoro
Copy link
Contributor

artidoro commented Jan 31, 2019

We are making constructors of transformers internal as part of issue #1798. This makes the creation of non trainable transformers somewhat clumsy. This problem is already reported in the following issue #2165 by an external user.
Here is an example:

IDataView data = ...
var estimator = mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel");
var transformer = estimator.Fit(ANY DATA);
var transformedData = transformer.Transform(data);

The problem here is the step where the call to Fit() is completely useless and feels silly. However, we need estimators for non trainable transformers to use them in estimator pipelines, and the estimator interface cannot change.

One possible approach that we came up with @glebuk and @TomFinley is to make the 'trivial' estimators (estimators for non trainable transformers) ITransformers themselves. This is how the API would look like if we make this change, you can see that it is still possible to make the trivial estimators part of a pipeline:

IDataView data = ...
var estimatorTransformer = mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel");
var transformedData = estimatorTransformer.Transform(data);

var pipeline = estimatorTransformer.Append(ml.Regression.Learners.FastTree(options);
var trainedPipeline = pipeline.Fit(data);
var transformedData2 = trainedPipeline.Transform(data);

The conversion would be very simple. The estimator will have a field ITransformer Transformer which is instantiated in the constructor using the arguments passed to the constructor of the estimator (this is already done in cases where the estimator derive from TrivialEstimator which should be the case for non trainable estimators). The methods of the interface ITransformer will be implemented by calling the respective methods of Transformer. A sample conversion should look like the following:

public sealed class KeyToValueMappingEstimator : TrivialEstimator<KeyToValueMappingTransformer>, ITransformer
    {
        public KeyToValueMappingEstimator(IHostEnvironment env, string columnName)
            : base(Contracts.CheckRef(env, nameof(env)).Register(nameof(KeyToValueMappingEstimator)), new KeyToValueMappingTransformer(env, columnName))
        {        }

        public KeyToValueMappingEstimator(IHostEnvironment env, params (string outputColumnName, string inputColumnName)[] columns)
            : base(Contracts.CheckRef(env, nameof(env)).Register(nameof(KeyToValueMappingEstimator)), new KeyToValueMappingTransformer(env, columns))
        {        }

        public override SchemaShape GetOutputSchema(SchemaShape inputSchema)
        {            ...        }

        public bool IsRowToRowMapper => Transformer.IsRowToRowMapper;
        public Schema GetOutputSchema(Schema inputSchema) => Transformer.GetOutputSchema(inputSchema);
        public IRowToRowMapper GetRowToRowMapper(Schema inputSchema) => Transformer.GetRowToRowMapper(inputSchema);
        public IDataView Transform(IDataView input) => Transformer.Transform(input);
    }
}

/cc @Ivanidzo4ka

@artidoro artidoro added the API Issues pertaining the friendly API label Jan 31, 2019
@artidoro
Copy link
Contributor Author

artidoro commented Feb 1, 2019

There is an even better way! We can make TrivialEstimator a ITransformer, that way all the estimators that use it as base class will themselves directly become ITransofrmer.

We would need to check which estimators are trivial, but were not made derive from TrivialEstimator but this is a good start I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Issues pertaining the friendly API enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.
Projects
None yet
Development

No branches or pull requests

2 participants