Error due to ShuffleTransform in pipeline. #1106

Closed
zeahmed opened this issue Oct 1, 2018 · 5 comments · Fixed by #1208

Comments

@zeahmed
Contributor

zeahmed commented Oct 1, 2018

ShuffleTransform needs to be converted to the Transformer/Estimator design. Currently, when it is used in TensorFlowTransform to enable shuffling of data during training, it fails with an error saying that ShuffleTransform is not a RowToRowMapper.

Update

It seems the transform works fine during training. The error occurs when creating the prediction engine.

The assertion below is the exact location where the failure occurs during creation of the prediction engine:

_host.Assert(xf is IRowToRowMapper);

Here is the test failure log

[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8050239]         E:\TLC_git\machinelearning\test\Microsoft.ML.TestFramework\GlobalBase.cs(70,0): at Microsoft.ML.Runtime.Internal.Internallearn.Test.GlobalBase.AssertHandler(String msg, IExceptionContext ectx)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8060923]         E:\TLC_git\machinelearning\src\Microsoft.ML.Core\Utilities\Contracts.cs(777,0): at Microsoft.ML.Runtime.Contracts.DbgFailCore(String msg, IExceptionContext ctx)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8064869]         E:\TLC_git\machinelearning\src\Microsoft.ML.Core\Utilities\Contracts.cs(786,0): at Microsoft.ML.Runtime.Contracts.DbgFail(IExceptionContext ctx)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8067575]         E:\TLC_git\machinelearning\src\Microsoft.ML.Core\Utilities\Contracts.cs(841,0): at Microsoft.ML.Runtime.Contracts.Assert(IExceptionContext ctx, Boolean f)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8070263]         E:\TLC_git\machinelearning\src\Microsoft.ML.Data\DataLoadSave\TransformWrapper.cs(133,0): at Microsoft.ML.Runtime.Data.TransformWrapper.GetRowToRowMapper(ISchema inputSchema)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8073429]         E:\TLC_git\machinelearning\src\Microsoft.ML.Api\PredictionEngine.cs(190,0): at Microsoft.ML.Runtime.Api.PredictionEngine`2..ctor(IHostEnvironment env, Func`2 makeMapper, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8076450]         E:\TLC_git\machinelearning\src\Microsoft.ML.Api\PredictionEngine.cs(172,0): at Microsoft.ML.Runtime.Api.PredictionEngine`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8078831]         E:\TLC_git\machinelearning\src\Microsoft.ML.Api\PredictionEngine.cs(166,0): at Microsoft.ML.Runtime.Api.PredictionEngine`2..ctor(IHostEnvironment env, IDataView dataPipe, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8081564]         E:\TLC_git\machinelearning\src\Microsoft.ML.Api\ComponentCreation.cs(190,0): at Microsoft.ML.Runtime.Api.ComponentCreation.CreatePredictionEngine[TSrc,TDst](IHostEnvironment env, IDataView dataPipe, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8084805]         E:\TLC_git\machinelearning\test\Microsoft.ML.Tests\ScenariosWithDirectInstantiation\TensorflowTests.cs(438,0): at Microsoft.ML.Scenarios.ScenariosTests.ExecuteTFTransformMNISTLRTrainingTest(Boolean shuffle, Nullable`1 shuffleSeed, Double expectedMicroAccuracy, Double expectedMacroAccruacy)
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8087841]         E:\TLC_git\machinelearning\test\Microsoft.ML.Tests\ScenariosWithDirectInstantiation\TensorflowTests.cs(365,0): at Microsoft.ML.Scenarios.ScenariosTests.TensorFlowTransformMNISTLRTrainingTest()
[10/9/2018 4:18:27 PM Informational] [xUnit.net 00:00:10.8586866]   Finished:    Microsoft.ML.Tests
@Zruty0
Contributor

Zruty0 commented Oct 2, 2018

It should not give this error during training, only during scoring, right?
And is it necessary to have the shuffle transform when you are done training?

@TomFinley
Contributor

Hi @zeahmed, a different objection: a shuffle transform simply cannot be a row-to-row mapper. The two notions are incompatible. A row-to-row mapper applies where an operation can be expressed as mapping one input row to one output row. A shuffle is a permutation of the rows themselves. So it cannot be done, sorry. Perhaps if you described your scenario, we could say what the actual problem is?
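
The incompatibility described above can be illustrated with a minimal sketch. It is written in plain Python rather than against ML.NET, and every name in it (`map_rows`, `shuffle_rows`, `add_doubled`, the dict-based row) is hypothetical and only stands in for the corresponding concept.

```python
import random
from typing import Callable, Iterable, Iterator, List

Row = dict  # hypothetical row representation: column name -> value


def map_rows(rows: Iterable[Row], mapper: Callable[[Row], Row]) -> Iterator[Row]:
    """Row-to-row mapping: each output row depends only on the matching input row."""
    for row in rows:
        yield mapper(row)


def shuffle_rows(rows: Iterable[Row], seed: int) -> List[Row]:
    """Shuffling: which row lands at position i depends on the whole dataset."""
    materialized = list(rows)
    random.Random(seed).shuffle(materialized)
    return materialized


def add_doubled(row: Row) -> Row:
    """A toy row mapper that adds one computed column."""
    return {**row, "doubled": row["x"] * 2}


data = [{"x": i} for i in range(5)]

# A row mapper can process a single example in isolation,
# which is what scoring one example at a time requires.
single = add_doubled({"x": 21})

# Shuffling a single row just returns that row; there is no
# meaningful single-row form of a permutation.
single_shuffled = shuffle_rows([{"x": 21}], seed=0)
```

The mapper has a well-defined answer for one row at a time, while the shuffle only means something over a whole collection of rows.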

@zeahmed changed the title from "Convert ShuffleTransform to RowToRowMapper..." to "Convert ShuffleTransform to Transformer/Estimator design..." on Oct 2, 2018
@zeahmed
Contributor Author

zeahmed commented Oct 2, 2018

@Zruty0, I will copy the exact exception. The shuffle is applied during training, and I have a parameter to turn it on or off. It's good practice to shuffle the training data before each iteration to reduce variance and avoid getting stuck in local minima.

@TomFinley, yes, the title was wrong. It should be "Convert ShuffleTransform to Transformer/Estimator". The idea is to give the user the option to shuffle the data on each iteration, i.e. whenever I open a new cursor on the data view I should get the data in a different order. I did this previously with direct instantiation. Since the design has changed, I think it either needs conversion or there should be some other way to use it.

Let me know if you know of another way to use it.
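
The behaviour requested above (a different row order each time a cursor is opened) can be sketched as follows. This is a plain-Python illustration, not ML.NET code: `ShuffledView` and `open_cursor` are hypothetical names, and the in-memory list is an assumption made to keep the example small.

```python
import random
from typing import Iterator, List, Optional, Sequence


class ShuffledView:
    """Hypothetical wrapper: each open_cursor() call yields the rows in a new order."""

    def __init__(self, rows: Sequence[dict], seed: Optional[int] = None) -> None:
        self._rows: List[dict] = list(rows)
        # A fixed seed makes the sequence of orderings reproducible across runs.
        self._rng = random.Random(seed)

    def open_cursor(self) -> Iterator[dict]:
        order = list(range(len(self._rows)))
        self._rng.shuffle(order)  # draw a fresh permutation for this cursor
        return (self._rows[i] for i in order)


view = ShuffledView([{"x": i} for i in range(100)], seed=42)

# One cursor per training iteration: same rows, different order each time.
epoch_orders = [[row["x"] for row in view.open_cursor()] for _ in range(3)]
```

Keeping the random generator on the view, rather than reseeding per cursor, is what makes successive cursors differ while the overall run stays reproducible for a given seed.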

@TomFinley
Contributor

TomFinley commented Oct 2, 2018

Ah. Thanks @zeahmed! Then I have another objection: these things should just not be estimators/transformers. I feel pretty strongly about this; see #933. This has been the cause of a large amount of suffering in the past. Shuffling should still exist, but purely as an operation over IDataView, not as something that becomes part of a data model (which a transform is).
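
One way to picture the separation argued for above is a sketch in which the shuffle touches only the training data and never becomes part of the fitted model. It is plain Python with hypothetical names (`shuffled`, `fit_mean_centering`), and the mean-centering "estimator" is a toy stand-in, not anything taken from ML.NET.

```python
import random
from typing import Callable, List, Sequence

Row = dict
RowMapper = Callable[[Row], Row]


def shuffled(rows: Sequence[Row], seed: int) -> List[Row]:
    """Data operation: reorders rows for training; it is never stored in the model."""
    out = list(rows)
    random.Random(seed).shuffle(out)
    return out


def fit_mean_centering(rows: Sequence[Row]) -> RowMapper:
    """Toy estimator: learns the mean of column x and returns a row mapper (the model)."""
    mean = sum(r["x"] for r in rows) / len(rows)

    def center(row: Row) -> Row:
        return {**row, "centered": row["x"] - mean}

    return center


train = [{"x": float(i)} for i in range(5)]

# The shuffle is applied to the training data only...
model = fit_mean_centering(shuffled(train, seed=7))

# ...so the fitted model is a pure row mapper and can score one example on its own.
prediction = model({"x": 10.0})
```

Because the model here contains only the row mapper, scoring never has to ask how a shuffle would apply to a single row.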

@zeahmed
Contributor Author

zeahmed commented Oct 2, 2018

So the question is: does this transform, or other similar ones, need conversion? If not, are they supposed to work as-is? Or is this a work in progress?

@zeahmed assigned and unassigned zeahmed on Oct 9, 2018
@zeahmed changed the title from "Convert ShuffleTransform to Transformer/Estimator design..." to "Error due to ShuffleTransform to in pipeline." on Oct 12, 2018
@zeahmed changed the title from "Error due to ShuffleTransform to in pipeline." to "Error due to ShuffleTransform in pipeline." on Oct 12, 2018
@ghost locked the issue as resolved and limited conversation to collaborators on Mar 28, 2022