Provide a way to append/concatenate multiple IDataViews #4005
Labels
enhancement
New feature or request
P2
Priority of the issue for triage purpose: Needs to be fixed at some point.
System information
Issue
There should be a way to append or concatenate multiple IDataViews together.
Here's the scenario:
The new ranking sample needs to train the model on two datasets that are each loaded from a separate text file and share the same schema. Specifically, there is (1) a training dataset and (2) a validation dataset that need to be combined. For example, refer to step #3 in the steps outlined below, on which the sample is based.
Here are the steps shown in the sample. Generally, the pattern to train, validate, and test a model includes the following steps:
Today, to achieve this, the sample first has to load the data from each text file and then create an enumerable from each so that the datasets can be concatenated. This process would be greatly simplified if you could append/concatenate two IDataViews together.
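For illustration, the workaround described above might look roughly like the following sketch. The row type `SearchResultData`, its columns, and the file names are hypothetical stand-ins (the sample's real schema is not shown in this issue); only the `MLContext.Data` calls are existing ML.NET APIs.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type; the actual schema used by the ranking sample is not shown here.
public class SearchResultData
{
    [LoadColumn(0)] public float Label { get; set; }
    [LoadColumn(1)] public uint GroupId { get; set; }
    [LoadColumn(2)] public float Feature { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Load each file separately, since only one of them has a header row.
        // The file names are placeholders.
        IDataView trainData = mlContext.Data.LoadFromTextFile<SearchResultData>(
            "train.tsv", separatorChar: '\t', hasHeader: true);
        IDataView validationData = mlContext.Data.LoadFromTextFile<SearchResultData>(
            "validation.tsv", separatorChar: '\t', hasHeader: false);

        // Convert each IDataView into an enumerable of typed rows.
        IEnumerable<SearchResultData> trainRows =
            mlContext.Data.CreateEnumerable<SearchResultData>(trainData, reuseRowObject: false);
        IEnumerable<SearchResultData> validationRows =
            mlContext.Data.CreateEnumerable<SearchResultData>(validationData, reuseRowObject: false);

        // Concatenate the rows with LINQ and wrap the result in a new IDataView.
        IDataView combined = mlContext.Data.LoadFromEnumerable(trainRows.Concat(validationRows));
    }
}
```

The round trip through `IEnumerable<T>` requires a strongly typed row class and gives up the lazy, schema-driven nature of `IDataView`, which is the overhead a public append API would avoid.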
NOTE: I also considered creating a text loader to load multiple text files (as described [here](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.data.textloader.load?view=ml-dotnet#Microsoft_ML_Data_TextLoader_Load_Microsoft_ML_Data_IMultiStreamSource_)); however, one of the data files included a header while the other didn't. It looks like a TextLoader can only load multiple files when the headers are consistent across those files.
Source code / logs
Note that there is a method today that provides the ability to append rows. We should consider exposing this publicly:
machinelearning/src/Microsoft.ML.Data/DataView/AppendRowsDataView.cs
Lines 23 to 31 in 70ef7ec