You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am sorry if this was answered, but the closest I could find is this: #192
Which looks to have an answer of specifying all the related files. var data = reader.Read(exampleFile1, exampleFile2);
The other tutorials on Microsoft all used a single file.
I am looking at something to examine about 14 Terabytes worth of data within hundreds of thousands of files across multiple hard drives. Because of the size there would not really be any way to store that in memory either.
Could I use ML.NET for this problem?
The text was updated successfully, but these errors were encountered:
@ShinobiWannabe , there is no principled limitation for this type of scenario, as long as you don't try to use an in-memory operation (like training FastTree/LightGBM, for example).
However, there are 2 difficulties I envision right now:
We don't have a standard component that can read from multiple files. You could try to create new MultiFileSource("file1.txt+file2.txt+file3.txt") (the + syntax used to work back in the day, and I'm not sure if we still support this capability. If that doesn't work, you'd need to implement your own IMultiStreamSource, which is a bit involved.
Our learners, when they deem that they benifit from caching the data, cache the data in memory (without the possibility to opt out). This is a bug tracked by Clean up our auto-caching #1604
I am sorry if this was answered, but the closest I could find is this:
#192
Which looks to have an answer of specifying all the related files.
var data = reader.Read(exampleFile1, exampleFile2);
The other tutorials on Microsoft all used a single file.
I am looking at something to examine about 14 Terabytes worth of data within hundreds of thousands of files across multiple hard drives. Because of the size there would not really be any way to store that in memory either.
Could I use ML.NET for this problem?
The text was updated successfully, but these errors were encountered: