diff --git a/2018/Microsoft.ML.Core/README.md b/2018/Microsoft.ML.Core/Session1.md
similarity index 98%
rename from 2018/Microsoft.ML.Core/README.md
rename to 2018/Microsoft.ML.Core/Session1.md
index 805d95d..d4e8e10 100644
--- a/2018/Microsoft.ML.Core/README.md
+++ b/2018/Microsoft.ML.Core/Session1.md
@@ -102,6 +102,7 @@ ML.NET.
     - Weekly review (Thursdays)
     - 2 x four hours with lunch in between
 * Review other core assemblies:
-    - `Microsoft.ML.CpuMath`
+    - `Microsoft.ML.Core`
     - `Microsoft.ML.Data`
-    - `Microsoft.ML.Transforms`
\ No newline at end of file
+    - `Microsoft.ML.Transforms`
+    - `Microsoft.ML.CpuMath`
\ No newline at end of file
diff --git a/2018/Microsoft.ML.Core/Session2.md b/2018/Microsoft.ML.Core/Session2.md
new file mode 100644
index 0000000..d4ac5d9
--- /dev/null
+++ b/2018/Microsoft.ML.Core/Session2.md
@@ -0,0 +1,69 @@
+# Microsoft.ML.Core
+
+Status: **Needs more work** |
+[API Ref](Microsoft.ML.Core.md) |
+[Dependencies](Microsoft.ML_Dependencies.png) |
+[Write-up](https://github.com/TomFinley/machinelearning/blob/a4511133d3c0b5bc993af22f09607945e7fdf063/docs/code/Catalog-Core.md)
+
+## Notes
+
+* `Column` can be a struct
+    - Sharing is no longer a concern
+* `Schema`
+    - Should probably have a struct-based enumerator
+    - The value tuples don't play nicely with F#
+    - We should probably remove them
+* `Metadata`
+    - Key/Value pairs associated with a single column
+    - Uses `Schema` to describe itself, which is powerful, but might make it
+      hard for developers to reason about. Can we simplify this? It seems the
+      answer is no (concerns around ownership as well as benefits that are
+      gained by uniformity with other parts, such as being able to persist the
+      schema).
+    - *Metadata* is a fairly generic term; the entire schema is metadata. Can we
+      come up with a more specific name? Like annotations?
+    - The getter seems complicated, but it avoids keeping the metadata
+      instances around and allows for easier chaining.
+    - We could have a helper method or property that gets the value in an easier
+      way, but it's hard to make them much simpler, as they would have to be
+      generic or the user would have to handle the type.
+* `IDataView`
+    - `ISchematized` is being removed
+    - Should be an abstract class
+    - `GetRowCount()`: `lazy = true` means return the count only if it is already
+      known, otherwise don't bother. `lazy = false` means the caller *really*
+      would like to know, but implementers can still return `null`.
+    - Either make it a property `long? Count` or make it
+      `bool TryGetCount(out long count)`.
+    - Instead of taking `Func`, `GetRowCursor()` should take `Predicate`,
+      which makes it easier to reason about.
+    - We should look into whether an async version is needed
+    - We want to get rid of the `IRowCursorConsolidator` base
+* `ICounted`
+    - No API seems to take it as a parameter
+    - It seems to be implemented by both `IRow` and `ICursor`
+    - Can this be collapsed?
+* `IRowCursor` is needed, but can we get rid of `ICursor`?
+* `UInt128` should be something more specific like `RowId`
+* `GetRootCursor()` is a performance optimization because many cursors just
+  delegate. This could also be solved by `GetMover()`. If we converted `ICursor`
+  to an abstract class, we might be able to hide this in the implementation, or
+  at least make it protected.
+* `MoveMany()` seems like something that could be removed.
+* `ICursor.State` seems to be unused; can this be removed?
+* `IRow.IsColumnActive` seems to be mostly used in asserts; is this needed? It
+  seems `GetGetter()` could throw or return `null` instead.
+* `IRowCursor` extending `IRow` seems odd. Can these be collapsed? If not, can
+  they have an inheritance relationship when we're making them classes?
+* `IRowToRowMapper`
+    - `Action disposer` should probably be replaced by making the right type
+      (`IRow`?) implement `IDisposable`.
+    - `Func` is hard to get right. Should this be `Predicate`
+      as well?
+* Can we remove `DataKind` entirely? If not, we should probably rename some of the
+  members, remove the overlap, and add an entry for `0`, like `Unknown` or `None`.
+* `ColumnType`
+    - We should remove the many `IsXxx` members, especially for the esoteric
+      types.
+    - This seems conceptually heavy; is it possible to reduce the number of
+      moving pieces, for example by removing the concrete derived types?
\ No newline at end of file
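The `IDataView` and cursor proposals in these notes can be illustrated with a minimal C# sketch. It is hypothetical throughout. `DataView`, `RowCursor`, `RowId`, `Column`, `ValueGetter<T>`, and `ArrayDataView` are illustrative names chosen here and are not the shipped ML.NET API. The sketch assumes that `IDataView` becomes an abstract class exposing `bool TryGetCount(out long count)`, that cursors are requested with a `Predicate<int>` over column indices, that the cursor itself implements `IDisposable` rather than handing out an `Action disposer`, and that `GetGetter` throws for inactive columns in place of `IsColumnActive`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the shapes proposed in the review notes.
// None of these types are the actual ML.NET API.

// `Column` as an immutable struct: copies are cheap and sharing is not a concern.
public readonly struct Column
{
    public Column(int index, string name, Type type) { Index = index; Name = name; Type = type; }
    public int Index { get; }
    public string Name { get; }
    public Type Type { get; }
}

// A dedicated row identifier instead of a general-purpose `UInt128`.
public readonly struct RowId
{
    public RowId(ulong high, ulong low) { High = high; Low = low; }
    public ulong High { get; }
    public ulong Low { get; }
}

// Hypothetical getter delegate: fills `value` by reference to avoid allocations.
public delegate void ValueGetter<T>(ref T value);

// Cursor as an abstract class that owns its resources through IDisposable.
public abstract class RowCursor : IDisposable
{
    public abstract long Position { get; }
    public abstract RowId Id { get; }
    public abstract bool MoveNext();
    // Throws for inactive columns instead of exposing `IsColumnActive`.
    public abstract ValueGetter<T> GetGetter<T>(int column);
    public virtual void Dispose() { }
}

// IDataView as an abstract class: one count accessor and a Predicate for active columns.
public abstract class DataView
{
    public abstract IReadOnlyList<Column> Columns { get; }
    public abstract bool TryGetCount(out long count);
    public abstract RowCursor GetRowCursor(Predicate<int> isColumnActive);
}

// A tiny in-memory view over a single column of doubles.
public sealed class ArrayDataView : DataView
{
    private readonly double[] _values;

    public ArrayDataView(string name, double[] values)
    {
        _values = values;
        Columns = new[] { new Column(0, name, typeof(double)) };
    }

    public override IReadOnlyList<Column> Columns { get; }

    // The length is always known here, so the count is never missing.
    public override bool TryGetCount(out long count) { count = _values.Length; return true; }

    public override RowCursor GetRowCursor(Predicate<int> isColumnActive) =>
        new Cursor(_values, isColumnActive(0));

    private sealed class Cursor : RowCursor
    {
        private readonly double[] _values;
        private readonly bool _active;
        private long _position = -1;

        public Cursor(double[] values, bool active) { _values = values; _active = active; }
        public override long Position => _position;
        public override RowId Id => new RowId(0, (ulong)_position);
        public override bool MoveNext() => ++_position < _values.Length;

        public override ValueGetter<T> GetGetter<T>(int column)
        {
            if (column != 0 || !_active || typeof(T) != typeof(double))
                throw new InvalidOperationException($"Column {column} is not active or not of type {typeof(T)}.");
            ValueGetter<double> getter = (ref double value) => value = _values[_position];
            return (ValueGetter<T>)(Delegate)getter;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        DataView view = new ArrayDataView("Score", new[] { 1.5, 2.5, 4.0 });
        if (view.TryGetCount(out long count))
            Console.WriteLine($"Rows: {count}");

        double sum = 0, value = 0;
        using (RowCursor cursor = view.GetRowCursor(col => col == 0))
        {
            ValueGetter<double> getScore = cursor.GetGetter<double>(0);
            while (cursor.MoveNext())
            {
                getScore(ref value);
                sum += value;
            }
        }
        Console.WriteLine($"Sum: {sum}");
    }
}
```

With `TryGetCount`, a caller gets an explicit yes/no answer instead of interpreting a nullable result together with a `lazy` flag. An in-memory source like this one can always answer `true`, while a streaming source would return `false` without doing extra work.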