Skip to content

ML.NET Notes #74

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 8, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ ML.NET.
- Weekly review (Thursdays)
- 2 x four hours with lunch in between
* Review other core assemblies:
- `Microsoft.ML.CpuMath`
- `Microsoft.ML.Core`
- `Microsoft.ML.Data`
- `Microsoft.ML.Transforms`
- `Microsoft.ML.Transforms`
- `Microsoft.ML.CpuMath`
69 changes: 69 additions & 0 deletions 2018/Microsoft.ML.Core/Session2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# Microsoft.ML.Core

Status: **Needs more work** |
[API Ref](Microsoft.ML.Core.md) |
[Dependencies](Microsoft.ML_Dependencies.png) |
[Write-up](https://github.com/TomFinley/machinelearning/blob/a4511133d3c0b5bc993af22f09607945e7fdf063/docs/code/Catalog-Core.md)

## Notes

* `Column` can be a struct
Copy link

@TomFinley TomFinley Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as a comment here, to address this and some other commetns.

- Sharing is no longer a concern
* `Schema`
- Should probably have a struct based enumerator
- The value tuples don't play nicely with F#
- We should probably remove them
* `Metadata`
- Key/Value pairs associated with a single column
- Uses `Schema` to describe itself, which is powerful, but might make it
hard for developers to reason about. Can we simplify this? It seems the
answer is no (concerns around ownership as well as benefits that are
gained by uniformity with other parts, such as be able to persist the
schema).
- *Metadata* is a fairly generic term; the entire schema is metadata. Can we
come up with a more specific name? Like annotations?
- The getter seems complicated but it allows them to not keep the metadata
instances around and they allow for easier chaining.
- We could have a helper method or property that gets the value in an easier
way, but it's hard to make the much simpler as they would have to be
generic or the user would have to handle the type
* `IDataView`
- `ISchematized` is being removed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific line entered as 1502.

- Should be an abstract class

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line item entered as 1528.

- `GetRowCount()`. `lazy = true` means get if you know it, otherwise don't

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line item entered as 1531.

bother. `lazy = false` means, the caller *really* would like to know, but
implementers can still return `null`.
- Either make it a property `long? Count` or make it
`bool TryGetCount(out long count)`.
- `GetRowCursor()` instead of taking `Func<int, bool>` make it `Predicate<Column>`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line item entered as 1529.

which makes it easier to reason about.
- We should look into whether we should have an async version
- We want to get rid of the the `IRowCursorConsolidator` base

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost forgot about this one. Entered as 1867.

* `ICounted`
Copy link

@TomFinley TomFinley Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the things like this related to IRowCursor I have bundled into a single issue 1532 since I felt like this refactoring is something worth considering collectively, rather than individual items, since many of them are related. This actually includes all the rest of the file up to the ColumnType item.

... Except for the UInt128 issue now that I see it.

- It seems it's not taken by anyone
- It seems it's co-implemented by `IRow` and `ICursor`
- Can this be collapsed?
* `IRowCursor` is needed, but can we get rid of `ICursor`?
* `UInt128` should be something more specific like `RowId`
* `GetRootCursor()` is an performance optimization because many cursors just
delegate. This could also be solved by `GetMover()`. If we converted `ICursor`
to an abstract class, we might be able to hide this in the implementation, or
at least make it protected.
* `MoveMany()` seems like something that could be removed.
* `ICursor.State` seems to be unused, can this be removed?
* `IRow.IsColumnActive` seems to be mostly used in asserts, is this needed? It
seems `GetGetter()` could throw or return `null`.
* `IRowCursor` extending `IRow` seems odd. Can these be collapsed? If not, can
they have an inheritance relationship when we're making them classes?
* `IRowToRowMapper`
- `Action disposer` should probably be replaced by making the right type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions about disposability of IRow (or now Row) were fixed in 1867 so this strangeness around disposing methods has been resolved, thanks!

(`IRow`?) implement `IDisposable`.
- `Func<int, bool>` is hard to get right. Should this be `Predicate<Column>`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested this as part of 1529, which I'd mentioned previously.

as well?
* Can we remove `DataKind` entirely? If not, we should probably rename some of the
members, remove the overlap, and add an entry for `0`, like `Unknown` or `None`.
* `ColumnType`
Copy link

@TomFinley TomFinley Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entered as 1533, with some expansion, please take a look and tell me what you think. This includes a solution to the DataKind problem that I think will work pretty well and at fairly little trouble to anyone...

- We should remove the many `IsXxx` overloads, especially for the esoteric
types.
- It seems this to be heavy context; is it possible to reduce the number
moving pieces, such as removing the concrete derivatives?