-
Notifications
You must be signed in to change notification settings - Fork 41
ML.NET Notes #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ML.NET Notes #74
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Microsoft.ML.Core | ||
|
||
Status: **Needs more work** | | ||
[API Ref](Microsoft.ML.Core.md) | | ||
[Dependencies](Microsoft.ML_Dependencies.png) | | ||
[Write-up](https://github.com/TomFinley/machinelearning/blob/a4511133d3c0b5bc993af22f09607945e7fdf063/docs/code/Catalog-Core.md) | ||
|
||
## Notes | ||
|
||
* `Column` can be a struct | ||
- Sharing is no longer a concern | ||
* `Schema` | ||
- Should probably have a struct based enumerator | ||
- The value tuples don't play nicely with F# | ||
- We should probably remove them | ||
* `Metadata` | ||
- Key/Value pairs associated with a single column | ||
- Uses `Schema` to describe itself, which is powerful, but might make it | ||
hard for developers to reason about. Can we simplify this? It seems the | ||
answer is no (concerns around ownership as well as benefits that are | ||
gained by uniformity with other parts, such as be able to persist the | ||
schema). | ||
- *Metadata* is a fairly generic term; the entire schema is metadata. Can we | ||
come up with a more specific name? Like annotations? | ||
- The getter seems complicated but it allows them to not keep the metadata | ||
instances around and they allow for easier chaining. | ||
- We could have a helper method or property that gets the value in an easier | ||
way, but it's hard to make the much simpler as they would have to be | ||
generic or the user would have to handle the type | ||
* `IDataView` | ||
- `ISchematized` is being removed | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This specific line entered as 1502. |
||
- Should be an abstract class | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This line item entered as 1528. |
||
- `GetRowCount()`. `lazy = true` means get if you know it, otherwise don't | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This line item entered as 1531. |
||
bother. `lazy = false` means, the caller *really* would like to know, but | ||
implementers can still return `null`. | ||
- Either make it a property `long? Count` or make it | ||
`bool TryGetCount(out long count)`. | ||
- `GetRowCursor()` instead of taking `Func<int, bool>` make it `Predicate<Column>` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This line item entered as 1529. |
||
which makes it easier to reason about. | ||
- We should look into whether we should have an async version | ||
- We want to get rid of the the `IRowCursorConsolidator` base | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Almost forgot about this one. Entered as 1867. |
||
* `ICounted` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Many of the things like this related to ... Except for the |
||
- It seems it's not taken by anyone | ||
- It seems it's co-implemented by `IRow` and `ICursor` | ||
- Can this be collapsed? | ||
* `IRowCursor` is needed, but can we get rid of `ICursor`? | ||
* `UInt128` should be something more specific like `RowId` | ||
* `GetRootCursor()` is an performance optimization because many cursors just | ||
delegate. This could also be solved by `GetMover()`. If we converted `ICursor` | ||
to an abstract class, we might be able to hide this in the implementation, or | ||
at least make it protected. | ||
* `MoveMany()` seems like something that could be removed. | ||
* `ICursor.State` seems to be unused, can this be removed? | ||
* `IRow.IsColumnActive` seems to be mostly used in asserts, is this needed? It | ||
seems `GetGetter()` could throw or return `null`. | ||
* `IRowCursor` extending `IRow` seems odd. Can these be collapsed? If not, can | ||
they have an inheritance relationship when we're making them classes? | ||
* `IRowToRowMapper` | ||
- `Action disposer` should probably be replaced by making the right type | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Questions about disposability of |
||
(`IRow`?) implement `IDisposable`. | ||
- `Func<int, bool>` is hard to get right. Should this be `Predicate<Column>` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggested this as part of 1529, which I'd mentioned previously. |
||
as well? | ||
* Can we remove `DataKind` entirely? If not, we should probably rename some of the | ||
members, remove the overlap, and add an entry for `0`, like `Unknown` or `None`. | ||
* `ColumnType` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Entered as 1533, with some expansion, please take a look and tell me what you think. This includes a solution to the |
||
- We should remove the many `IsXxx` overloads, especially for the esoteric | ||
types. | ||
- It seems this to be heavy context; is it possible to reduce the number | ||
moving pieces, such as removing the concrete derivatives? |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as a comment here, to address this and some other commetns.