Skip to content

Don't create parallel cursor if we have only one element in dataview #197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Ivanidzo4ka
Copy link
Contributor

related to #179

@@ -143,7 +143,7 @@ public override IRowCursor[] GetRowCursorSet(out IRowCursorConsolidator consolid
var inputs = Source.GetRowCursorSet(out consolidator, predicateInput, n, rand);
Contracts.AssertNonEmpty(inputs);

if (inputs.Length == 1 && n > 1 && WantParallelCursors(predicate))
if (inputs.Length == 1 && n > 1 && WantParallelCursors(predicate) && Source.GetRowCount() != 1)
Copy link
Contributor

@glebuk glebuk May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!= 1 [](start = 101, length = 5)

<=n perhaps- no point of setting up N pipelines for <n items. The overhead is too high. #Closed

@@ -143,7 +143,7 @@ public override IRowCursor[] GetRowCursorSet(out IRowCursorConsolidator consolid
var inputs = Source.GetRowCursorSet(out consolidator, predicateInput, n, rand);
Contracts.AssertNonEmpty(inputs);

if (inputs.Length == 1 && n > 1 && WantParallelCursors(predicate))
if (inputs.Length == 1 && n > 1 && WantParallelCursors(predicate) && Source.GetRowCount() != 1)
Copy link
Contributor

@glebuk glebuk May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetRowCount [](start = 88, length = 11)

However, the issue with this solution is that if user sets up a true enumerable, we will do a parallel clustering setup. At least emprirically it appears that setting up parallel cursors for scoring is 10k x slower vs not. Why we even doing this at all? The only time it will be worth doing if we are having on the order of 10k items to score. #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know size of Source in general, so nothing we can do. At least with current interfaces. I guess I can propagate parameter which would forbid usage of parallel cursor, but's it's a lot of changes.
Plus in current situation, we are forcing data to be array

foreach (var item in _predictor.Predict(new[] { input }, reuseRowObjects: false))


In reply to: 189992635 [](ancestors = 189992635)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically that array can possibly be recycled.
Longer term we should use the row to row mappers that Yael has developed.


In reply to: 190086316 [](ancestors = 190086316,189992635)

@@ -432,6 +432,8 @@ public override bool CanShuffle

public override long? GetRowCount(bool lazy = true)
{
if (_data is ICollection<TRow> collection)
return collection.Count;
return null;
Copy link
Contributor

@TomFinley TomFinley May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Might be simpler as return (_data as ICollection<TRow>)?.Count;. #Resolved

Copy link
Contributor

@glebuk glebuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Ivanidzo4ka -- this strikes me as a positive change, not the final word on this subject I'm sure but it at least partially solves the most important subcase.

@Ivanidzo4ka Ivanidzo4ka merged commit b6e9e74 into dotnet:master May 24, 2018
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
…otnet#197)

* Don't create parallel cursor if we have only one element in dataview

* update for comments
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants