Add cancellation checkpoint in logistic regression. #3032

Merged
merged 2 commits into from
Mar 22, 2019

codemzs
Member

@codemzs codemzs commented Mar 20, 2019

fixes #3031

Please read the issue before reviewing this PR.

@codecov

codecov bot commented Mar 20, 2019

Codecov Report

Merging #3032 into master will increase coverage by 0.09%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3032      +/-   ##
==========================================
+ Coverage   72.41%    72.5%   +0.09%     
==========================================
  Files         803      804       +1     
  Lines      143851   144080     +229     
  Branches    16173    16179       +6     
==========================================
+ Hits       104171   104467     +296     
+ Misses      35258    35197      -61     
+ Partials     4422     4416       -6
| Flag | Coverage Δ |
| --- | --- |
| #Debug | 72.5% <100%> (+0.09%) ⬆️ |
| #production | 68.15% <100%> (+0.06%) ⬆️ |
| #test | 88.69% <ø> (+0.08%) ⬆️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| .../Standard/LogisticRegression/LbfgsPredictorBase.cs | 71.33% <100%> (+0.06%) ⬆️ |
| ...crosoft.ML.StandardTrainers/Optimizer/Optimizer.cs | 73.41% <100%> (+0.07%) ⬆️ |
| ...soft.ML.Data/DataView/DataViewConstructionUtils.cs | 85.27% <0%> (-0.9%) ⬇️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.26% <0%> (-0.63%) ⬇️ |
| src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs | 94.18% <0%> (ø) ⬆️ |
| ...soft.ML.Data/DataLoadSave/DataOperationsCatalog.cs | 73.23% <0%> (ø) ⬆️ |
| ...osoft.ML.Data/DataView/InternalSchemaDefinition.cs | 56.94% <0%> (ø) ⬆️ |
| ...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs | 72.68% <0%> (ø) ⬆️ |
| ...osoft.ML.Functional.Tests/SchemaDefinitionTests.cs | 98.46% <0%> (ø) |
| test/Microsoft.ML.Tests/Scenarios/Api/TestApi.cs | 97.63% <0%> (+0.01%) ⬆️ |

... and 11 more

@codemzs codemzs requested a review from wschin March 20, 2019 17:37
@rogancarr
Contributor

What are the performance implications here?

@@ -475,6 +475,7 @@ private protected virtual void TrainCore(IChannel ch, RoleMappedData data)
e => e.SetProgress(0, exCount, totalCount));
while (cursor.MoveNext())
{
Host.CheckAlive();
WeightSum += cursor.Weight;
Member

I don't think this is the only place where we need a checkpoint.

Member Author

Yep, I added one more in the line search Minimize function.
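For readers following along, a checkpoint of this kind can be sketched roughly as below. This is only an illustration under stated assumptions: `SketchHost`, `RequestStop`, and `SumWeights` are hypothetical stand-ins, not ML.NET types, and the real `Host.CheckAlive()` may be implemented differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a host that can be told to stop; not the ML.NET implementation.
public sealed class SketchHost
{
    private volatile bool _stopRequested;

    public void RequestStop() => _stopRequested = true;

    // The checkpoint: a single flag read that throws once a stop has been requested.
    public void CheckAlive()
    {
        if (_stopRequested)
            throw new OperationCanceledException("Operation was canceled.");
    }
}

public static class CheckpointSketch
{
    // Sums the weights and calls the checkpoint once per row, mirroring the loop in the diff.
    public static double SumWeights(SketchHost host, IEnumerable<double> weights)
    {
        double weightSum = 0;
        foreach (double weight in weights)
        {
            host.CheckAlive();
            weightSum += weight;
        }
        return weightSum;
    }
}
```

The same call can be dropped into any long-running inner loop, such as an optimizer's line search, so that a stop request is noticed within one iteration.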

@wschin
Member

wschin commented Mar 21, 2019

        }

This looks very suspicious. Could you add some checkpoints to this function? I also feel we need to measure the algorithms' performance before adding checkpoints.


Refers to: src/Microsoft.ML.StandardTrainers/Standard/LogisticRegression/LbfgsPredictorBase.cs:567 in 5540101.

@@ -475,6 +475,7 @@ private protected virtual void TrainCore(IChannel ch, RoleMappedData data)
e => e.SetProgress(0, exCount, totalCount));
while (cursor.MoveNext())
{
Host.CheckAlive();
Member

Host.CheckAlive();

Is it too much to do this on every row fetch? Would it be enough to do it every 10 cursor moves, or some other number > 1?
(I don't know if there are any best practices for determining the frequency of checks, perhaps from the CancellationToken implementations.)
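To make the question concrete, an every-N-rows variant might look like the sketch below. `ThrottledCheckSketch`, `CountChecks`, and `interval` are hypothetical names used only for illustration. Note that the loop still evaluates a comparison on every row; it only skips the call itself.

```csharp
using System;

public static class ThrottledCheckSketch
{
    // Runs `rows` iterations and invokes `check` only on every `interval`-th row.
    // Returns how many times the check was actually invoked.
    public static int CountChecks(int rows, int interval, Action check)
    {
        if (interval < 1)
            throw new ArgumentOutOfRangeException(nameof(interval));

        int checks = 0;
        for (int row = 0; row < rows; row++)
        {
            // This comparison still executes for every row.
            if (row % interval == 0)
            {
                check();
                checks++;
            }
            // ... per-row work would go here ...
        }
        return checks;
    }
}
```

With an interval of 10, a stop request can go unnoticed for up to 9 additional rows, which is the trade-off being weighed here.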

Member Author

How is that any more efficient than what we have now? You would still end up executing an if condition on every row fetch. Based on my analysis, the current solution doesn’t add any significant overhead.

A cancellation token works differently: you register a callback with it, and when a signal is sent it invokes the callback, in which you do the work to gracefully shut down the process. Our plan is to implement cancellation tokens post 1.0.
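For reference, the standard .NET `CancellationToken` supports the callback style described above and also allows polling inside a loop. A minimal sketch follows; `CancellationTokenSketch` and its method names are illustrative assumptions, and this is not a statement of how ML.NET will wire cancellation.

```csharp
using System;
using System.Threading;

public static class CancellationTokenSketch
{
    // Registers a callback, cancels the source, and reports whether the callback ran.
    public static bool CallbackRunsOnCancel()
    {
        bool callbackRan = false;
        using (var cts = new CancellationTokenSource())
        using (cts.Token.Register(() => callbackRan = true))
        {
            // Registered callbacks are invoked by Cancel() before it returns.
            cts.Cancel();
        }
        return callbackRan;
    }

    // Polls the token once per row; throws OperationCanceledException once canceled.
    public static int ProcessRows(int rows, CancellationToken token)
    {
        int processed = 0;
        for (int row = 0; row < rows; row++)
        {
            token.ThrowIfCancellationRequested();
            processed++;
        }
        return processed;
    }
}
```

The polling form, `ThrowIfCancellationRequested()`, plays the same role in a loop as the `CheckAlive()` call in this diff.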

Contributor

We spoke offline. I think this is the best we can do until we get cancellation tokens into the mix. CheckAlive only checks a bool property, so it's probably faster than checking to see if it's the 10th iteration or not.

Contributor

To add some late flavor to this: the branch predictor should do slightly better on the (almost perfectly) constant bool property than on the result of iteration % 10, which also involves a hidden division. (We could check every 8 iterations instead, as the compiler should optimize that to a bitwise AND.)

That said, there's the overhead of the CheckAlive() function call, which may be greater if it is not inlined.
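A small sketch of the two points above, with hypothetical names (`IntervalCheckSketch`, `IsEighthByModulo`, `IsEighthByMask`, `ThrowIfStopped`): for a power-of-two interval, the modulo test and the mask test agree, and a tiny check method can carry an inlining hint. Whether the JIT actually emits the mask or inlines the call is not guaranteed and would need measuring.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class IntervalCheckSketch
{
    // Modulo form: true on every 8th iteration.
    public static bool IsEighthByModulo(int iteration) => iteration % 8 == 0;

    // Mask form: tests the low three bits, giving the same answer as the modulo test.
    public static bool IsEighthByMask(int iteration) => (iteration & 7) == 0;

    // Hint asking the JIT to inline this tiny flag check; the JIT may still decline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void ThrowIfStopped(bool stopRequested)
    {
        if (stopRequested)
            throw new OperationCanceledException();
    }
}
```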

@codemzs codemzs merged commit b6c5b70 into dotnet:master Mar 22, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Successfully merging this pull request may close these issues.

Cancellation checkpoints in LogisticRegression