[ML] Add state format version for resilience #726
Conversation
LGTM
errorInBayesianOptimisationState.flush();

bool throwsExceptions{false};
std::stringstream buffer;
std::streambuf* old = std::cerr.rdbuf(buffer.rdbuf());
I don't think we should intercept logs like this:
- It's making the assumption that default logging goes through std::cerr, which is a breach of encapsulation. For example, I was wondering if I should change it to go through std::clog so that it's buffered.
- It's setting the buffer to a local variable, which means that if the corresponding std::cerr.rdbuf(old) doesn't get run for some reason then the test suite will core dump in the next test. And when that happens it will be much more work to figure out what went wrong than looking at a nicely formatted list of test failures in Jenkins.
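The second concern above (std::cerr left pointing at a destroyed local buffer) can be avoided by restoring the stream buffer in a destructor. A minimal sketch of such an RAII guard, where CCerrRedirectGuard is a hypothetical helper for illustration and not part of ml-cpp:

```cpp
#include <iostream>
#include <sstream>
#include <streambuf>

// Hypothetical helper (not part of ml-cpp): redirects std::cerr into a
// supplied stream and guarantees the original streambuf is restored when
// the guard leaves scope, even if the test body throws.
class CCerrRedirectGuard {
public:
    explicit CCerrRedirectGuard(std::ostream& target)
        : m_Old{std::cerr.rdbuf(target.rdbuf())} {}
    ~CCerrRedirectGuard() { std::cerr.rdbuf(m_Old); }
    CCerrRedirectGuard(const CCerrRedirectGuard&) = delete;
    CCerrRedirectGuard& operator=(const CCerrRedirectGuard&) = delete;

private:
    std::streambuf* m_Old; // buffer to restore on destruction
};
```

Usage in a test would then be scope-based, so no explicit std::cerr.rdbuf(old) call can be skipped:

```cpp
std::stringstream buffer;
{
    CCerrRedirectGuard guard{buffer};
    std::cerr << "captured";
} // std::cerr is restored here, on any exit path
```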
There's an example of how we've previously asserted on which errors are logged in CPoissonMeanConjugateTest::testSampleMarginalLikelihoodInSupportBounds. You could even use the same logging config file in your test as it's also in the maths unit tests.
If you don't like this approach then after the 7.5 feature freeze I'd be happy for you to open another PR to add a method to CLogger to tell it to log to a supplied stream instead of the current logging location, and then change the unit tests that use the current approach over to that. We could have a catch (...) in the tests that use that to ensure the logger is reset before the test exits for any reason. But please use the current approach for this PR and then do that change (if you think it's worthwhile) separately.
Thank you for pointing this out, @droberts195. I adjusted the test. Could you please take another look?
Thanks for making the change.
I think you should call ml::core::CLogger::instance().reset() at the end of the test so that we don't have every subsequent test also logging to the file, at least in the case where it succeeds. It's not fatal for every subsequent test to log to the file, but it might cause confusion one day, so it's best that we don't in the common case.
But if you add that line then the PR LGTM, so go ahead and merge without further review.
This PR adds the version number of the current release (7.5) to the format of the training state used for resilience. In the case of a version mismatch, the deserialization of the training state will fail and training will start from scratch. I also took the opportunity to refactor the test for persistence state error handling to have more granular control over the reasons for failure.
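The version-gating described above can be sketched as follows. This is a simplified illustration under assumed names (STATE_VERSION and acceptRestoredState are hypothetical, not the actual ml-cpp API): the persisted state carries the release version that wrote it, and restore rejects any mismatch so that training restarts from scratch.

```cpp
#include <string>

// Hypothetical: the version tag written into the persisted training state.
const std::string STATE_VERSION{"7.5"};

// Hypothetical restore check: any mismatch invalidates the persisted state,
// and the caller falls back to training from scratch.
bool acceptRestoredState(const std::string& persistedVersion) {
    return persistedVersion == STATE_VERSION;
}
```

An exact-match check is the conservative choice here: it rejects state written by both older and newer releases rather than attempting a partial upgrade of the serialized format.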