Skip to content

[7.x] [ML] Fixing inference stats race condition (#55163) #55486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 20, 2020

Conversation

benwtrent
Copy link
Member

Backports the following commits to 7.x:

`updateAndGet` could actually call the internal method more than once on contention.
If I read the JavaDocs, it says:
```* @param updateFunction a side-effect-free function```
So, it could be getting multiple updates on contention, thus having a race condition where stats are double counted.

To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allows fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`.

When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`.

This should prevent race conditions while allowing high (ish) throughput in the highly contention paths in inference.

I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO).

closes  elastic#54786
@benwtrent benwtrent added :ml Machine learning backport labels Apr 20, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent benwtrent merged commit cabff65 into elastic:7.x Apr 20, 2020
@benwtrent benwtrent deleted the backport/7.x/pr-55163 branch April 20, 2020 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :ml Machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants