-
Notifications
You must be signed in to change notification settings - Fork 65
[ML] Reduce input stream buffer from 8KB to 2KB #1881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
dimitris-athanasiou
merged 1 commit into
elastic:master
from
dimitris-athanasiou:reduce-input-buffer-size-to-2kb
Apr 28, 2021
Merged
[ML] Reduce input stream buffer from 8KB to 2KB #1881
dimitris-athanasiou
merged 1 commit into
elastic:master
from
dimitris-athanasiou:reduce-input-buffer-size-to-2kb
Apr 28, 2021
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated.
edsavage
approved these changes
Apr 28, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
dimitris-athanasiou
added a commit
to dimitris-athanasiou/elasticsearch
that referenced
this pull request
Apr 28, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. In elastic/ml-cpp#1881 the buffer has been reduced to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. Thus in this commit we also reduce the number of spaces we write in order to flush the buffer to match that of the buffer size. Closes elastic#70698
dimitris-athanasiou
added a commit
to dimitris-athanasiou/ml-cpp
that referenced
this pull request
Apr 28, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated. Backport of elastic#1881
dimitris-athanasiou
added a commit
to dimitris-athanasiou/ml-cpp
that referenced
this pull request
Apr 28, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated. Backport of elastic#1881
dimitris-athanasiou
added a commit
to dimitris-athanasiou/ml-cpp
that referenced
this pull request
Apr 28, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated. Backport of elastic#1881
dimitris-athanasiou
added a commit
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated. Backport of #1881
dimitris-athanasiou
added a commit
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated. Backport of #1881
dimitris-athanasiou
added a commit
to elastic/elasticsearch
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. In elastic/ml-cpp#1881 the buffer has been reduced to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. Thus in this commit we also reduce the number of spaces we write in order to flush the buffer to match that of the buffer size. Closes #70698
dimitris-athanasiou
added a commit
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated. Backport of #1881
dimitris-athanasiou
added a commit
to dimitris-athanasiou/elasticsearch
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. In elastic/ml-cpp#1881 the buffer has been reduced to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. Thus in this commit we also reduce the number of spaces we write in order to flush the buffer to match that of the buffer size. Closes elastic#70698 Backport of elastic#72412
dimitris-athanasiou
added a commit
to dimitris-athanasiou/elasticsearch
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. In elastic/ml-cpp#1881 the buffer has been reduced to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. Thus in this commit we also reduce the number of spaces we write in order to flush the buffer to match that of the buffer size. Closes elastic#70698 Backport of elastic#72412
dimitris-athanasiou
added a commit
to elastic/elasticsearch
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. In elastic/ml-cpp#1881 the buffer has been reduced to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. Thus in this commit we also reduce the number of spaces we write in order to flush the buffer to match that of the buffer size. Closes #70698 Backport of #72412
dimitris-athanasiou
added a commit
to elastic/elasticsearch
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. In elastic/ml-cpp#1881 the buffer has been reduced to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. Thus in this commit we also reduce the number of spaces we write in order to flush the buffer to match that of the buffer size. Closes #70698 Backport of #72412
dimitris-athanasiou
added a commit
to elastic/elasticsearch
that referenced
this pull request
Apr 29, 2021
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. In elastic/ml-cpp#1881 the buffer has been reduced to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. Thus in this commit we also reduce the number of spaces we write in order to flush the buffer to match that of the buffer size. Closes #70698 Backport of #72412
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When we flush the input stream the java side writes
enough spaces to fill the input stream buffer.
However, in the case of data frame analytics, this may
cause the job to freeze. The reason is that java
writes the data and flushes in the same thread that
goes on to then restore the state. However, when
c++ reads in the end-of-data control message, it stops
reading from the stream and goes on to perform the analysis.
If the 8KB of spaces do not fit in the OS buffer for the
names pipe, the java side blocks. It never proceeds with
restoring the state and this causes a job that is
being restarted and has state to freeze.
This commit reduces the buffer to 2KB. This means the
buffer is smaller than the buffer of all supported OS.
Note that it is 4KB on Windows. I have performed
a test to assess performace impact. In 10 runs of
a job that processes ~1M docs with a total input size
of 62MB, the result was the same both with the buffer
being 8KB and 2KB. The result supports that there is
no significant impact on performance.
Also note an alternative solution would have been to
use an additional thread to keep reading the input stream.
However, this solution seems to be much less complicated.