Skip to content

Commit bafac5e

Browse files
[7.12][ML] Reduce input stream buffer from 8KB to 2KB (elastic#1881)
When we flush the input stream the java side writes enough spaces to fill the input stream buffer. However, in the case of data frame analytics, this may cause the job to freeze. The reason is that java writes the data and flushes in the same thread that goes on to then restore the state. However, when c++ reads in the end-of-data control message, it stops reading from the stream and goes on to perform the analysis. If the 8KB of spaces do not fit in the OS buffer for the names pipe, the java side blocks. It never proceeds with restoring the state and this causes a job that is being restarted and has state to freeze. This commit reduces the buffer to 2KB. This means the buffer is smaller than the buffer of all supported OS. Note that it is 4KB on Windows. I have performed a test to assess performace impact. In 10 runs of a job that processes ~1M docs with a total input size of 62MB, the result was the same both with the buffer being 8KB and 2KB. The result supports that there is no significant impact on performance. Also note an alternative solution would have been to use an additional thread to keep reading the input stream. However, this solution seems to be much less complicated. Backport of elastic#1881
1 parent 1ed060a commit bafac5e

File tree

1 file changed

+6
-1
lines changed

1 file changed

+6
-1
lines changed

lib/api/CLengthEncodedInputParser.cc

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,12 @@ namespace ml {
2626
namespace api {
2727

2828
// Initialise statics
29-
const std::size_t CLengthEncodedInputParser::WORK_BUFFER_SIZE{8192}; // 8kB
29+
30+
// We keep the buffer at 2KB because it has to be smaller
31+
// than the OS buffer for the named pipes (smallest is windows at 4KB).
32+
// This allows flushing the buffer from the java side by sending spaces
33+
// without the risk of blocking the thread that writes to the process.
34+
const std::size_t CLengthEncodedInputParser::WORK_BUFFER_SIZE{2048}; // 2kB
3035

3136
CLengthEncodedInputParser::CLengthEncodedInputParser(std::istream& strmIn)
3237
: CLengthEncodedInputParser{TStrVec{}, strmIn} {

0 commit comments

Comments
 (0)