Speed up LineReader::next_line #89
Measured with a large, numbers-only CSV file, this change showed a ~25% improvement in reading speed.
I generated a test file as follows:
I use
to benchmark. I compile with
The current version has the following running times:
The proposed version has the following running times:
The proposed version seems to be about 10% faster. I'll have a look at the code tomorrow for subtle bugs. If I find none, I'll integrate the PR. (Some changes are necessary to make it work with GCC 4.8.4, so I will not merge it directly as is.)
Interesting result. I used similar code (with the line count reduced to 50000000) with Visual Studio 2019. Old implementation:
New implementation:
Averaged out, I see ~21% and ~26% improvement. It seems VS was not generating code as efficient as GCC's.
Just realized that you were compiling for |
My policy with respect to old compilers is that if newer compilers provide a significant advantage, then bumping up the required compiler version can be considered. However, so far neither C++14 nor C++17 has provided anything where I'd say that it is worthwhile.
I did some more benchmarking today. First off, all the running times in the low-second range are with respect to a preheated file system cache. They are not served from disk. When the data actually comes from the hard disk, we are talking double-digit seconds, and the optimizations done here are negligible. That being said, optimizing the case where data comes from the file system cache has its uses.
The results of my micro benchmarks are really strange. I'm using the same test setup as before. I get the following running times for the current version:
This is consistent with yesterday. Now all I do is replace
with
The only difference is the order of the operands of the &&. The times that I see are:
The order of those two operands seems to have a significant impact. When I rerun the version from this PR, I get:
What I gather from these numbers is that it is best to swap the order of the &&-operands and leave out all the advanced bit tricks.
The following version is also bad:
The running times that I observe are:
This does not make sense to me.
Clang 9.0.0 produces running times of about 2.1 sec regardless of the tested version. I think that the current version on master slightly wins out. However, the difference is so small that it might be a random fluctuation.
Nice catch. I think I came across this, but thought it was a fluke (I didn't look further into it). Now I have compared the two versions (check end first vs. check LF first) and the result is very interesting:
size_t get_line_lf_end(char* buffer, int data_begin, int data_end) {
    int line_end = data_begin;
    while (buffer[line_end] != '\n' && line_end != data_end) {
        ++line_end;
    }
    return line_end;
}
Compiles to this:
While this:
size_t get_line_end_lf(char* buffer, int data_begin, int data_end) {
    int line_end = data_begin;
    while (line_end != data_end && buffer[line_end] != '\n') {
        ++line_end;
    }
    return line_end;
}
Compiles to this:
You can see that the second implementation generates a much tighter loop. I also did a test run with the switched comparison on Windows, and it showed a ~10%-15% improvement over the original version, which is quite good.
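To make the comparison easier to reproduce, here is a minimal timing sketch for the two operand orders. It is not the benchmark used in this thread: `count_lines` and `time_seconds` are hypothetical helpers, the index type is changed to `std::size_t`, and the input is held in a `std::string` so that the byte at `data_end` (the terminating `'\0'`) is readable for the check-LF-first variant.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Check-LF-first: reads buffer[data_end] before the bounds check, so the
// caller must guarantee that this byte is readable.
std::size_t get_line_lf_end(const char* buffer, std::size_t data_begin, std::size_t data_end) {
    assert(data_begin <= data_end);
    std::size_t line_end = data_begin;
    while (buffer[line_end] != '\n' && line_end != data_end)
        ++line_end;
    return line_end;
}

// Check-end-first: never touches buffer[data_end].
std::size_t get_line_end_lf(const char* buffer, std::size_t data_begin, std::size_t data_end) {
    assert(data_begin <= data_end);
    std::size_t line_end = data_begin;
    while (line_end != data_end && buffer[line_end] != '\n')
        ++line_end;
    return line_end;
}

// Hypothetical helper: walks the whole text line by line with `find`
// and returns the number of lines (a final unterminated line counts too).
template <class Find>
std::size_t count_lines(const std::string& text, Find find) {
    const std::size_t end = text.size();
    std::size_t pos = 0, lines = 0;
    while (pos < end) {
        pos = find(text.data(), pos, end);
        ++lines;
        if (pos < end)
            ++pos;  // step over the LF
    }
    return lines;
}

// Hypothetical helper: wall-clock seconds for one full pass over `text`.
template <class Find>
double time_seconds(const std::string& text, Find find, std::size_t& lines_out) {
    const auto start = std::chrono::steady_clock::now();
    lines_out = count_lines(text, find);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A caller would load the test file into a `std::string` once, then call `time_seconds(text, get_line_lf_end, n)` and `time_seconds(text, get_line_end_lf, n)` several times each and compare the minimum times. Passing the functions as pointers may affect inlining, so the numbers are only a rough guide; inspecting the generated assembly, as above, remains the more reliable check.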
One more thing regarding the bit-tricks commit. Can you do a test run with the following test file:
{
    std::ofstream out("testfile");
    for (int i = 0; i < 10000000; ++i) {
        out << i;
        for (int j = 0; j < 10; ++j)
            out << ',' << (1000000+j);
        out << '\n';
    }
}
It's a scenario where each field contains 6 digit numbers instead of a single digit. The reason I ask is that I'm seeing the following results on my machine:
Bit-trickery
That amounts to almost twice the parsing speed on Windows, and I'm very curious what GCC/Clang make of this.
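For readers unfamiliar with this kind of bit trick: the PR's actual code is not reproduced in this thread, so the following is only a generic sketch of one well-known technique, a word-at-a-time (SWAR) search using the classic "has zero byte" test. The function name `find_lf_swar` and its signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the index of the first '\n' in [data_begin, data_end),
// or data_end if there is none. Hypothetical name and signature.
std::size_t find_lf_swar(const char* buffer, std::size_t data_begin, std::size_t data_end) {
    assert(data_begin <= data_end);
    const std::uint64_t ones  = 0x0101010101010101ULL;
    const std::uint64_t highs = 0x8080808080808080ULL;
    const std::uint64_t lf    = ones * static_cast<std::uint64_t>('\n');

    std::size_t i = data_begin;
    // Inspect 8 bytes per iteration while a full word is available.
    while (i + 8 <= data_end) {
        std::uint64_t word;
        std::memcpy(&word, buffer + i, 8);  // safe unaligned load
        const std::uint64_t x = word ^ lf;  // bytes equal to LF become 0
        // Nonzero exactly when at least one byte of x is zero.
        if ((x - ones) & ~x & highs)
            break;
        i += 8;
    }
    // Locate the exact position (or finish the tail) byte by byte.
    while (i != data_end && buffer[i] != '\n')
        ++i;
    return i;
}
```

With longer fields, as in the test file above, more 8-byte words contain no LF at all and can be skipped in one step, which is one plausible reason such an approach gains more on that input than on single-digit fields. Whether it beats the plain loop depends on the compiler and the data, as the measurements in this thread show.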