[ML] Ensure the performance critical data are 16 byte aligned for data frame analyses #1142
… derivatives (we always merge before remapping, but this way is safe)
LGTM except for a couple of minor things
> The speedup is fairly stable and around a 15% mean improvement for a range of regression and binary classification tasks.
It will be interesting to see whether the same magnitude of speedup is also observed on Linux and Windows. The default malloc
on 64 bit Linux and Windows already returns 16 byte aligned memory, so these changes will only help for padding within large arrays, not for the start of the arrays. I don't have a feel for what proportion of the changed alignments are due to extra padding within arrays as opposed to the start address.
Agreed, and I haven't tested this yet. However, while that is a tiny proportion, the key point is to tell Eigen that the memory is aligned, via the new alignment template parameter. If you specify unaligned, which is the default, Eigen won't exploit the fact that the memory is aligned.
Eigen uses explicit intrinsic instructions for various operations. Currently, we enable SSE 4.2 (although we should perhaps consider AVX as well longer term). These instructions benefit considerably from 16 byte alignment of the memory backing the vector/matrix. Eigen ensures this for memory it allocates itself, but in a number of performance critical pieces of code we use mapped matrices and vectors to cut down on allocations (see here for documentation of this type). It is possible to supply the alignment to the Map class if this is known, but before this change we were supplying unaligned here, so loads/stores were not optimised.
This change adds support for alignments up to 32 bytes and chooses 16 byte alignment by default. There are two areas where I now have to manage alignment as a result: in the data frame and in the accumulators of the loss function gradients. The high-level strategy is as follows:
Use Eigen::aligned_allocator to ensure that the start of vector storage is aligned to 32 bytes. (Note that it is cleaner to manage the alignment of Eigen allocations globally, and I've selected 32 bytes because this is more future proof.) The pads are calculated by rounding up the capacity of the row to a multiple of the alignment and by rounding up the start position of the derivatives as necessary.
I've tested this across a range of benchmark sets on my i9 laptop. The speedup is fairly stable, around a 15% mean improvement for a range of regression and binary classification tasks. This does not change the results.