[ML] Add cross compilation support, Docker images and CI for aarch64 #1135

Merged: 4 commits merged on Apr 15, 2020
31 changes: 21 additions & 10 deletions 3rd_party/3rd_party.sh
@@ -75,8 +75,25 @@ case `uname` in
STL_LOCATION=
ZLIB_LOCATION=
else
-echo "Cannot cross compile to $CPP_CROSS_COMPILE"
-exit 3
+SYSROOT=/usr/local/sysroot-$CPP_CROSS_COMPILE-linux-gnu
+BOOST_LOCATION=$SYSROOT/usr/local/gcc75/lib
+BOOST_COMPILER=gcc
+if [ "$CPP_CROSS_COMPILE" = aarch64 ] ; then
Review comment (Contributor):

Is there any reason to set `SYSROOT`, etc. if this `if` condition fails? It seems more natural to just add this as a new `elif [ "$CPP_CROSS_COMPILE" = aarch64 ] ; then` and keep the failure case at the end. Then if we added another target we could just add a new `elif`.

Reply (Contributor Author):

The reason I did it like this is that during the porting exercise I realised that a port to Linux on any other architecture could reuse a lot of the same code. Everything between lines 78 and 96 would be the same except the Boost abbreviation for the hardware architecture.

(This is also why I named the new make rules file `linux_cross_compile_linux.mk`. I started out with `linux_cross_compile_aarch64.mk` but realised before I opened the PR that almost the entire file would be the same for a cross compile to Linux on any other hardware architecture. Maybe in five years' time we'll be cross compiling to x86_64 from aarch64.)

+BOOST_ARCH=a64
+else
+echo "Cannot cross compile to $CPP_CROSS_COMPILE"
+exit 3
+fi
+BOOST_EXTENSION=mt-${BOOST_ARCH}-1_71.so.1.71.0
+BOOST_LIBRARIES='atomic chrono date_time filesystem iostreams log log_setup program_options regex system thread'
+XML_LOCATION=$SYSROOT/usr/local/gcc75/lib
+XML_EXTENSION=.so.2
+GCC_RT_LOCATION=$SYSROOT/usr/local/gcc75/lib64
+GCC_RT_EXTENSION=.so.1
+STL_LOCATION=$SYSROOT/usr/local/gcc75/lib64
+STL_PREFIX=libstdc++
+STL_EXTENSION=.so.6
+ZLIB_LOCATION=
fi
;;
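The thread above turns on the observation that, for any Linux target, only Boost's architecture tag differs while everything else is shared. A minimal sketch of how that structure could look, assuming hypothetical helpers `boost_arch_for` and `configure_cross_target` (neither is part of the PR): a `case` isolates the one per-target value, and there is a single failure path for unsupported targets. The `x86_64` to `x64` entry is illustrative only; it follows Boost's tagged library naming but is not something this PR builds.

```bash
#!/bin/bash
# Hypothetical helper (not in the PR): map a cross compile target, as given in
# CPP_CROSS_COMPILE, to the architecture tag Boost puts in library file names.
# The x86_64 entry is illustrative only.
boost_arch_for() {
    case "$1" in
        aarch64) echo a64 ;;
        x86_64)  echo x64 ;;
        *)       return 1 ;;
    esac
}

# Hypothetical wrapper: derive the shared settings once the tag is known,
# with a single failure path for unsupported targets.
configure_cross_target() {
    local target=$1
    if ! BOOST_ARCH=$(boost_arch_for "$target") ; then
        echo "Cannot cross compile to $target" >&2
        return 3
    fi
    SYSROOT=/usr/local/sysroot-$target-linux-gnu
    BOOST_LOCATION=$SYSROOT/usr/local/gcc75/lib
    BOOST_EXTENSION=mt-${BOOST_ARCH}-1_71.so.1.71.0
}
```

For example, `configure_cross_target aarch64 && echo "$BOOST_EXTENSION"` prints `mt-a64-1_71.so.1.71.0`, the same value the PR hard-codes via `BOOST_ARCH=a64`.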

@@ -183,7 +200,7 @@ fi
case `uname` in

Linux)
-if [ -n "$INSTALL_DIR" -a -z "$CPP_CROSS_COMPILE" ] ; then
+if [ -n "$INSTALL_DIR" -a "$CPP_CROSS_COMPILE" != macosx ] ; then
cd "$INSTALL_DIR"
for FILE in `find . -type f | egrep -v '^core|-debug$|libMl'`
do
@@ -192,13 +209,7 @@ case `uname` in
if [ $? -eq 0 ] ; then
echo "Set RPATH in $FILE"
else
-# Set RPATH for 3rd party libraries that reference other libraries we ship
-ldd $FILE | grep /usr/local/lib >/dev/null 2>&1 && patchelf --set-rpath '$ORIGIN/.' $FILE
Review comment (Contributor Author):

I removed this bit because it hasn't been doing anything since we switched from storing the 3rd party libraries we build in /usr/local/lib to storing them in /usr/local/gcc75/lib. This came up in this PR because ldd only works on the native architecture. But the line is completely unnecessary now anyway, as Boost must have changed the way it links related libraries since we first added it.

-if [ $? -eq 0 ] ; then
-echo "Set RPATH in $FILE"
-else
-echo "Did not set RPATH in $FILE"
-fi
+echo "Did not set RPATH in $FILE"
fi
done
fi
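The author's note about `ldd` applies to any post-install check in a cross build: `ldd` works by having the native dynamic loader resolve dependencies, so it generally cannot inspect aarch64 binaries on an x86_64 host. `readelf -d`, by contrast, reads the ELF dynamic section directly and handles ELF files for any architecture. If a dependency check were ever needed again, a minimal sketch would be a hypothetical `needed_libs` filter (not part of the PR) that extracts the `NEEDED` entries from `readelf -d` output:

```bash
#!/bin/bash
# Hypothetical filter: read `readelf -d` output on stdin and print the name of
# each shared library listed in a NEEDED entry, one per line.
needed_libs() {
    sed -n 's/.*(NEEDED).*Shared library: \[\([^]]*\)].*/\1/p'
}

# Example use on a shipped file (also works for foreign-architecture ELF files):
#   readelf -d "$FILE" | needed_libs
```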
7 changes: 1 addition & 6 deletions build-setup/linux.md
@@ -27,12 +27,7 @@ export CPP_SRC_HOME=$HOME/ml-cpp
You need the C++ compiler and the headers for the `zlib` library that comes with the OS. You also need the archive utilities `unzip` and `bzip2`. Finally, the unit tests for date/time parsing require the `tzdata` package that contains the Linux timezone database. On RHEL/CentOS these can be installed using:

```
-sudo yum install bzip2
-sudo yum install gcc-c++
-sudo yum install texinfo
-sudo yum install tzdata
-sudo yum install unzip
-sudo yum install zlib-devel
+sudo yum install bzip2 gcc-c++ texinfo tzdata unzip zlib-devel
```

On other Linux distributions the package names are generally the same and you just need to use the correct package manager to install these packages.
181 changes: 181 additions & 0 deletions build-setup/linux_aarch64_cross_compiled.md
@@ -0,0 +1,181 @@
# Machine Learning Build Machine Setup for Linux aarch64 cross compiled

You will need the following environment variables to be defined:

- `JAVA_HOME` - Should point to the JDK you want to use to run Gradle.
- `CPP_CROSS_COMPILE` - Should be set to "aarch64".
- `CPP_SRC_HOME` - Only required if building the C++ code directly using `make`, as Gradle sets it automatically.
- `PATH` - Must have `/usr/local/gcc75/bin` before `/usr/bin` and `/bin`.
- `LD_LIBRARY_PATH` - Must have `/usr/local/gcc75/lib64` and `/usr/local/gcc75/lib` before `/usr/lib` and `/lib`.

For example, you might create a `.bashrc` file in your home directory containing this:

```
umask 0002
export JAVA_HOME=/usr/local/jdk1.8.0_121
export LD_LIBRARY_PATH=/usr/local/gcc75/lib64:/usr/local/gcc75/lib:/usr/lib:/lib
export PATH=$JAVA_HOME/bin:/usr/local/gcc75/bin:/usr/bin:/bin:/usr/sbin:/sbin
# Only required if building the C++ code directly using make - adjust depending on the location of your Git clone
export CPP_SRC_HOME=$HOME/ml-cpp
export CPP_CROSS_COMPILE=aarch64
```
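Because the list above requires `/usr/local/gcc75/bin` to come before `/usr/bin` and `/bin` in `PATH` (and the `gcc75` library directories to come first in `LD_LIBRARY_PATH`), it can be worth checking the ordering after sourcing `.bashrc`. A minimal sketch using a hypothetical helper `dir_precedes` (not part of the repository):

```bash
#!/bin/bash
# Hypothetical helper: succeed if directory $2 occurs in the colon-separated
# search path $1 before any occurrence of directory $3.
dir_precedes() {
    local search_path=$1 first=$2 second=$3
    local entry
    local IFS=:
    for entry in $search_path ; do
        if [ "$entry" = "$first" ] ; then
            return 0
        fi
        if [ "$entry" = "$second" ] ; then
            return 1
        fi
    done
    # $2 was not found at all
    return 1
}
```

For example, `dir_precedes "$PATH" /usr/local/gcc75/bin /usr/bin || echo "Fix PATH ordering" >&2`, and similarly for `"$LD_LIBRARY_PATH"` with `/usr/local/gcc75/lib64` and `/usr/lib`.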

### Initial Preparation

Start by configuring a native Linux aarch64 build server as described in [linux.md](linux.md).

The remainder of these instructions assume that the native aarch64 build server you have configured runs CentOS 7, which is the platform that builds for distribution are currently made on.

On the fully configured native aarch64 build server, run the following commands:

```
cd /usr
tar jcvf ~/usr-aarch64-linux-gnu.tar.bz2 include lib lib64 local
```

These instructions assume the host platform is also CentOS 7, but x86_64 instead of aarch64. It makes life much easier if the host distribution is the same as the target distribution.

Transfer the archive created in your home directory on the native aarch64 build server, `usr-aarch64-linux-gnu.tar.bz2`, to your home directory on the x86_64 host build server.

### OS Packages

You need the C++ compiler and the headers for the `zlib` library that comes with the OS. You also need the archive utilities `unzip` and `bzip2`. On RHEL/CentOS these can be installed using:

```
sudo yum install bzip2 gcc-c++ texinfo unzip zlib-devel
```

### Transferred Build Dependencies

Unpack the dependencies that you copied from the fully configured native aarch64 build server in the "Initial Preparation" step into a new sysroot directory:

```
sudo mkdir -p /usr/local/sysroot-aarch64-linux-gnu/usr
cd /usr/local/sysroot-aarch64-linux-gnu/usr
sudo tar jxvf ~/usr-aarch64-linux-gnu.tar.bz2
cd ..
sudo ln -s usr/lib lib
sudo ln -s usr/lib64 lib64
```
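After unpacking, the sysroot should contain the `include`, `lib`, `lib64` and `local` directories from the native server under `usr`, plus top-level `lib` and `lib64` symlinks pointing into `usr`. A minimal sanity check, sketched as a hypothetical function `check_sysroot` (not part of the repository) that defaults to the sysroot path used above:

```bash
#!/bin/bash
# Hypothetical sanity check for the sysroot created above.
# Prints each problem found and returns non-zero if there were any.
check_sysroot() {
    local root=${1:-/usr/local/sysroot-aarch64-linux-gnu}
    local problems=0 dir link
    for dir in include lib lib64 local ; do
        if [ ! -d "$root/usr/$dir" ] ; then
            echo "Missing directory: $root/usr/$dir"
            problems=$((problems + 1))
        fi
    done
    for link in lib lib64 ; do
        # Each top-level link should point at the matching usr/ directory
        if [ "$(readlink "$root/$link")" != "usr/$link" ] ; then
            echo "Bad or missing symlink: $root/$link"
            problems=$((problems + 1))
        fi
    done
    [ "$problems" -eq 0 ]
}
```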

### General settings for building the tools

Most of the tools are built via a GNU "configure" script, and several environment variables affect its behaviour. Therefore, when building ANY tool on Linux, set the following environment variables:

```
export CFLAGS='-g -O3 -fstack-protector -D_FORTIFY_SOURCE=2'
export CXXFLAGS='-g -O3 -fstack-protector -D_FORTIFY_SOURCE=2'
export LDFLAGS='-Wl,-z,relro -Wl,-z,now'
export LDFLAGS_FOR_TARGET='-Wl,-z,relro -Wl,-z,now'
unset LIBRARY_PATH
```

These environment variables only need to be set when building tools on Linux. They should NOT be set when compiling the Machine Learning source code (as this should pick up all settings from our Makefiles).

### binutils (bootstrap version)

Since we build with a more recent gcc than the one that comes with the host system, we must build that gcc from source. Building a cross compiler requires cross build tools (binutils) for the target, so we first need to build versions of them that are compatible with the system compiler we'll use to build the more recent gcc.

Download `binutils-2.25.tar.bz2` from <http://ftpmirror.gnu.org/binutils/binutils-2.25.tar.bz2>.

Uncompress and untar the resulting file. Then run:

```
unset LD_LIBRARY_PATH
export PATH=/usr/bin:/bin:/usr/sbin:/sbin
./configure --with-sysroot=/usr/local/sysroot-aarch64-linux-gnu --target=aarch64-linux-gnu --with-system-zlib --disable-multilib --disable-libstdcxx
```

This should build an appropriate Makefile. Assuming it does, type:

```
make
sudo make install
```

to install.

### gcc

We have to build on old Linux versions to enable our software to run on the older versions of Linux that users have. However, this means the default compiler on our Linux build servers is also very old. To enable use of more modern C++ features, we use the default compiler to build a newer version of gcc and then use that to build all our other dependencies.

Download `gcc-7.5.0.tar.gz` from <http://ftpmirror.gnu.org/gcc/gcc-7.5.0/gcc-7.5.0.tar.gz>.

Unlike most automake-based tools, gcc must be built in a directory separate from its source code (here, a directory adjacent to the source directory), so build and install it like this:

```
tar zxvf gcc-7.5.0.tar.gz
cd gcc-7.5.0
contrib/download_prerequisites
sed -i -e 's/$(SHLIB_LDFLAGS)/-Wl,-z,relro -Wl,-z,now $(SHLIB_LDFLAGS)/' libgcc/config/t-slibgcc
cd ..
mkdir gcc-7.5.0-build
cd gcc-7.5.0-build
unset LD_LIBRARY_PATH
export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
../gcc-7.5.0/configure --prefix=/usr/local/gcc75 --with-sysroot=/usr/local/sysroot-aarch64-linux-gnu --target=aarch64-linux-gnu --enable-languages=c,c++ --enable-vtable-verify --with-system-zlib --disable-multilib
make -j 6
sudo env PATH="$PATH" make install
```

(Note the `env PATH="$PATH"` part of the install command: `sudo` normally resets `PATH` to a restricted default, but the cross tools we put in `/usr/local/bin` are needed during the install.)

To confirm that everything works correctly run:

```
aarch64-linux-gnu-g++ --version
```

It should print:

```
aarch64-linux-gnu-g++ (GCC) 7.5.0
```

in the first line of the output. If it doesn't then double check that `/usr/local/gcc75/bin` is near the beginning of your `PATH`.
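The manual check above can also be scripted, for example at the end of a provisioning script. A minimal sketch with a hypothetical helper `check_first_line` (not part of the repository) that compares the first line of a command's output with an expected string:

```bash
#!/bin/bash
# Hypothetical helper: run the given command and compare the first line of its
# output with the expected text; report a mismatch and fail if they differ.
check_first_line() {
    local expected=$1
    shift
    local actual
    actual=$("$@" | head -n 1)
    if [ "$actual" != "$expected" ] ; then
        echo "Expected '$expected' but got '$actual'" >&2
        return 1
    fi
}
```

Usage: `check_first_line 'aarch64-linux-gnu-g++ (GCC) 7.5.0' aarch64-linux-gnu-g++ --version`. If the compiler is not on `PATH` at all, the first line is empty and the check fails.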

### binutils (final version)

Also, because we build on old Linux versions yet want to use modern libraries, we have to install an up-to-date version of binutils. This will be used in preference to the bootstrap version provided that `/usr/local/gcc75/bin` is at the beginning of `PATH`.

Download `binutils-2.34.tar.bz2` from <http://ftpmirror.gnu.org/binutils/binutils-2.34.tar.bz2>.

Uncompress and untar the resulting file. Then run:

```
./configure --prefix=/usr/local/gcc75 --with-sysroot=/usr/local/sysroot-aarch64-linux-gnu --target=aarch64-linux-gnu --enable-vtable-verify --with-system-zlib --disable-multilib --disable-libstdcxx --with-gcc-major-version-only
```

This should build an appropriate Makefile. Assuming it does, type:

```
make
sudo make install
```

to install.
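Since the final binutils only take precedence over the bootstrap ones in `/usr/local/bin` because of `PATH` ordering, a quick way to confirm which linker will actually be used is to resolve it with `command -v`. A minimal sketch with a hypothetical helper `resolves_under` (not part of the repository):

```bash
#!/bin/bash
# Hypothetical helper: succeed if command $2 resolves (via PATH) to a file
# directly inside directory $1.
resolves_under() {
    local dir=$1 cmd=$2 resolved
    resolved=$(command -v "$cmd") || return 1
    [ "$(dirname "$resolved")" = "$dir" ]
}
```

Usage: `resolves_under /usr/local/gcc75/bin aarch64-linux-gnu-ld || echo "Bootstrap binutils are still first on PATH" >&2`.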

### patchelf

Obtain patchelf from <http://nixos.org/releases/patchelf/patchelf-0.9/> - the download file will be `patchelf-0.9.tar.bz2`.

Extract it to a temporary directory using:

```
bzip2 -cd patchelf-0.9.tar.bz2 | tar xvf -
```

In the resulting `patchelf-0.9` directory, run the `configure` script:

```
./configure --prefix=/usr/local/gcc75
```

This should build an appropriate Makefile. Assuming it does, run:

```
make
sudo make install
```

to complete the build.
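In the `3rd_party.sh` changes above, the RPATH-setting loop now also runs for aarch64 cross builds (the condition only excludes `macosx`), and the removed line shows `patchelf --set-rpath '$ORIGIN/.'` being used for this. Because `patchelf` edits the ELF file directly instead of running it, it can handle aarch64 libraries on an x86_64 host. The loop visits every regular file under the install directory, so a guard that skips non-ELF files can keep its output tidy. A minimal sketch, with hypothetical functions `is_elf` and `set_origin_rpath` (not part of the repository):

```bash
#!/bin/bash
# Hypothetical helper: succeed if the file starts with the ELF magic bytes
# 0x7f 'E' 'L' 'F' (checked without running any architecture-specific tool).
is_elf() {
    [ "$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')" = 7f454c46 ]
}

# Hypothetical wrapper: set RPATH to the file's own directory, but only for
# ELF files; other files are reported and skipped.
set_origin_rpath() {
    local file=$1
    if ! is_elf "$file" ; then
        echo "Skipping non-ELF file $file"
        return 0
    fi
    # Single quotes stop the shell expanding $ORIGIN; the dynamic loader does
    patchelf --set-rpath '$ORIGIN/.' "$file"
}
```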
15 changes: 12 additions & 3 deletions build.gradle
@@ -33,11 +33,20 @@ if (cppCrossCompile == null) {
cppCrossCompile = ''
}
}
-if (cppCrossCompile != '' && cppCrossCompile != 'macosx') {
-throw new GradleException("CPP_CROSS_COMPILE property must be empty or 'macosx'")
+if (cppCrossCompile != '' && cppCrossCompile != 'macosx' && cppCrossCompile != 'aarch64') {
+throw new GradleException("CPP_CROSS_COMPILE property must be empty, 'macosx' or 'aarch64'")
}

-String artifactClassifier = (isWindows ? "windows-x86_64" : ((isMacOsX || cppCrossCompile == 'macosx') ? "darwin-x86_64" : "linux-x86_64"))
+String artifactClassifier;
+if (isWindows) {
+artifactClassifier = 'windows-x86_64'
+} else if (isMacOsX || cppCrossCompile == 'macosx') {
+artifactClassifier = 'darwin-x86_64'
+} else if (cppCrossCompile != '') {
+artifactClassifier = 'linux-' + cppCrossCompile
+} else {
+artifactClassifier = 'linux-' + System.properties['os.arch']
+}

// Always do the C++ build using bash (Git bash on Windows)
project.ext.bash = isWindows ? "C:\\Program Files\\Git\\bin\\bash" : "/bin/bash"
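The new `artifactClassifier` logic can be mirrored in a shell script that needs to predict the artifact name. A minimal sketch with a hypothetical `artifact_classifier` function; the `uname` patterns used to recognise Windows are assumptions. One detail worth checking: on x86_64 Linux, HotSpot-based JVMs normally report `os.arch` as `amd64`, not `x86_64`, so `'linux-' + System.properties['os.arch']` may yield `linux-amd64` where the old code always produced `linux-x86_64`. The sketch normalises that case explicitly:

```bash
#!/bin/bash
# Hypothetical shell mirror of the artifactClassifier logic in build.gradle.
# Arguments: OS name (as reported by uname), CPP_CROSS_COMPILE value, host arch.
artifact_classifier() {
    local os=$1 cross=$2 arch=$3
    case "$os" in
        MINGW*|MSYS*|CYGWIN*)
            echo windows-x86_64
            return
            ;;
    esac
    if [ "$os" = Darwin ] || [ "$cross" = macosx ] ; then
        echo darwin-x86_64
    elif [ -n "$cross" ] ; then
        echo "linux-$cross"
    else
        # Normalise the JVM-style name to the one uname -m reports
        if [ "$arch" = amd64 ] ; then
            arch=x86_64
        fi
        echo "linux-$arch"
    fi
}
```

For example, `artifact_classifier "$(uname)" "$CPP_CROSS_COMPILE" "$(uname -m)"` gives `linux-aarch64` for a cross build on an x86_64 Linux host.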
32 changes: 32 additions & 0 deletions dev-tools/docker/build_linux_aarch64_cross_build_image.sh
@@ -0,0 +1,32 @@
#!/bin/bash
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License;
# you may not use this file except in compliance with the Elastic License.
#

# Builds the Docker image that can be used to cross compile the machine
# learning C++ code for Linux aarch64.
#
# This script is not intended to be run regularly. When changing the tools
# or 3rd party components required to build the machine learning C++ code
# increment the version, change the Dockerfile and build a new image to be
# used for subsequent builds on this branch. Then update the version to be
# used for builds in docker/linux_aarch64_cross_builder/Dockerfile.

HOST=push.docker.elastic.co
ACCOUNT=ml-dev
REPOSITORY=ml-linux-aarch64-cross-build
VERSION=1

set -e

cd `dirname $0`

docker build --no-cache -t $HOST/$ACCOUNT/$REPOSITORY:$VERSION linux_aarch64_cross_image
# Get a username and password for this by visiting
# https://docker.elastic.co:7000 and allowing it to authenticate against your
# GitHub account
docker login $HOST
docker push $HOST/$ACCOUNT/$REPOSITORY:$VERSION

38 changes: 38 additions & 0 deletions dev-tools/docker/build_linux_aarch64_native_build_image.sh
@@ -0,0 +1,38 @@
#!/bin/bash
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License;
# you may not use this file except in compliance with the Elastic License.
#

# Builds the Docker image that can be used to compile the machine learning
# C++ code natively on Linux aarch64.
#
# This script is not intended to be run regularly. When changing the tools
# or 3rd party components required to build the machine learning C++ code
# increment the version, change the Dockerfile and build a new image to be
# used for subsequent builds on this branch. Then update the version to be
# used for builds in docker/linux_builder/Dockerfile.

if [ `uname -m` != aarch64 ] ; then
echo "Native build images must be built on the correct hardware architecture"
echo "Required: aarch64, Current:" `uname -m`
exit 1
fi

HOST=push.docker.elastic.co
ACCOUNT=ml-dev
REPOSITORY=ml-linux-aarch64-native-build
VERSION=1

set -e

cd `dirname $0`

docker build --no-cache -t $HOST/$ACCOUNT/$REPOSITORY:$VERSION linux_aarch64_native_image
# Get a username and password for this by visiting
# https://docker.elastic.co:7000 and allowing it to authenticate against your
# GitHub account
docker login $HOST
docker push $HOST/$ACCOUNT/$REPOSITORY:$VERSION
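The architecture guard added at the top of this script is repeated, with `x86_64`, in `build_linux_build_image.sh`. If more native images are added, it could be factored into a shared function. A minimal sketch with a hypothetical `require_arch` (not part of the PR), which takes the current architecture as an optional second argument so it can be exercised without the right hardware:

```bash
#!/bin/bash
# Hypothetical shared guard: fail unless the current hardware architecture
# (uname -m, or the optional second argument) matches the required one.
require_arch() {
    local required=$1
    local current=${2:-$(uname -m)}
    if [ "$current" != "$required" ] ; then
        echo "Native build images must be built on the correct hardware architecture" >&2
        echo "Required: $required, Current: $current" >&2
        return 1
    fi
}
```

In a build script this would be used as `require_arch aarch64 || exit 1`, before `set -e` and the `docker build` step.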

6 changes: 6 additions & 0 deletions dev-tools/docker/build_linux_build_image.sh
@@ -14,6 +14,12 @@
# used for subsequent builds on this branch. Then update the version to be
# used for builds in docker/linux_builder/Dockerfile.

if [ `uname -m` != x86_64 ] ; then
echo "Native build images must be built on the correct hardware architecture"
echo "Required: x86_64, Current:" `uname -m`
exit 1
fi

HOST=push.docker.elastic.co
ACCOUNT=ml-dev
REPOSITORY=ml-linux-build
27 changes: 27 additions & 0 deletions dev-tools/docker/linux_aarch64_cross_builder/Dockerfile
@@ -0,0 +1,27 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License;
# you may not use this file except in compliance with the Elastic License.
#

# Increment the version here when a new tools/3rd party components image is built
FROM docker.elastic.co/ml-dev/ml-linux-aarch64-cross-build:1

MAINTAINER David Roberts <[email protected]>

# Copy the current Git repository into the container
COPY . /ml-cpp/

# Tell the build we want to cross compile
ENV CPP_CROSS_COMPILE aarch64

# Pass through any version qualifier (default none)
ARG VERSION_QUALIFIER=

# Pass through whether this is a snapshot build (default yes if not specified)
ARG SNAPSHOT=yes

# Run the build
RUN \
/ml-cpp/dev-tools/docker/docker_entrypoint.sh
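This Dockerfile copies the current Git repository into the image (`COPY . /ml-cpp/`), so it must be built with the repository root as the build context, and its two `ARG`s can be overridden with `--build-arg`. The CI wrapper that drives it is not part of this diff. A minimal sketch of invoking it by hand, assuming a hypothetical `run_cross_build` function with a `DRY_RUN` switch that prints the command instead of running it:

```bash
#!/bin/bash
# Hypothetical manual invocation of the aarch64 cross build Dockerfile.
# Must be run from the root of the ml-cpp Git clone (the build context).
# Arguments: SNAPSHOT value (default yes), VERSION_QUALIFIER (default none).
run_cross_build() {
    local snapshot=${1:-yes} qualifier=${2:-}
    local cmd=(docker build
        -f dev-tools/docker/linux_aarch64_cross_builder/Dockerfile
        --build-arg "SNAPSHOT=$snapshot"
        --build-arg "VERSION_QUALIFIER=$qualifier"
        .)
    if [ "${DRY_RUN:-0}" = 1 ] ; then
        # Print the command instead of running it
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Keeping the command in an array preserves argument boundaries when it is executed, while `"${cmd[*]}"` gives a readable one-line form for the dry run.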
