Ansible playbooks supporting the deployment of YDB clusters onto VMs or bare-metal servers.
Currently, the playbooks support the following scenarios:
- the initial deployment of YDB static (storage) nodes;
- YDB database creation;
- the initial deployment of YDB dynamic (database) nodes;
- adding extra YDB dynamic nodes to the YDB cluster;
- updating the cluster configuration file and TLS certificates, with an automatic rolling restart.
The following scenarios are yet to be implemented (TODO):
- configuring extra storage devices within the existing YDB static nodes;
- adding extra YDB static nodes to the existing cluster;
- removing YDB dynamic nodes from the existing cluster.
Current limitations:
- the Python interpreter version on the managed servers must be >= 3.7;
- configuration file customization depends on the support of automatic actor system threads management, which requires YDB version 23.1.26.hotfix1 or later;
- the cluster configuration file has to be manually created;
- there are no examples for configuring the storage nodes with different disk layouts (it seems to be doable by defining different `ydb_disks` values for different host groups).
Supported operating systems:
- Ubuntu 20.04, 22.04, 24.04
- Debian 11.11, 12.7
- AstraLinux 1.7, 1.8
- AlmaLinux 8.9, 9.4, 9.5
- Altlinux 8.4, 10, 10.1, 10.2
- RedHat 9.3
- RedOS 7.3, 8
- CentOS 8
- SberLinux 9.0* (Special requirements for isolated install)
Documentation for the collection.
Default configuration settings are defined in the `group_vars/all` file as a set of Ansible variables. An example file is provided. Different playbook executions may require different variable values, which can be accomplished by specifying extra JSON-format files and passing those files on the command line.
The meaning and format of the variables used are specified in the table below.
Variable | Meaning |
---|---|
`ydb_libidn_archive` | Enable the installation of custom-built libidn for RHEL, AlmaLinux or Rocky Linux. |
`ydb_libidn_archive_unpack_options` | Extra flags to be passed to `tar` for unpacking the custom-built libidn package. Default value: `['--strip-component=1']` |
`ydb_archive` | YDB server binary package in `.tar.gz` format |
`ydb_archive_unpack_options` | Extra flags to be passed to `tar` for unpacking the YDB server binaries. Default value: `['--strip-component=1']` |
`ydb_config` | The name of the cluster configuration file within the `files` subdirectory (without the `actor_system_config` snippet!) |
`ydb_tls_dir` | Path to the local directory with the TLS certificates and keys, as generated by the sample script, or following the file name convention used by the sample script |
`ydb_domain` | The name of the root domain hosting the databases; the value `Root` is used in the YDB documentation |
`ydb_disks` | Disk layout of the storage nodes (hosts defined as `ydbd_static` in the hosts file). Defined as a list of structures with the following fields: `name` - the physical device name (like `/dev/sdb` or `/dev/vdb`); `label` - the desired YDB data partition label, as used in the cluster configuration file (like `ydb_disk_1`) |
`ydb_dynnodes` | Set of dynamic nodes to be run on each host listed as `ydbd_dynamic` in the hosts file. Defined as a list of structures with the following fields: `dbname` - the name of the YDB database handled by the corresponding dynamic node; `instance` - the dynamic node service instance name, allowing to distinguish between multiple dynamic nodes for the same database running on the same host; `offset` - an integer number `0-N`, used as the offset for the standard network port numbers (`0` means using the standard ports) |
`ydb_brokers` | The list of host names running the YDB static nodes; exactly 3 (three) host names must be specified |
`ydb_cores_static` | The number of cores to be used by the thread pools of the static nodes |
`ydb_cores_dynamic` | The number of cores to be used by the thread pools of the dynamic nodes |
`ydb_dbname` | The database name, used for database creation, dynamic node deployment and dynamic node rolling restart |
`ydb_pool_kind` | The YDB default storage pool kind, as specified in the static nodes configuration file in the `storage_pool_types.kind` field |
`ydb_database_groups` | The initial number of storage groups in the newly created database |
`ydb_dynnode_restart_sleep_seconds` | The number of seconds to sleep after the startup of each dynamic node during the rolling restart |
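For illustration, a `group_vars/all` fragment using these variables might look like the sketch below. All values (device names, host names, database name, pool kind, core counts) are placeholders and must be adapted to the actual cluster:

```yaml
# Illustrative group_vars/all fragment; every value below is a placeholder.
ydb_archive: ydbd-main-linux-amd64.tar.gz   # YDB server binary package
ydb_config: config.yaml                     # cluster config file in the files subdirectory
ydb_tls_dir: files/certs                    # local directory with TLS keys and certificates
ydb_domain: Root
ydb_disks:
  - name: /dev/vdb
    label: ydb_disk_1
ydb_dynnodes:
  - { dbname: testdb, instance: a, offset: 0 }
  - { dbname: testdb, instance: b, offset: 1 }
ydb_brokers:
  - static-node-01.example.com
  - static-node-02.example.com
  - static-node-03.example.com
ydb_cores_static: 8
ydb_cores_dynamic: 4
ydb_dbname: testdb
ydb_pool_kind: ssd
ydb_database_groups: 8
ydb_dynnode_restart_sleep_seconds: 30
```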
Overall installation is performed according to the official instructions, with several steps automated with Ansible. The steps below are adapted for the Ansible-based process:

- Review the system requirements, and prepare the YDB hosts. Ensure that SSH access and sudo-based root privileges are available.
- Prepare the TLS certificates; the provided sample script may be used to automate this step.
- Download the YDB server distribution. It is better to use the latest binary version available.
- Ensure that you have Python 3.8 or later installed on all hosts of the cluster.
- Configure passwordless SSH access to all hosts of the cluster.
- Configure privilege escalation on all hosts of the cluster, such as passwordless sudo for the user account with SSH access.
- Install `ansible-core` version 2.11-2.15. Ansible 2.10 or older is not supported.
- Install the required YDB Ansible collections from GitHub:

  ```bash
  ansible-galaxy collection install git+https://github.com/ydb-platform/ydb-ansible.git
  ```

  Alternatively, download the current releases of the Ansible collection for YDB. In addition, the Prometheus and Grafana collections can optionally be used to automatically deploy the monitoring services. Install the collections from the archives:

  ```bash
  ansible-galaxy collection install prometheus-prometheus-X.Y.Z.tar.gz
  ansible-galaxy collection install grafana-ansible-collection-X.Y.Z.tar.gz
  ansible-galaxy collection install ydb-ansible-X.Y.tar.gz
  ```

- In the new subdirectory, create the `ansible.cfg` file using the provided example.
- Create the `files` and `files/certs` directories, and put the TLS keys and certificates there. If the certificates were generated using the provided helper script, the `CA/certs/YYYY-MM-DD_hh-mm-ss` subdirectory should typically be copied as `files/certs`.
- Create the `inventory/50-inventory.yaml` and `inventory/99-inventory-vault.yaml` files. These files contain the host list, installation configuration and secrets to be used. The example files are provided: inventory.yaml, inventory-vault.yaml.
- Create the Ansible Vault password file as `ansible_vault_password_file`, with the password used to protect the sensitive secrets.
- Encrypt `inventory/99-inventory-vault.yaml` with the `ansible-vault encrypt inventory/99-inventory-vault.yaml` command. To edit this file, use the `ansible-vault edit inventory/99-inventory-vault.yaml` command.
- Prepare the cluster configuration file according to the instructions in the documentation, and save it as `files/config.yaml`. Omit the `actor_system_config` section - it will be added automatically.
- Create the setup playbook based on the provided example, and customize the required actions as needed (a minimal sketch is shown after this list).
- Deploy the YDB cluster by running the playbook with the following command:

  ```bash
  ansible-playbook ydb_platform.ydb.initial_setup
  ```
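As a rough illustration of the setup playbook step above: a minimal playbook might simply import the collection's `initial_setup` playbook. This is a hedged sketch, not the provided example from the repository; any site-specific plays or variable overrides would be added around it:

```yaml
# Hypothetical minimal setup playbook; the provided example in the repository
# should be used as the actual starting point.
- name: Deploy a YDB cluster using the ydb_platform.ydb collection
  ansible.builtin.import_playbook: ydb_platform.ydb.initial_setup
```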
To update the YDB cluster configuration files (`ydbd-config.yaml`, TLS certificates and keys) using the Ansible playbook, the following actions are necessary:

- Ensure that the `hosts` file contains the current list of YDB cluster nodes, both static and dynamic.
- Ensure that the configuration variable `ydbd_config` in the `group_vars/all` file points to the desired YDB server configuration file.
- Ensure that the configuration variable `ydbd_tls_dir` points to the directory containing the desired TLS key and certificate files for all the nodes within the YDB cluster.
- Apply the updated configuration to the cluster by running the `run-update-config.sh` script. Ensure that the playbook has completed successfully, and diagnose and fix execution errors if they happen.
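For reference, a sketch of how these variables might look in `group_vars/all`; the variable names follow this section, and both values are placeholders that must match the actual file layout:

```yaml
# Illustrative group_vars/all fragment for a configuration update; values are placeholders.
ydbd_config: config.yaml     # YDB server configuration file to be deployed
ydbd_tls_dir: files/certs    # local directory with the TLS keys and certificates for all nodes
```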
Notes:
- Please take into account that the rolling restart is performed node by node, and for a large cluster the process may consume a significant amount of time.
- For Certificate Authority (CA) certificate rotation, at least two separate configuration updates are needed:
  - first, to deploy the `ca.crt` file containing both the new and the old CA certificates;
  - second, to deploy the fresh server keys and certificates signed by the new CA certificate.
- libaio or libaio1 is installed, depending on the operating system
- chrony is installed and enabled to ensure time synchronization
- jq is installed to support some scripting logic used in the playbooks
- The YDB user group and user are created
- YDB installation directory is created
- YDB server software binary package is unpacked into the YDB installation directory
- YDB client package automatic update checks are disabled for the YDB user, to avoid extra messages from client commands.
- YDB TLS certificates and keys are copied to each server
- YDB cluster configuration file is copied to each server
- Transparent huge pages (THP) are enabled on each server, which is implemented by the creation, activation and start of the corresponding systemd service.
- The common installation actions described above are executed.
- Each configured disk is checked for existing YDB data. If no YDB data is found, the disk is completely re-partitioned and its previous contents are obliterated. If existing YDB data is found, no changes are made.
WARNING: the safety checks do not work for YDB disks using non-default encryption keys. DATA LOSS IS POSSIBLE if the encryption is actually used. An enhancement is probably needed to support specifying the encryption key as a deployment option.
- `ydbd-storage.service` is created and configured as a systemd service.
- `ydbd-storage.service` is started, and the playbook waits for the static nodes to come up.
- The YDB blobstorage configuration is applied with the `ydbd admin blobstorage init` command.
- The playbook waits for the completion of the YDB storage initialization.
- The initial password for the `root` user is configured according to the contents of the `files/secret` file.
- The common installation actions described above are executed.
- For each configured database, the YDB dynnode systemd services are created and configured.
- YDB dynnode services are started.
- YDB TLS certificates and keys are copied to each server.
- YDB cluster configuration file is copied to each server.
- Rolling restart is performed for YDB storage nodes, node by node, checking for the YDB storage cluster to become healthy after the restart of each node.
- Rolling restart is performed for YDB database nodes, server by server, restarting all dynamic nodes running on a single server at a time and waiting for the specified number of seconds after each server's nodes are restarted.
- ydb_platform.ydb.initial_setup - Install cluster from scratch
- ydb_platform.ydb.binaries_all - Install YDB binaries to all nodes
- ydb_platform.ydb.binaries_static - Install YDB binaries to all static/storage nodes
- ydb_platform.ydb.binaries_dynamic - Install YDB binaries to all dynamic nodes
- ydb_platform.ydb.restart - Restart static nodes (in weak mode) and, after that, dynamic nodes
- ydb_platform.ydb.rolling_restart_static - Restart static nodes in weak mode
- ydb_platform.ydb.rolling_restart_dynamic - Restart dynamic nodes
```mermaid
graph TD;
  binaries_all --all hosts--> install_ydb
  binaries_static --static--> install_ydb
  binaries_dynamic --dynamic--> install_ydb
  restart --1--> rolling_restart_static
  restart --2--> rolling_restart_dynamic
  subgraph install_ydb
    install_from_archive(use local archive with binaries)
    install_from_source_code(make it from source code)
    install_from_version(download from official site)
    install_from_binary(use local binaries)
  end
```
Isolated mode is a situation when the hosts are isolated from the Internet (an intranet or a secure environment). There are two possible ways to install:
- Use bastion / jump host
- Use internal preconfigured host
For SberLinux 9.0 the `libxcrypt-compat` package is required. It can be placed as `files/libxcrypt-compat.rpm`, or you can define your own URL to download it via the Ansible variable `package_libxcrypt_url`. Example:
`ansible-playbook ydb_platform.ydb.initial_setup --extra-vars "package_libxcrypt_url=https://localrepo/AppStream/x86_64/os/Packages/libxcrypt-compat-4.4.18-3.el9.x86_64.rpm"`
The installation procedure is the same as the common install, but there are some limitations and recommendations.
- Required settings in the inventory (`50-inventory.yaml`):

  ```yaml
  ansible_user: bastion_username
  ansible_ssh_common_args: "-o StrictHostKeyChecking=no -o User=node_username -A -J bastion_username@{{ lookup('env','JUMP_IP') }}"
  # This key must work with all nodes (bastion and YDB hosts)
  # Or you must specify a host-specific private key in ansible_ssh_common_args
  ansible_ssh_private_key_file: "~/.ssh/id_rsa"
  ```

- YDB Dstool must be installed from a binary (`50-inventory.yaml`):

  ```yaml
  ydb_dstool_binary: "{{ ansible_config_file | dirname }}/files/ydb-dstool"
  ```
```mermaid
graph LR
  subgraph host
    ansible
    ydbops
  end
  subgraph Internal Network
    jump
    node01
    node02
    node03
  end
  host --22/tcp--> jump
  jump --22/tcp--> node01
  jump --22/tcp--> node02
  jump --22/tcp--> node03
```
WARNING: cluster restart does not work through a bastion without direct access to the node FQDNs via 2135/tcp.
- Prepare binaries:
  - create a Docker image for Ansible using the Dockerfile (an Internet connection is required, for example: `docker build . -t ydb-ansible`) and save it as a binary file (`docker save ydb-ansible -o ydb-ansible.image`)
  - download the YDB archive or build the binaries from sources
  - download YDB Dstool (or build it from sources)
  - download YDBOps (https://github.com/ydb-platform/ydbops/releases)
- Prepare the host for Ansible:
  - install Docker (for example, `apt install docker.io`)
  - load the YDB Docker image (`docker load -i ydb-ansible.image`)
  - upload the binary files (ydb, ydb-dstool, ydbd, ydbops)
- Prepare the Ansible configuration as described above (TLS certificates, config, inventory). Required settings in the inventory (`50-inventory.yaml`):

  ```yaml
  ydb_tls_dir: "{{ ansible_config_file | dirname }}/TLS/CA/certs/2024-11-21_09-07-03"
  ydbd_binary: "{{ ansible_config_file | dirname }}/files/ydbd"
  ydb_cli_binary: "{{ ansible_config_file | dirname }}/files/ydb"
  ydb_version: "24.4.1"
  ydbops_binary: "{{ ansible_config_file | dirname }}/files/ydbops"
  ydb_dstool_binary: "{{ ansible_config_file | dirname }}/files/ydb-dstool"
  ```

  HINT: all file paths must be inside the ansible folder. This folder will be mounted as `/ansible` in the Docker container.
- Execute the playbook from the ansible folder with the configured files:

  ```bash
  sudo docker run -it --rm \
    -v $(pwd):/ansible \
    -v /home/ansible/.ssh:/root/.ssh \
    ydb-ansible ansible-playbook ydb_platform.ydb.initial_setup
  ```

  To get an interactive Ansible console, run:

  ```bash
  sudo docker run -it --rm \
    -v $(pwd):/ansible \
    -v /home/ansible/.ssh:/root/.ssh \
    ydb-ansible ansible-console ydb
  ```
You can download another version of the YDB Ansible collection, or get the official archive and change it in your own way:

```bash
git clone https://github.com/ydb-platform/ydb-ansible /home/ansible/ydb-ansible
cd /home/ansible/ydb-ansible
git checkout SOMEBRANCH
```

```bash
sudo docker run -it --rm \
  -v $(pwd):/ansible \
  -v /home/ansible/.ssh:/root/.ssh \
  -v /home/ansible/ydb-ansible:/root/.ansible/collections/ansible_collections/ydb_platform/ydb \
  ydb-ansible ansible-console ydb
```
It is possible to use separate networks for the YDB cluster:
- front-end network - for communication between YDB clients and the YDB cluster
- back-end network - for communication between the YDB cluster nodes
```mermaid
graph LR
  subgraph client
  end
  subgraph node01
    node01-front-fqdn
    node01-back-fqdn
  end
  subgraph node02
    node02-front-fqdn
    node02-back-fqdn
  end
  subgraph node03
    node03-front-fqdn
    node03-back-fqdn
  end
  node01-back-fqdn <--> node02-back-fqdn
  node01-back-fqdn <--> node03-back-fqdn
  node03-back-fqdn <--> node02-back-fqdn
  client --> node01-front-fqdn
  client --> node02-front-fqdn
  client --> node03-front-fqdn
```
First of all, the back-end network is the main network for the cluster. That is why the back-end FQDNs must be configured as the hostnames of the nodes.
The front-end FQDN must be defined via the `ydb_front` host variable. It is also possible to define the `NodeId` via the `ydb_back_number` variable.
The list of brokers is an important part for the dynamic nodes, and it must contain the back-end FQDNs.
Example. Inventory part for nodes:

```yaml
all:
  children:
    ydb:
      hosts:
        ydb-node01.back.ru-central1.internal:
          ydb_front: ydb-node01.front.ru-central1.internal
          ydb_back_number: 1
        ydb-node02.back.ru-central1.internal:
          ydb_front: ydb-node02.front.ru-central1.internal
          ydb_back_number: 2
        ydb-node03.back.ru-central1.internal:
          ydb_front: ydb-node03.front.ru-central1.internal
          ydb_back_number: 3
```
Example. Inventory part for brokers:

```yaml
ydb_brokers:
  - ydb-node01.back.ru-central1.internal
  - ydb-node02.back.ru-central1.internal
  - ydb-node03.back.ru-central1.internal
```
In the config, only the back-end FQDNs are used:

```yaml
hosts:
  - host: ydb-node01.back.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv001
      data_center: YDB1
      rack: RACK1
  - host: ydb-node02.back.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv002
      data_center: YDB1
      rack: RACK1
  ...
```
It is required to generate certificates for the FQDNs in both networks if GRPCS is used.
Example. `ydb-ca-nodes.txt` for generating certificates:

```
ydb-node01 ydb-node01.front.ru-central1.internal ydb-node01.back.ru-central1.internal
ydb-node02 ydb-node02.front.ru-central1.internal ydb-node02.back.ru-central1.internal
ydb-node03 ydb-node03.front.ru-central1.internal ydb-node03.back.ru-central1.internal
```
There are two possible ways to add new nodes to the cluster:
- Simple - use the `initial_setup` playbook
- Long - use several playbooks

The simple way:
- Update `config.yaml` - add the new nodes into the `hosts` section.
Example. Before changes:

```yaml
hosts:
  - host: ydb-node01.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv001
      data_center: YDB1
      rack: RACK1
  - host: ydb-node02.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv002
      data_center: YDB1
      rack: RACK1
  ...
```

Example. After changes:

```yaml
hosts:
  - host: ydb-node01.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv001
      data_center: YDB1
      rack: RACK1
  - host: ydb-node02.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv002
      data_center: YDB1
      rack: RACK1
  ...
  - host: ydb-node-NEW.ru-central1.internal
    host_config_id: 100
    location:
      unit: srv100
      data_center: YDB3
      rack: RACK10
```
- Generate SSL certificates for new nodes
- Update configs on the current nodes and restart cluster:

  ```bash
  ansible-playbook ydb_platform.ydb.update_config
  ```

- Add new nodes into inventory:

  ```yaml
  all:
    children:
      ydb:
        hosts:
          ydb-node01.ru-central1.internal:
          ydb-node02.ru-central1.internal:
          ydb-node03.ru-central1.internal:
          ydb-node-NEW.ru-central1.internal:
  ```

- Install YDB on new nodes and start them:

  ```bash
  ansible-playbook ydb_platform.ydb.initial_setup -l ydb-node-NEW.ru-central1.internal --skip-tags password,create_database
  ```

- Check the cluster
The long way:
- Update `config.yaml` - add the new nodes into the `hosts` section.
Example. Before changes:

```yaml
hosts:
  - host: ydb-node01.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv001
      data_center: YDB1
      rack: RACK1
  - host: ydb-node02.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv002
      data_center: YDB1
      rack: RACK1
  ...
```

Example. After changes:

```yaml
hosts:
  - host: ydb-node01.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv001
      data_center: YDB1
      rack: RACK1
  - host: ydb-node02.ru-central1.internal
    host_config_id: 1
    location:
      unit: srv002
      data_center: YDB1
      rack: RACK1
  ...
  - host: ydb-node-NEW.ru-central1.internal
    host_config_id: 100
    location:
      unit: srv100
      data_center: YDB3
      rack: RACK10
```
- Generate SSL certificates for new nodes
- Update configs on the current nodes and restart cluster:

  ```bash
  ansible-playbook ydb_platform.ydb.update_config
  ```

- Add new nodes into inventory:

  ```yaml
  all:
    children:
      ydb:
        hosts:
          ydb-node01.ru-central1.internal:
          ydb-node02.ru-central1.internal:
          ydb-node03.ru-central1.internal:
          ydb-node-NEW.ru-central1.internal:
  ```
- Prepare nodes for YDB:

  ```bash
  ydb_platform.ydb.prepare_host -l ydb-node-NEW.ru-central1.internal
  ```

- Install YDB on new static nodes and start them:

  ```bash
  ydb_platform.ydb.install_static -l ydb-node-NEW.ru-central1.internal --skip-tags password,create_database
  ```

- Install YDB on new dynamic nodes and start them:

  ```bash
  ydb_platform.ydb.install_dynamic -l ydb-node-NEW.ru-central1.internal --skip-tags password,create_database
  ```
- Check the cluster
Frequently asked questions:

- Q: How to install on Linux with kernel 5.15.0-1073-kvm, which does not contain the tcp_htcp module?
  - A1: define the empty variable `ydb_congestion_module` in the inventory (a sketch is shown after this FAQ)
  - A2: define the variable on the command line: `ansible-playbook ydb_platform.ydb.initial_setup --extra-vars "ydb_congestion_module="`
- Q: How to handle the error `aborting playbook execution. Stop running YDB instances`?
  - A1: manually stop the YDB instances on the hosts intended for the new YDB installation
  - A2: use other hosts without YDB
  - A3: use ansible-console to stop the YDB instances:

    ```bash
    ansible-console ydb
    $ sudo systemctl stop ydbd-storage
    $ sudo systemctl stop ydbd-database-a
    $ sudo systemctl stop ydbd-database-b
    ```
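A hedged sketch of the inventory override from A1. Placing it under `all.vars` in `inventory/50-inventory.yaml` is an assumption; any group or host variable scope that reaches the target hosts would work:

```yaml
# Assumed placement: inventory/50-inventory.yaml, group-level variables.
all:
  vars:
    ydb_congestion_module: ""   # empty value, matching --extra-vars "ydb_congestion_module=" from A2
```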