diff --git a/ydb/docs/en/core/_includes/fault-tolerance.md b/ydb/docs/en/core/_includes/fault-tolerance.md
deleted file mode 100644
index ca629fa3cd45..000000000000
--- a/ydb/docs/en/core/_includes/fault-tolerance.md
+++ /dev/null
@@ -1,5 +0,0 @@
-{% note info %}
-
-A YDB cluster is fault tolerant. Temporarily shutting down a node doesn't affect cluster availability. For details, see [{#T}](../cluster/topology.md).
-
-{% endnote %}
diff --git a/ydb/docs/en/core/administration/production-storage-config.md b/ydb/docs/en/core/administration/production-storage-config.md
deleted file mode 100644
index 6e2a89f39ecc..000000000000
--- a/ydb/docs/en/core/administration/production-storage-config.md
+++ /dev/null
@@ -1,78 +0,0 @@
-# BlobStorage production configurations
-
-To ensure the required fault tolerance of {{ ydb-short-name }}, configure the [cluster disk subsystem](../concepts/cluster/distributed_storage.md) properly: select the appropriate [fault tolerance mode](#fault-tolerance) and [hardware configuration](#requirements) for your cluster.
-
-## Fault tolerance modes {#fault-tolerance}
-
-We recommend using the following [fault tolerance modes](../cluster/topology.md) for {{ ydb-short-name }} production installations:
-
-* `block-4-2`: For a cluster hosted in a single availability zone.
-* `mirror-3-dc`: For a cluster hosted in three availability zones.
-
-The {{ ydb-short-name }} failure model is based on the concepts of a fail domain and a fail realm.
-
-Fail domain
-
-: A set of hardware that may fail concurrently.
-
-    For example, a fail domain includes the disks of the same server (all server disks may become unavailable if the server PSU or network controller fails). A fail domain also includes servers located in the same server rack (all hardware in the rack may become unavailable if there is a power outage or an issue with the network hardware in that rack).
-
-    The failure of any single fail domain is handled automatically, without shutting down the system.
-
-Fail realm
-
-: A set of fail domains that may fail concurrently.
-
-    An example of a fail realm is hardware located in the same data center, all of which may fail as a result of a natural disaster.
-
-Usually a fail domain is a server rack, while a fail realm is a data center.
-
-When creating a [storage group](../concepts/databases.md#storage-groups), {{ ydb-short-name }} combines VDisks located on PDisks from different fail domains. For `block-4-2` mode, a group's PDisks must be distributed across at least 8 fail domains, and for `mirror-3-dc` mode, across 3 fail realms with at least 3 fail domains in each of them.
-
-## Hardware configuration {#requirements}
-
-If a disk fails, {{ ydb-short-name }} may automatically reconfigure a storage group so that, instead of the VDisk located on the failed hardware, the group uses a new VDisk, which the system tries to place on hardware that is still running at the time the group is reconfigured. The same rule applies as when creating a group: the VDisk is created in a fail domain different from the fail domains of all other VDisks in the group (and, for `mirror-3-dc`, in the same fail realm as the failed VDisk).
-
-This causes some issues when a cluster's hardware is distributed across the minimum required number of fail domains:
-
-* If an entire fail domain is down, reconfiguration no longer makes sense, since the new VDisk could only be placed in the fail domain that is down.
-* If part of a fail domain is down, reconfiguration is possible, but the load that was previously handled by the failed hardware is redistributed only across the hardware in the same fail domain.
-
-If the number of fail domains in a cluster exceeds the minimum required for creating storage groups by at least one (that is, 9 domains for `block-4-2` and 4 domains in each fail realm for `mirror-3-dc`), then when some hardware fails, the load can be redistributed across all the hardware that is still running.
-
-The system can work with fail domains of any size. However, if there are few domains and the number of disks differs between domains, the number of storage groups that you can create will be limited. In this case, some hardware in fail domains that are too large may be underutilized. If the hardware is used in full, significant differences in domain sizes may make reconfiguration impossible.
-
-> For example, a cluster using `block-4-2` fault tolerance mode has 15 racks. The first rack hosts 20 servers, and the other 14 racks host 10 servers each. To fully utilize all 20 servers in the first rack, {{ ydb-short-name }} will create groups so that each group uses 1 disk from this largest fail domain. As a result, if the hardware of any other fail domain goes down, the load can't be redistributed to the hardware in the first rack.
-
-{{ ydb-short-name }} can combine disks of different vendors, capacities, and speeds in one group. The resulting characteristics of a group are determined by the worst characteristics of the hardware serving the group, so the best results are usually achieved with same-type hardware. When building large clusters, keep in mind that hardware from the same batch is more likely to share a defect and fail simultaneously.
-
-Therefore, we recommend the following optimal hardware configurations for production installations:
-
-* **A cluster hosted in 1 availability zone**: It uses `block-4-2` fault tolerance mode and consists of 9 or more racks with the same number of identical servers in each rack.
-* **A cluster hosted in 3 availability zones**: It uses `mirror-3-dc` fault tolerance mode and is distributed across 3 data centers with 4 or more racks in each of them, each rack holding the same number of identical servers.
-
-See also [{#T}](#reduced).
-
-## Redundancy recovery {#rebuild}
-
-Automatic reconfiguration of storage groups reduces the risk of data loss in the event of multiple failures, provided the failures occur at intervals sufficient to recover the redundancy. By default, reconfiguration starts one hour after {{ ydb-short-name }} detects a failure.
-
-Once a group is reconfigured, the new VDisk is automatically populated with data to restore the required storage redundancy in the group. This increases the load on the other VDisks in the group and on the network. To reduce the impact of redundancy recovery on system performance, the total data replication speed is limited on both the source and target VDisks.
-
-The time it takes to restore redundancy depends on the amount of data and on hardware performance. For example, replication on fast NVMe SSDs may take an hour, while on large HDDs it may take more than 24 hours. For reconfiguration to be possible at all, a cluster needs free slots for creating VDisks in different fail domains. When deciding how many slots to keep free, factor in the probability of hardware failure, the time it takes to replicate data, and the time it takes to replace the failed hardware.
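To make the mapping between the hardware layout and the failure model concrete, here is a minimal configuration sketch. It assumes the `hosts`/`location` and `domains_config`/`erasure_species` keys described in the cluster configuration reference; all hostnames, rack names, and data center names are placeholders, and a production configuration requires many more settings, so treat it as an illustration rather than a working config.

```yaml
# Minimal sketch: mapping servers to fail realms (data centers) and fail domains (racks).
# Hostnames, IDs, and names are placeholders; a real config also needs host_configs,
# drive definitions, the static group, and other settings.
hosts:
- host: ydb-node-01.example.net
  host_config_id: 1
  location:
    data_center: DC1    # fail realm
    rack: RACK-01       # fail domain
- host: ydb-node-02.example.net
  host_config_id: 1
  location:
    data_center: DC1
    rack: RACK-02       # different rack => different fail domain

domains_config:
  domain:
  - name: Root
    storage_pool_types:
    - kind: ssd
      pool_config:
        erasure_species: block-4-2   # requires at least 8 fail domains, see above
```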
-
-## Simplified hardware configurations {#reduced}
-
-If it's not possible to use the [recommended amount](#requirements) of hardware, you can divide the servers within a single rack into two dummy fail domains. In this configuration, a failure of 1 rack means a failure of 2 domains and not a single one. In [both fault tolerance modes](#fault-tolerance), {{ ydb-short-name }} will keep running if 2 domains fail. If you use the configuration with dummy fail domains, the minimum number of racks in a cluster is 5 for `block-4-2` mode and 2 in each data center for `mirror-3-dc` mode.
-
-## Fault tolerance level {#reliability}
-
-The table below describes fault tolerance levels for different fault tolerance modes and hardware configurations of a {{ ydb-short-name }} cluster:
-
-Fault tolerance mode | Fail domain | Fail realm | Number of data centers | Number of server racks | Fault tolerance level
-:--- | :---: | :---: | :---: | :---: | :---
-`block-4-2` | Rack | Data center | 1 | 9 or more | Can stand a failure of 2 racks
-`block-4-2` | ½ a rack | Data center | 1 | 5 or more | Can stand a failure of 1 rack
-`block-4-2` | Server | Data center | 1 | Doesn't matter | Can stand a failure of 2 servers
-`mirror-3-dc` | Rack | Data center | 3 | 4 in each data center | Can stand a failure of a data center and 1 rack in one of the two other data centers
-`mirror-3-dc` | Server | Data center | 3 | Doesn't matter | Can stand a failure of a data center and 1 server in one of the two other data centers
diff --git a/ydb/docs/en/core/changelog-server.md b/ydb/docs/en/core/changelog-server.md
index 890cef609d1f..917abea2d8c8 100644
--- a/ydb/docs/en/core/changelog-server.md
+++ b/ydb/docs/en/core/changelog-server.md
@@ -23,7 +23,7 @@ Release date: October 12, 2023.
 * A new option `PostgreSQL` has been added to the query type selector settings, which is available when the `Enable additional query modes` parameter is enabled. Also, the query history now takes into account the syntax used when executing the query.
 * The YQL query template for creating a table has been updated. Added a description of the available parameters.
 * Now sorting and filtering for Storage and Nodes tables takes place on the server. To use this functionality, you need to enable the parameter `Offload tables filters and sorting to backend` in the experiments section.
-* Buttons for creating, changing and deleting [topics](https://ydb.tech/ru/docs/concepts/topic) have been added to the context menu.
+* Buttons for creating, changing and deleting [topics](concepts/topic.md) have been added to the context menu.
 * Added sorting by criticality for all issues in the tree in `Healthcheck`.
 
 **Performance:**
@@ -155,7 +155,7 @@ Release date: May 5, 2023. To update to version 23.1, select the [Downloads](dow
 
 * Added [initial table scan](concepts/cdc.md#initial-scan) when creating a CDC changefeed. Now, you can export all the data existing at the time of changefeed creation.
 * Added [atomic index replacement](dba/secondary-indexes.md#atomic-index-replacement). Now, you can atomically replace one pre-defined index with another. This operation is absolutely transparent for your application. Indexes are replaced seamlessly, with no downtime.
-* Added the [audit log](cluster/audit-log.md): Event stream including data about all the operations on {{ ydb-short-name }} objects.
+* Added the [audit log](security/audit-log.md): Event stream including data about all the operations on {{ ydb-short-name }} objects.
 
 **Performance:**
 
diff --git a/ydb/docs/en/core/cluster/index.md b/ydb/docs/en/core/cluster/index.md
deleted file mode 100644
index 6ae4da5f7359..000000000000
--- a/ydb/docs/en/core/cluster/index.md
+++ /dev/null
@@ -1,10 +0,0 @@
-# {{ ydb-short-name }} cluster management overview
-
-This section provides information about deploying, configuring, maintaining, monitoring, and performing diagnostics of multi-node [{{ ydb-short-name }} clusters](../concepts/cluster/index.md).
-
-* [{#T}](../deploy/index.md).
-* [{#T}](../maintenance/embedded_monitoring/index.md).
-* [{#T}](../maintenance/manual/index.md).
-* [{#T}](../devops/manual/system-views.md).
-* [{#T}](../administration/monitoring.md).
-* [{#T}](../administration/upgrade.md).
diff --git a/ydb/docs/en/core/cluster/logs.md b/ydb/docs/en/core/cluster/logs.md
deleted file mode 100644
index db3eb2468686..000000000000
--- a/ydb/docs/en/core/cluster/logs.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# Logs
-Each YDB component writes messages to logs at different levels. Logs can be used to detect severe issues or identify the root causes of issues.
-
-## Logging setup {#log_setup}
-Logging in the various YDB components can be configured via the [monitoring system](../maintenance/embedded_monitoring/logs.md#change_log_level).
-
-There are currently two options for collecting YDB logs.
-
-### Manually {#log_setup_manually}
-YDB provides standard mechanisms for collecting logs and metrics.
-Logging is done to the standard `stdout` and `stderr` streams and can be redirected using popular solutions. We recommend a combination of Fluentd and Elastic Stack.
-
-### Using systemd {#log_setup_systemd}
-By default, logs are written to `journald` and can be retrieved with:
-```
-journalctl -u ydbd
-```
diff --git a/ydb/docs/en/core/cluster/system-requirements.md b/ydb/docs/en/core/cluster/system-requirements.md
deleted file mode 100644
index 63f6fc4ba141..000000000000
--- a/ydb/docs/en/core/cluster/system-requirements.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# System requirements and recommendations
-
-This section provides recommendations for deploying {{ ydb-short-name }}.
-
-## Hardware configuration {#hardware}
-
-The number of servers and disks is determined by the fault-tolerance requirements. For more information, see [{#T}](topology.md).
-
-* **Processor**
-
-  A {{ ydb-short-name }} server can only run on x86-64 processors with AVX2 instruction support: Intel Haswell (4th generation) and later, or AMD EPYC and later.
-
-  The ARM architecture is currently not supported.
-
-* **RAM**
-
-  We recommend using error-correcting code (ECC) memory to protect against hardware failures.
-
-* **Disk subsystem**
-
-  A {{ ydb-short-name }} server can run with any disk type (HDD/SSD/NVMe). However, we recommend SSD/NVMe disks for better performance.
-
-  {% include [_includes/storage-device-requirements.md](../_includes/storage-device-requirements.md) %}
-
-  {{ ydb-short-name }} does not use a file system to store data and accesses disk volumes directly. Don't mount a file system or perform any other operations on a partition used by {{ ydb-short-name }}. We also don't recommend sharing the block device with other processes, because this can lead to significant performance degradation.
-
-  {{ ydb-short-name }} health and performance have not been tested on any type of virtual or network storage device.
-
-  When planning space, remember that {{ ydb-short-name }} uses some disk space for its own internal needs. For example, on a medium-sized cluster of 8 nodes, expect approximately 100 GB to be consumed by the static group across the whole cluster; on a large cluster with more than 1500 nodes, about 200 GB. Each PDisk also stores 25.6 GB of logs and a system area whose size depends on the PDisk size but is at least 0.2 GB.
-
-## Software configuration {#software}
-
-A {{ ydb-short-name }} server runs on Linux with kernel 4.19 or higher and libc 2.30 or higher (Ubuntu 20.04, Debian 11, Fedora 34). YDB uses the [TCMalloc](https://google.github.io/tcmalloc) memory allocator. To make it effective, [enable](https://google.github.io/tcmalloc/tuning.html#system-level-optimizations) Transparent Huge Pages and memory overcommit.
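For reference, the TCMalloc tuning mentioned above comes down to a couple of standard Linux knobs. The following is a minimal sketch using the stock sysfs and sysctl interfaces; the specific values follow the linked TCMalloc tuning guide's suggestions and should be treated as assumptions to validate for your distribution and kernel.

```bash
# Transparent Huge Pages: values suggested by the TCMalloc tuning guide (assumption --
# verify against the linked page). These sysfs writes do not survive a reboot.
echo madvise       | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo defer+madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

# Allow memory overcommit so the allocator's large reservations are not rejected up front.
sudo sysctl -w vm.overcommit_memory=1
echo 'vm.overcommit_memory = 1' | sudo tee /etc/sysctl.d/90-overcommit.conf   # persist across reboots
```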
-
-If a server has more than 32 CPU cores, you can increase YDB performance by running each dynamic node in a separate taskset/cpuset of 10 to 32 cores. For example, with 128 CPU cores, the best choice is to run four 32-core dynamic nodes, each in its own taskset.
-
-macOS and Windows are currently not supported for running {{ ydb-short-name }} servers.
diff --git a/ydb/docs/en/core/cluster/toc_i.yaml b/ydb/docs/en/core/cluster/toc_i.yaml
deleted file mode 100644
index 2b4297cc9ab7..000000000000
--- a/ydb/docs/en/core/cluster/toc_i.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-items:
-- name: Deployment
-  include: { mode: link, path: ../deploy/toc_p.yaml }
-- name: Access management
-  href: access.md
-- name: Managing a cluster's disk subsystem
-  include: { mode: link, path: ../maintenance/manual/toc_p.yaml }
-- name: Embedded UI
-  include: { mode: link, path: ../maintenance/embedded_monitoring/toc_p.yaml }
-- name: Cluster system views
-  href: ../devops/manual/system-views.md
-- name: Audit log
-  href: audit-log.md
-- name: Short access control notation
-  href: short-access-control-notation.md
-- name: Monitoring
-  items:
-  - name: Setting up monitoring for a local YDB cluster
-    href: ../administration/monitoring.md
-  - name: Grafana dashboards
-    href: ../administration/grafana-dashboards.md
-- name: Updating YDB
-  href: ../administration/upgrade.md
-- name: Changing an actor system's configuration
-  href: ../maintenance/manual/change_actorsystem_configs.md
-- name: Updating configurations via CMS
-  href: ../maintenance/manual/cms.md
diff --git a/ydb/docs/en/core/concepts/auth.md b/ydb/docs/en/core/concepts/auth.md
index d90bd462f9d7..09a9be04a8f3 100644
--- a/ydb/docs/en/core/concepts/auth.md
+++ b/ydb/docs/en/core/concepts/auth.md
@@ -54,7 +54,7 @@ Authentication by username and password includes the following steps:
 
 To enable username/password authentication, set the `enforce_user_token_requirement` key to `true` in the cluster's [configuration file](../deploy/configuration/config.md#auth).
 
-To learn how to manage roles and users, see [{#T}](../cluster/access.md).
+To learn how to manage roles and users, see [{#T}](../security/access-management.md).
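For readers of the updated auth page, the configuration fragment referenced above might look roughly as follows. The `domains_config`/`security_config` nesting is taken from the cluster configuration documentation but should be treated as an assumption; check the linked configuration page for the authoritative layout.

```yaml
# Sketch: require authentication cluster-wide (username/password tokens enforced).
# Key nesting is an assumption -- verify against the configuration reference.
domains_config:
  security_config:
    enforce_user_token_requirement: true
```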