
Commit bcc2d3e

Pre-merge changes based on comments
1 parent bd57b04 commit bcc2d3e


12 files changed, +154 -47 lines changed


ydb/docs/en/core/concepts/glossary.md

Lines changed: 10 additions & 0 deletions
@@ -101,6 +101,16 @@ Together, these mechanisms allow {{ ydb-short-name }} to provide [strict consist
The implementation of distributed transactions is covered in a separate article [{#T}](../contributor/datashard-distributed-txs.md), while below there's a list of several [related terms](#distributed-transaction-implementation).

+### Interactive transactions {#interactive-transaction}
+
+The term **interactive transactions** refers to transactions that are split into multiple queries and involve data processing by an application between these queries. For example:
+
+1. Select some data.
+1. Process the selected data in the application.
+1. Update some data in the database.
+1. Commit the transaction in a separate query.
+

### Multi-version concurrency control {#mvcc}

[**Multi-version concurrency control**](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) or **MVCC** is a method {{ ydb-short-name }} used to allow multiple concurrent transactions to access the database simultaneously without interfering with each other. It is described in more detail in a separate article [{#T}](mvcc.md).
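For illustration, the interactive pattern described above could look like the following sequence of YQL statements issued by an application within one open transaction. This is a minimal sketch; the `orders` table and its columns are hypothetical, and in practice each statement is sent as a separate request through the SDK.

```yql
-- Query 1: select some data; the transaction stays open.
SELECT id, amount FROM orders WHERE id = 42;

-- The application processes the selected row here, between queries.

-- Query 2: update some data based on that processing.
UPDATE orders SET amount = 100 WHERE id = 42;

-- Query 3: commit the transaction in a separate request.
COMMIT;
```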
Lines changed: 16 additions & 2 deletions
@@ -1,9 +1,23 @@
-1. Open the **DB overview** Grafana dashboard.
+1. Open the **[DB overview](../../../../../reference/observability/metrics/grafana-dashboards.md#dboverview)** Grafana dashboard.

1. In the **API details** section, see if the **Soft errors (retriable)** chart shows any spikes in the rate of queries with the `OVERLOADED` status.

   ![](../_assets/soft-errors.png)

-1. In the Grafana **DB status** dashboard, see if the number of sessions in the **Session count by host** chart exceeded the 1000 limit.
+1. To check if the spikes in overloaded errors were caused by exceeding the limit of 15000 queries in table partition queues:
+
+   1. In the [Embedded UI](../../../../../reference/embedded-ui/index.md), go to the **Databases** tab and click on the database.
+
+   1. On the **Navigation** tab, ensure the required database is selected.
+
+   1. Open the **Diagnostics** tab.
+
+   1. Open the **Top shards** tab.
+
+   1. In the **Immediate** and **Historical** tabs, sort the shards by the **InFlightTxCount** column and see if the top values reach the 15000 limit.
+
+1. To check if the spikes in overloaded errors were caused by tablet splits and merges, see [{#T}](../../schemas/splits-merges.md).
+
+1. To check if the spikes in overloaded errors were caused by exceeding the 1000 limit of open sessions, in the Grafana **DB status** dashboard, see the **Session count by host** chart.

1. See the [overloaded shards](../../schemas/overloaded-shards.md) issue.
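As a possible complement to the UI steps above, the same in-flight check can be sketched as a query against the partition statistics system view. The view and column names follow the system views reference; verify that they match your {{ ydb-short-name }} version.

```yql
-- Top table partitions by in-flight transactions; values approaching
-- 15000 point to an overloaded partition queue.
SELECT Path, PartIdx, InFlightTxCount
FROM `.sys/partition_stats`
ORDER BY InFlightTxCount DESC
LIMIT 10;
```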

ydb/docs/en/core/dev/troubleshooting/performance/queries/_includes/transaction-lock-invalidation.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-1. Open the **DB overview** Grafana dashboard.
+1. Open the **[DB overview](../../../../../reference/observability/metrics/grafana-dashboards.md#dboverview)** Grafana dashboard.

1. See if the **Transaction Locks Invalidation** chart shows any spikes.

Lines changed: 9 additions & 8 deletions
@@ -1,6 +1,6 @@
# Transaction lock invalidation

-Each transaction in {{ ydb-short-name }} uses [optimistic locking](https://en.wikipedia.org/wiki/Optimistic_concurrency_control) to ensure that no other transaction has modified the data it has read or changed. If the locks check reveals conflicting modifications, the committing transaction rolls back and must be restarted. In this case, {{ ydb-short-name }} returns a **transaction locks invalidated** error. Restarting a significant share of transactions can degrade your application's performance.
+{{ ydb-short-name }} uses [optimistic locking](https://en.wikipedia.org/wiki/Optimistic_concurrency_control) to find conflicts with other transactions being executed. If the locks check during the commit phase reveals conflicting modifications, the committing transaction rolls back and must be restarted. In this case, {{ ydb-short-name }} returns a **transaction locks invalidated** error. Restarting a significant share of transactions can degrade your application's performance.

{% note info %}

@@ -16,13 +16,14 @@ The YDB SDK provides a built-in mechanism for handling temporary failures. For m

## Recommendations

-The longer a transaction lasts, the higher the likelihood of encountering a **transaction locks invalidated** error.
+Consider the following recommendations:

-If possible, avoid interactive transactions. For example, try to avoid the following pattern:
+- The longer a transaction lasts, the higher the likelihood of encountering a **transaction locks invalidated** error.

-1. Select some data.
-1. Process the selected data in the application.
-1. Update some data in the database.
-1. Commit the transaction in a separate query.
+  If possible, avoid [interactive transactions](../../../../concepts/glossary.md#interactive-transaction). A better approach is to use a single YQL query with `begin;` and `commit;` to select data, update data, and commit the transaction.

-A better approach is to use a single YQL query to select data, update data, and commit the transaction.
+  If you do need interactive transactions, append `commit;` to the last query in the transaction.
+
+- Analyze the range of primary keys where conflicting modifications occur, and try to change the application logic to reduce the number of conflicts.
+
+  For example, if a single row with a total balance value is frequently updated, split this row into a hundred rows and calculate the total balance as a sum of these rows. This will drastically reduce the number of **transaction locks invalidated** errors.
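A minimal sketch of the single-query recommendation above, assuming a hypothetical `orders` table whose primary key is `id`; the read, the write, and the commit travel in one request, so no application round-trips happen in between.

```yql
-- Read-modify-write expressed as a single YQL request.
UPDATE orders ON
SELECT id, amount + 10 AS amount
FROM orders
WHERE id = 42;

COMMIT;
```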
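And a hedged sketch of the balance-splitting idea from the second recommendation; the `balance_shards` table and its columns are made up for the example.

```yql
-- Writers update one of 100 sub-rows (shard_id in [0, 100)) chosen by the
-- application instead of contending on a single balance row.
UPDATE balance_shards
SET amount = amount + 5
WHERE account_id = 123 AND shard_id = 57;

COMMIT;

-- The total balance is the sum of the sub-rows.
SELECT SUM(amount) AS total_balance
FROM balance_shards
WHERE account_id = 123;
```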

ydb/docs/en/core/dev/troubleshooting/performance/schemas/_includes/overloaded-shards-diagnostics.md

Lines changed: 59 additions & 23 deletions
@@ -1,44 +1,80 @@
-1. Analyze the **Overloaded shard count** chart in the **DB overview** Grafana dashboard.
+1. Use the Embedded UI or Grafana to see if the {{ ydb-short-name }} nodes are overloaded:

-   ![](../_assets/overloaded-shards-dashboard.png)
+   - In the **[DB overview](../../../../../reference/observability/metrics/grafana-dashboards.md#dboverview)** Grafana dashboard, analyze the **Overloaded shard count** chart.

-   The chart indicates whether the {{ ydb-short-name }} cluster has overloaded shards, but it does not specify which table's shards are overloaded.
+     ![](../_assets/overloaded-shards-dashboard.png)

-2. To identify the table with overloaded shards, follow these steps:
+     The chart indicates whether the {{ ydb-short-name }} cluster has overloaded shards, but it does not specify which table's shards are overloaded.

-   1. In the [Embedded UI](../../../../../reference/embedded-ui/index.md), go to the **Databases** tab and click on the database.
+     {% note tip %}

-   2. On the **Navigation** tab, ensure the required database is selected.
+     Use Grafana to set up alert notifications when {{ ydb-short-name }} data shards get overloaded.

-   3. Open the **Diagnostics** tab.
+     {% endnote %}

-   4. Open the **Top shards** tab.

-   5. In the **Immediate** and **Historical** tabs, sort the shards by the **CPUCores** column and analyze the information.
+   - In the [Embedded UI](../../../../../reference/embedded-ui/index.md):

-      ![](../_assets/partitions-by-cpu.png)
+     1. Go to the **Databases** tab and click on the database.

-   Additionally, the information about overloaded shards is provided as a system table. For more information, see [{#T}](../../../../system-views.md#top-overload-partitions).
+     1. On the **Navigation** tab, ensure the required database is selected.

-   {% endnote %}
+     1. Open the **Diagnostics** tab.

-3. To pinpoint the schema issue, follow these steps:
+     1. Open the **Top shards** tab.

-   1. Retrieve information about the problematic table using the [{{ ydb-short-name }} CLI](../../../../../reference/ydb-cli/index.md). Run the following command:
+     1. In the **Immediate** and **Historical** tabs, sort the shards by the **CPUCores** column and analyze the information.

-      ```bash
-      ydb scheme describe <table_name>
-      ```
+        ![](../_assets/partitions-by-cpu.png)

-   2. In the command output, analyze the **Auto partitioning settings**:
+        Additionally, the information about overloaded shards is provided as a system table. For more information, see [{#T}](../../../../system-views.md#top-overload-partitions).

-      * `Partitioning by size`
-      * `Partitioning by load`
-      * `Max partitions count`
+1. To pinpoint the schema issue, use the [Embedded UI](../../../../../reference/embedded-ui/index.md) or [{{ ydb-short-name }} CLI](../../../../../reference/ydb-cli/index.md):

-   If the table does not have these options, see [Recommendations for table configuration](../overloaded-shards.md#table-config).
+   - In the [Embedded UI](../../../../../reference/embedded-ui/index.md):

-4. Analyze whether primary key values increment monotonically:
+     1. On the **Databases** tab, click on the database.
+
+     1. On the **Navigation** tab, select the required table.
+
+     1. Open the **Diagnostics** tab.
+
+     1. On the **Describe** tab, navigate to `root > PathDescription > Table > PartitionConfig > PartitioningPolicy`.
+
+        ![Describe](../_assets/describe.png)
+
+     1. Analyze the **PartitioningPolicy** values:
+
+        - `SizeToSplit`
+        - `SplitByLoadSettings`
+        - `MaxPartitionsCount`
+
+        If the table does not have these options, see [Recommendations for table configuration](../overloaded-shards.md#table-config).
+
+        {% note info %}
+
+        You can also find this information on the **Diagnostics > Info** tab.
+
+        {% endnote %}
+
+   - In the [{{ ydb-short-name }} CLI](../../../../../reference/ydb-cli/index.md):
+
+     1. To retrieve information about the problematic table, run the following command:
+
+        ```bash
+        ydb scheme describe <table_name>
+        ```
+
+     2. In the command output, analyze the **Auto partitioning settings**:
+
+        - `Partitioning by size`
+        - `Partitioning by load`
+        - `Max partitions count`
+
+        If the table does not have these options, see [Recommendations for table configuration](../overloaded-shards.md#table-config).
+
+1. Analyze whether primary key values increment monotonically:

   - Check the data type of the primary key column. `Serial` data types are used for autoincrementing values.
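The busiest partitions can also be sketched as a query over the partition statistics system view, as a possible complement to the **Top shards** steps above. Column names are taken from the system views reference and may differ between versions.

```yql
-- Table partitions consuming the most CPU; a value close to one core
-- suggests an overloaded data shard.
SELECT Path, PartIdx, CPUCores
FROM `.sys/partition_stats`
ORDER BY CPUCores DESC
LIMIT 10;
```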

ydb/docs/en/core/dev/troubleshooting/performance/schemas/overloaded-shards.md

Lines changed: 14 additions & 0 deletions
@@ -27,10 +27,24 @@ Consider the following solutions to address shard overload:

* If the problematic table is not partitioned by load, enable partitioning by load.

+  {% note tip %}
+
+  A table is not partitioned by load, if you see the `Partitioning by load: false` line on the **Diagnostics > Info** tab in the **Embedded UI** or the `ydb scheme describe` command output.
+
+  {% endnote %}
+
* If the table has reached the maximum number of partitions, increase the partition limit.

+  {% note tip %}
+
+  To determine the number of partitions in the table, see the `PartCount` value on the **Diagnostics > Info** tab in the **Embedded UI**.
+
+  {% endnote %}
+
Both operations can be performed by executing an [`ALTER TABLE ... SET`](../../../../yql/reference/syntax/alter_table/set.md) query.

### For the imbalanced primary key {#pk-recommendations}

Consider modifying the primary key to distribute the load evenly across table partitions. You cannot change the primary key of an existing table. To do that, you will have to create a new table with the modified primary key and then migrate the data to the new table.
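To illustrate the `ALTER TABLE ... SET` query mentioned above, a minimal sketch; the table name is hypothetical, and the limits should be chosen to fit your workload.

```yql
-- Enable partitioning by load and raise the partition limit in one statement.
ALTER TABLE my_table SET (
    AUTO_PARTITIONING_BY_LOAD = ENABLED,
    AUTO_PARTITIONING_MAX_PARTITIONS_COUNT = 128
);
```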

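For the imbalanced-primary-key recommendation, one possible shape of the new table, assuming a hypothetical events table whose original key increments monotonically:

```yql
-- Lead the primary key with a hash component so that monotonically
-- increasing event_id values spread across partitions.
CREATE TABLE events_v2 (
    key_hash Uint64,
    event_id Uint64,
    payload Utf8,
    PRIMARY KEY (key_hash, event_id)
);
```

Writers would compute `key_hash` from `event_id` (for example, a hash modulo a fixed bucket count) before inserting, and the data from the old table would then be migrated into this layout.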
ydb/docs/en/core/dev/troubleshooting/performance/schemas/splits-merges.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@

{% endif %}

-Each [row-oriented table](../../../../concepts/datamodel/table.md#row-oriented-tables) partition in {{ ydb-short-name }} is processed by a [data shard](../../../../concepts/glossary.md#data-shard) tablet. {{ ydb-short-name }} supports automatic splitting and merging of data shards which allows it to seamlessly adapt to changes in workloads. However, these operations are not free and might have a short-term negative impact on query latencies.
+Each [row-oriented table](../../../../concepts/datamodel/table.md#row-oriented-tables) partition in {{ ydb-short-name }} is processed by a [data shard](../../../../concepts/glossary.md#data-shard) tablet. {{ ydb-short-name }} supports automatic [splitting and merging](../../../../concepts/datamodel/table.md#partitioning) of data shards which allows it to seamlessly adapt to changes in workloads. However, these operations are not free and might have a short-term negative impact on query latencies.

When {{ ydb-short-name }} splits a partition, it replaces the original partition with two new partitions covering the same range of primary keys. Now, two data shards process the range of primary keys that was previously handled by a single data shard, thereby adding more computing resources for the table.

@@ -27,6 +27,6 @@ When configuring [table partitioning](../../../../concepts/datamodel/table.md#pa

## Recommendations

-If the user load on {{ ydb-short-name }} has not changed, consider adjusting the gap between the min and max limits for the number of table partitions to the recommended 20% difference.
+If the user load on {{ ydb-short-name }} has not changed, consider adjusting the gap between the min and max limits for the number of table partitions to the recommended 20% difference. Use the [`ALTER TABLE table_name SET (key = value)`](../../../../yql/reference/syntax/alter_table/set.md) YQL statement to update the [`AUTO_PARTITIONING_MIN_PARTITIONS_COUNT`](../../../../concepts/datamodel/table.md#auto_partitioning_min_partitions_count) and [`AUTO_PARTITIONING_MAX_PARTITIONS_COUNT`](../../../../concepts/datamodel/table.md#auto_partitioning_max_partitions_count) parameters.

If you want to avoid splitting and merging data shards, you can set the min limit to the max limit value or disable partitioning by load.
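A minimal sketch of the recommended adjustment, with a hypothetical table name and limits that keep roughly a 20% gap between the min and max values:

```yql
ALTER TABLE my_table SET (
    AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = 100,
    AUTO_PARTITIONING_MAX_PARTITIONS_COUNT = 120
);
```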

ydb/docs/en/core/dev/troubleshooting/performance/system/system-clock-drift.md

Lines changed: 7 additions & 2 deletions
@@ -2,9 +2,14 @@

Synchronized clocks are critical for distributed databases. If system clocks on the {{ ydb-short-name }} servers drift excessively, distributed transactions will experience increased latencies.

-If a {{ ydb-short-name }} cluster with multiple [coordinators](../../../../concepts/glossary.md#coordinator), planned transactions are merged by [mediators](../../../../concepts/glossary.md#mediator) before being sent off for execution.
+{% note alert %}

-If the system clocks of the nodes running the coordinator tablets differ, transaction latencies increase by the time difference between the fastest and slowest system clocks. This occurs because a transaction planned on a node with a faster system clock can only be executed once the coordinator with the slowest clock reaches the same time.
+It is important to keep system clocks on the {{ ydb-short-name }} servers in sync, to avoid high latencies.
+
+{% endnote %}
+
+If the system clocks of the nodes running the [coordinator](../../../../concepts/glossary.md#coordinator) tablets differ, transaction latencies increase by the time difference between the fastest and slowest system clocks. This occurs because a transaction planned on a node with a faster system clock can only be executed once the coordinator with the slowest clock reaches the same time.

Furthermore, if the system clock drift exceeds 30 seconds, {{ ydb-short-name }} will refuse to process distributed transactions. Before coordinators start planning a transaction, affected [Data shards](../../../../concepts/glossary.md#data-shard) determine an acceptable range of timestamps for the transaction. The start of this range is the current time of the mediator tablet's clock, while the 30-second planning timeout determines the end. If the coordinator's system clock exceeds this time range, it cannot plan a distributed transaction, resulting in errors for such queries.

ydb/docs/en/core/dev/troubleshooting/performance/ydb/tablets-moved.md

Lines changed: 2 additions & 6 deletions
@@ -2,15 +2,12 @@

{{ ydb-short-name }} automatically balances the load by moving tablets from overloaded nodes to other nodes. This process is managed by [Hive](../../../../concepts/glossary.md#hive). When Hive moves tablets, queries affecting those tablets might experience increased latencies while they wait for the tablet to get initialized on the new node.

-[//]: # (This information is taken from a draft topic Concepts > Hive.)
-[//]: # (TODO: When the above-mentioned topic is merged, remove the info from here and add a link.)
-
{{ ydb-short-name }} considers usage of the following hardware resources for balancing nodes:

- CPU
- Memory
- Network
-- [Counter](*counter)
+- [Count](*count)

Autobalancing occurs in the following cases:

@@ -86,6 +83,5 @@ Adjust Hive balancer settings:

{% endnote %}

-
-[*counter]: A virtual resource is used for balancing tablets that lack other hardware resource metrics (such as CPU, memory, or network) and for column shards. If a tablet uses this resource, its value is always set to 1.
+[*count]: Count is a virtual resource for distributing tablets of the same type evenly between nodes.
