You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: ydb/docs/en/core/contributor/datashard-distributed-txs.md
+5-5Lines changed: 5 additions & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -64,7 +64,7 @@ The distributed transaction body needs to have enough information about the othe
64
64
*`SendingShards` are shards which send ReadSets to all shards in the `ReceivingShards` set
65
65
*`ReceivingShards` are shards which expect ReadSets from all shards in the `SendingShards` set
66
66
67
-
Volatile transactions expect all shards to be in the `SendingShards` set (any shard may abort the transaction and needs to send its commit decision to other shards), and all shards that apply changes to be in the `ReceivingShards` set (whether or not changes are committed depends on the decisions from other shards). Exchanged ReadSets data is the a serialized [TReadSetData](https://github.com/ydb-platform/ydb/blob/a833a4359ba77706f9b1fe4104741ef0acfbc83b/ydb/core/protos/tx.proto#L312) message with a single `Decision` field specified.
67
+
Volatile transactions expect all shards to be in the `SendingShards` set (any shard may abort the transaction and needs to send its commit decision to other shards), and all shards that apply changes to be in the `ReceivingShards` set (whether or not changes are committed depends on the decisions from other shards). Exchanged ReadSets data is a serialized [TReadSetData](https://github.com/ydb-platform/ydb/blob/a833a4359ba77706f9b1fe4104741ef0acfbc83b/ydb/core/protos/tx.proto#L312) message with a single `Decision` field specified.
68
68
69
69
An example of a distributed transaction that doesn't use ReadSets is a persistent distributed transaction with blind writes. In this case, after successful planning, transaction cannot be aborted, and the shards must ensure the future success of the transaction during the prepare phase.
70
70
@@ -104,7 +104,7 @@ Volatile transactions are based on [persistent uncommitted changes](localdb-unco
104
104
105
105
Having uncommitted and not yet aborted changes limits the shard's ability to perform blind writes. Because uncommitted changes must be committed in the same order in which they were applied to any given key, and shards don't keep these keys in memory after transactions have been executed, DataShard needs to read each key and detect conflicts before it can perform any write. These conflicting changes may be uncommitted changes associated with an optimistic lock (and these locks must be broken along with rolling back the changes), or uncommitted changes from waiting volatile transactions. We don't want to unnecessarily block writes, so even non-volatile operations can switch to a volatile commit. Such operations allocate a `GlobalTxId` when necessary (this is a per-cluster unique `TxId` that is needed when the request doesn't provide one, such as in `BulkUpsert`) and write changes to conflicting keys as uncommitted. The transaction record is then added to the `VolatileTxManager` without specifying other participants and becomes initially committed. It also specifies a list of dependencies, which are transactions that must be completed before the transaction can be committed. These transactions don't block reads and are eventually committed in the local database after all of their dependencies have also been completed or aborted.
106
106
107
-
To reduce stalls in the pipeline, transactions use the conflict cache for keys that are declared for writes in distributed transactions. These keys are read while transactions are waiting in the queue, and conflicting transaction sets are cached by the [RegisterDistributedWrites](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/build_and_wait_dependencies_unit.cpp#L95) function call. All writes to cached keys update these conflict sets and keep them up-to-date. This allows distributed transactions with writes to execute faster by processing lists of conflicting transactions using a hash table lookup even when the table data is evicted from the cache.
107
+
To reduce stalls in the pipeline, transactions use the conflict cache for keys that are declared for writes in distributed transactions. These keys are read while transactions are waiting in the queue, and conflicting transaction sets are cached by the [RegisterDistributedWrites](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/build_and_wait_dependencies_unit.cpp#L95) function call. All writes to cached keys update these conflict sets and keep them up-to-date. This allows distributed transactions with writes to execute faster by processing lists of conflicting transactions using a hash table lookup even when the table data is evicted from memory.
108
108
109
109
DataShard may have change collectors, such as async indexes and/or Change Data Capture (CDC). Collecting these changes for volatile transactions is similar to [uncommitted changes in transactions](datashard-locks-and-change-visibility.md), generating a stream of uncommitted change records using the `TxId` as its `LockId`. These records are then either added atomically to shard change records upon commit, or deleted upon abort. Depending on the settings of the change collector, it may also need to read the current row state. If this row state depends on other volatile transactions that are waiting, it is handled similarly to any other read, by adding a dependency and restarting when these dependencies are resolved. This can cause the transaction pipeline to stall, but it's conceptually similar to persistent distributed transactions with ReadSets that can also be stalled by read-write conflicts.
110
110
@@ -122,13 +122,13 @@ Some examples of potential sources of error for the transaction:
122
122
2. The coordinator may unexpectedly restart without transferring its in-memory state, and the transaction may have been sent to some participants but not all of them.
123
123
3. Any shard might have decided to abort the transaction due to an error.
124
124
125
-
An important guarantee is that, if any shard successfully persists uncommitted changes and a transaction record, and if it receives a `DECISION_COMMIT` from all other participants, the transaction will be eventually committed at all participants and can't be rolled back. It's also important to note that, if any shard aborts a transaction due to an error (such as forgetting about the transaction after restarting), the shard will never be able to complete the transaction and will eventually roll back.
125
+
An important guarantee is that, if any shard successfully persists uncommitted changes and a transaction record, and if it receives a `DECISION_COMMIT` from all other participants, the transaction will be eventually committed at all participants and can't be rolled back. It's also important to note that, if any shard aborts a transaction due to an error (such as forgetting about the transaction after restarting), the transaction will never be able to complete and will eventually roll back.
126
126
127
127
These guarantees are based on the following:
128
128
129
129
1. ReadSets with `DECISION_COMMIT` are sent to other participants only after all uncommitted changes, outgoing ReadSets and a transaction record have been persisted to disk. Therefore, `DECISION_COMMIT` messages are persistent and will be delivered to other participants until they are acknowledged. These messages will continue to be delivered even if the transaction is aborted by another shard.
130
130
2. ReadSets are only acknowledged either when a shard successfully writes to disk or when a read-only lease is active and an unexpected ReadSet is received. Specifically, an older tablet instance will not acknowledge a ReadSet for a transaction that has already been prepared and executed by a newer tablet. The local database ensures that a new tablet will not be activated until the read-only lease on older tablets has expired (provided that the monotonic clock frequency is not more than twice as fast). A successful write confirms that the tablet held the storage lock at the beginning of the commit and no newer tablets were running at the same time.
131
-
* Volatile transactions use an optimization where received ReadSets are not written to disk. Acknowledgement is sent only after a the commit or abort (removal) of the transaction record has been fully persisted on disk.
131
+
* Volatile transactions use an optimization where received ReadSets are not written to disk. Acknowledgement is sent only after a commit or an abort (removal) of the transaction record has been fully persisted on disk.
132
132
* When all participants have generated and persisted their outgoing `DECISION_COMMIT` ReadSets they will continue to receive the full set of commit decisions until they have fully committed the transaction effects and sent their acknowledgements.
133
133
3. Any message related to an aborted transaction is only sent after the abort has been written to disk. This is necessary to handle situations where multiple generations of a particular shard are running at the same time. After a newer generation has successfully committed a transaction and acknowledged ReadSets, an older generation (which would fail when trying to access the disk or validate a read-only lease) may receive a `NO_DATA` reply to its ReadSet Expectation (which was acknowledged by the newer generation). However, the older generation will not be able to commit the removal of the transaction record, and will not be able to incorrectly reply with an error. When the removal of the transaction record (and the transaction effects) have been fully committed, the current generation can be assured that newer generations won't start with the transaction again, and it won't commit.
134
134
4. A successful reply after collecting all `DECISION_COMMIT` ReadSets does not need to wait for the final commit in the local database. Instead, it has to wait until the uncommitted effects and the transaction record have been persisted. This allows the successful reply to have a 1RTT storage latency on the critical path, which is caused by writing uncommitted changes and the transaction record. This successful response is stable:
@@ -151,7 +151,7 @@ In other words, if a client observes the result of a volatile transaction at a c
151
151
152
152
### Indirect planning of volatile transactions
153
153
154
-
The coordinator restart may cause the plan steps to only reach a subset of the participants in a volatile distributed transaction. As a result, some shards may execute the transaction and begin waiting for ReadSets, while other shards continue to wait for the plan step to arrive. Due to the planning timeout of 30 seconds, this could cause some transactions to experience excessive delays before eventually aborting.
154
+
The coordinator restart may cause the plan step to only reach a subset of the participants in a volatile distributed transaction. As a result, some shards may execute the transaction and begin waiting for ReadSets, while other shards continue to wait for the plan step to arrive. Due to the planning timeout of 30 seconds, this could cause some transactions to experience excessive delays before eventually aborting.
155
155
156
156
When any participant receives the first plan step of a volatile transaction, they will include the `PlanStep` in ReadSets they send to other participants. The other participants then indirectly learn about the `PlanStep` that was assigned to the transaction and remember it as the `PredictedPlanStep`. Even if their own plan step is lost, DataShard will add the transaction to the `PlanQueue` when its mediator time (directly or indirectly) reaches the `PredictedPlanStep`, as if they had received a plan step with that predicted `PlanStep` and `TxId`. If the `PredictedPlanStep` is in the past, the transaction will be quickly aborted, as if it had reached the planning deadline.
0 commit comments