CHANGELOG.md (+5 −5)
````diff
@@ -40,7 +40,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#2482](https://github.com/meltano/sdk/issues/2482) Allow SQL tap developers to auto-skip certain schemas from discovery
 - [#2784](https://github.com/meltano/sdk/issues/2784) Added a new built-in setting `activate_version` for targets to optionally disable processing of `ACTIVATE_VERSION` messages
 - [#2780](https://github.com/meltano/sdk/issues/2780) Numeric values are now parsed as `decimal.Decimal` in REST and GraphQL stream responses
-- [#2775](https://github.com/meltano/sdk/issues/2775) Log a stream's bookmark (if it's avaiable) when its sync starts
+- [#2775](https://github.com/meltano/sdk/issues/2775) Log a stream's bookmark (if it's available) when its sync starts
 - [#2703](https://github.com/meltano/sdk/issues/2703) Targets now emit record count from the built-in batch file processor
 - [#2774](https://github.com/meltano/sdk/issues/2774) Accept a `maxLength` limit for VARCHARs
 - [#2769](https://github.com/meltano/sdk/issues/2769) Add `versioning-strategy` to dependabot config of Cookiecutter templates
````
````diff
@@ -210,7 +210,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### ✨ New
 
 - [#2432](https://github.com/meltano/sdk/issues/2432) Developers can now customize the default logging configuration for their taps/targets by adding `default_logging.yml` to their package
-- [#2531](https://github.com/meltano/sdk/issues/2531) The `json` module is now avaiable to stream maps -- _**Thanks @grigi!**_
+- [#2531](https://github.com/meltano/sdk/issues/2531) The `json` module is now available to stream maps -- _**Thanks @grigi!**_
 - [#2529](https://github.com/meltano/sdk/issues/2529) Stream sync context is now available to all instances methods as a `Stream.context` attribute
 
 ### 🐛 Fixes
````
````diff
@@ -330,7 +330,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### 📚 Documentation Improvements
 
 - [#2239](https://github.com/meltano/sdk/issues/2239) Linked reference docs to source code
-- [#2231](https://github.com/meltano/sdk/issues/2231) Added an example implemetation of JSON schema validation that uses `fastjsonschema`
+- [#2231](https://github.com/meltano/sdk/issues/2231) Added an example implementation of JSON schema validation that uses `fastjsonschema`
 - [#2219](https://github.com/meltano/sdk/issues/2219) Added reference docs for tap & target testing helpers
 
 ## v0.35.0 (2024-02-02)
````
````diff
@@ -748,7 +748,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### ✨ New
 
-- [#1262](https://github.com/meltano/sdk/issues/1262) Support string `"__NULL__"`whereever null values are allowed in stream maps configuration
+- [#1262](https://github.com/meltano/sdk/issues/1262) Support string `"__NULL__"`wherever null values are allowed in stream maps configuration
 
 ### 🐛 Fixes
 
````
````diff
@@ -1286,7 +1286,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changes
 
-- Target SDK: Improved performance for Batch Sinks by skipping extra drain operations when newly recieved STATE messages are unchanged from the prior received STATE (#172, !125) -- _Thanks, **[Pat Nadolny](https://gitlab.com/pnadolny13)**!_
+- Target SDK: Improved performance for Batch Sinks by skipping extra drain operations when newly received STATE messages are unchanged from the prior received STATE (#172, !125) -- _Thanks, **[Pat Nadolny](https://gitlab.com/pnadolny13)**!_
````
docs/CONTRIBUTING.md (+1 −1)
````diff
@@ -161,7 +161,7 @@ Sphinx will automatically generate class stubs, so be sure to `git add` them.
 
 ## Semantic Pull Requests
 
-This repo uses the [semantic-prs](https://github.com/Ezard/semantic-prs) GitHub app to check all PRs againts the conventional commit syntax.
+This repo uses the [semantic-prs](https://github.com/Ezard/semantic-prs) GitHub app to check all PRs against the conventional commit syntax.
 
 Pull requests should be named according to the conventional commit syntax to streamline changelog and release notes management. We encourage (but do not require) the use of conventional commits in commit messages as well.
````
docs/implementation/at_least_once.md (+3 −3)
````diff
@@ -18,7 +18,7 @@ According to the Singer spec, bookmark comparisons are performed on the basis of
 
 [Replication Key Signposts](./state.md#replication-key-signposts) are an internal and automatic feature of the SDK. Signposts are necessary in order to deliver the 'at least once' delivery promise for unsorted streams and parent-child streams. The function of a signpost is to ensure that bookmark keys do not advance past a point where we may have not synced all records, such as for unsorted or reverse-sorted streams. This feature also enables developers to override `state_partitioning_key`, which reduces the number of bookmarks needed to track state on parent-child streams with a large number of parent records.
 
-In all applications, the signpost prevents the bookmark's value from advancing too far and prevents records from being skipped in future sync operations. We _intentionally_ do not advance the bookmark as far as the max replication key value from all records we've synced, with the knowlege that _some_ records with equal or lower replication key values may have not yet been synced. It follows then, that any records whose replication key is greater than the signpost value will necessarily be re-synced in the next execution, causing some amount of record duplication downstream.
+In all applications, the signpost prevents the bookmark's value from advancing too far and prevents records from being skipped in future sync operations. We _intentionally_ do not advance the bookmark as far as the max replication key value from all records we've synced, with the knowledge that _some_ records with equal or lower replication key values may have not yet been synced. It follows then, that any records whose replication key is greater than the signpost value will necessarily be re-synced in the next execution, causing some amount of record duplication downstream.
 
 ### Cause #3: Stream interruption
 
````
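The signpost clamping described in this hunk can be sketched in plain Python. This is a simplified illustration with hypothetical names, not the SDK's actual code:

```python
from datetime import datetime, timezone


def finalize_bookmark(max_seen_key: datetime, signpost: datetime) -> datetime:
    """Clamp the final bookmark so it never advances past the signpost.

    For unsorted or reverse-sorted streams, records with replication-key
    values at or below the signpost may not all have been synced yet, so
    the bookmark is intentionally held back. Anything newer than the
    returned bookmark will be re-synced (and possibly duplicated) next run.
    """
    return min(max_seen_key, signpost)


# The signpost is typically captured when the sync starts; records observed
# later in the sync may exceed it, but the saved bookmark must not.
signpost = datetime(2024, 1, 1, tzinfo=timezone.utc)
max_seen = datetime(2024, 1, 5, tzinfo=timezone.utc)
assert finalize_bookmark(max_seen, signpost) == signpost
```

The `min()` is the whole trick: the bookmark advances normally while records stay behind the signpost, and is held back only when they overrun it.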
````diff
@@ -32,11 +32,11 @@ There are two generally recommended approaches for dealing with record duplicati
 
 Assuming that a primary key exists, most target implementation will simply use the primary key to merge newly received records with their prior versions, eliminating any risk of duplication in the destination dataset.
 
-However, this approach will not work for streams that lack primary keys or in implentations running in pure 'append only' mode. For these cases, some amount of record duplication should be expected and planned for by the end user.
+However, this approach will not work for streams that lack primary keys or in implementations running in pure 'append only' mode. For these cases, some amount of record duplication should be expected and planned for by the end user.
 
 ### Strategy #2: Removing duplicates using `dbt` transformations
 
-For cases where the destination table _does not_ use primary keys, the most common way of resolving duplicates after they've landed in the downstream dataset is to apply a `ROW_NUMBER()` function in a tool like [dbt](https://www.getdbt.com). The `ROW_NUMBER()` function can caculate a `dedupe_rank` and/or a `recency_rank` in the transformation layer, and then downstream queries can easily filter out any duplicates using the calculated rank. Users can write these transformations by hand or leverage the [deduplicate-source](https://github.com/dbt-labs/dbt-utils#deduplicate-source) macro from the [dbt-utils](https://github.com/dbt-labs/dbt-utils) package.
+For cases where the destination table _does not_ use primary keys, the most common way of resolving duplicates after they've landed in the downstream dataset is to apply a `ROW_NUMBER()` function in a tool like [dbt](https://www.getdbt.com). The `ROW_NUMBER()` function can calculate a `dedupe_rank` and/or a `recency_rank` in the transformation layer, and then downstream queries can easily filter out any duplicates using the calculated rank. Users can write these transformations by hand or leverage the [deduplicate-source](https://github.com/dbt-labs/dbt-utils#deduplicate-source) macro from the [dbt-utils](https://github.com/dbt-labs/dbt-utils) package.
````
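The `recency_rank` idea from this hunk can be sketched in Python as a stand-in for the dbt SQL; the column names here are illustrative, not taken from the docs:

```python
from collections import defaultdict


def dedupe_latest(records, key="id", order_by="updated_at"):
    """Keep only the row with recency_rank == 1 per key.

    Mirrors ROW_NUMBER() OVER (PARTITION BY key ORDER BY order_by DESC)
    followed by a filter on rank = 1: the newest record per key wins.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    deduped = []
    for rows in groups.values():
        rows.sort(key=lambda r: r[order_by], reverse=True)
        deduped.append(rows[0])  # rank 1 == most recent
    return deduped


rows = [
    {"id": 1, "updated_at": "2024-01-01", "value": "stale"},
    {"id": 1, "updated_at": "2024-01-03", "value": "fresh"},
    {"id": 2, "updated_at": "2024-01-02", "value": "only"},
]
deduped = sorted(dedupe_latest(rows), key=lambda r: r["id"])
assert [r["value"] for r in deduped] == ["fresh", "only"]
```

In the warehouse the same partition-sort-filter is done set-wise by `ROW_NUMBER()`; the dbt-utils macro linked above generates that SQL for you.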
docs/stream_maps.md (+3 −3)
````diff
@@ -175,7 +175,7 @@ to expressions using the `config` dictionary.
 ### Constructing Expressions
 
 Expressions are defined and parsed using the
-[`simpleval`](https://github.com/danthedeckie/simpleeval) expression library. This library
+[`simpleeval`](https://github.com/danthedeckie/simpleeval) expression library. This library
 accepts most native python expressions and is extended by custom functions which have been declared
 within the SDK.
 
````
````diff
@@ -499,7 +499,7 @@ faker_config:
 locale: en_US
 ```
 
-Remember, these expressions are evaluated by the [`simpleval`](https://github.com/danthedeckie/simpleeval) expression library, which only allows a single python expression (which is the reason for the `or` syntax above).
+Remember, these expressions are evaluated by the [`simpleeval`](https://github.com/danthedeckie/simpleeval) expression library, which only allows a single python expression (which is the reason for the `or` syntax above).
 
 This means if you require more advanced masking logic, which cannot be defined in a single python expression, you may need to consider a custom stream mapper.
 
````
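The single-expression constraint mentioned in this hunk can be illustrated with a rough stdlib stand-in: compiling in `eval` mode rejects statements just as `simpleeval` does. Unlike `simpleeval`, this sketch is not sandboxed, and the field names are illustrative:

```python
def eval_mapping_expression(expression: str, names: dict):
    """Evaluate a single Python *expression*, as stream-map values require.

    Compiling in 'eval' mode raises SyntaxError for statements, assignments,
    and imports, mirroring simpleeval's one-expression rule.
    """
    code = compile(expression, "<stream-map>", "eval")
    return eval(code, {"__builtins__": {}}, dict(names))


# The `or` idiom from the docs: a fallback packed into one expression.
assert eval_mapping_expression("email or 'redacted'", {"email": None}) == "redacted"
```

Anything that needs multiple statements (loops, try/except, helper functions) will not compile as one expression, which is exactly when the docs suggest reaching for a custom stream mapper instead.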
````diff
@@ -749,7 +749,7 @@ excluded at the tap level, then the stream will be skipped exactly as if it were
 in the catalog metadata.
 
 If a stream is specified to be excluded at the target level, or in a standalone mapper
-between the tap and target, the filtering occurs downstream from the tap and therefor cannot
+between the tap and target, the filtering occurs downstream from the tap and therefore cannot
 affect the selection rules of the tap itself. Except in special test cases or in cases where
 runtime is trivial, we highly recommend implementing stream-level exclusions at the tap
 level rather than within the downstream target or mapper plugins.
````