This repository was archived by the owner on May 17, 2024. It is now read-only.

Detect duplicate rows on each side #850

Merged
merged 1 commit into from
Jan 11, 2024

Conversation

nolar
Contributor

@nolar nolar commented Jan 11, 2024

A known flaw: if there are equal duped rows, e.g.:

A: [pk=1000, val=hello], [pk=1000, val=hello]
B: [pk=1000, val=hello], [pk=1000, val=hello]

… then we might not notice them even at the level of checksum scanning of table segments. If the segments are fully equal, these dupes are never yielded: neither with -/+, nor with a potentially different informational marker * introduced specially for dupes. They are only noticed in segments that have some other (unrelated) differences, which makes this dupe detection not fully reliable.
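
To illustrate the flaw, here is a minimal sketch in plain Python. It is not the code from this PR: the in-memory row lists, the additive toy checksum, and the helper names (`row_hash`, `segment_checksum`, `duplicate_keys`) are all assumptions made up for the example. It shows that two segments with the same duplicated row produce equal checksums, so a checksum comparison alone gives no reason to look inside them, while a per-side count of primary keys does reveal the dupes.

```python
import hashlib
from collections import Counter

# Hypothetical in-memory stand-ins for the two segments from the example above.
side_a = [(1000, "hello"), (1000, "hello")]
side_b = [(1000, "hello"), (1000, "hello")]


def row_hash(row):
    """Hash one row to a 64-bit integer (toy scheme, assumed for illustration)."""
    digest = hashlib.md5(repr(row).encode()).hexdigest()
    return int(digest[:16], 16)


def segment_checksum(rows):
    """Toy segment checksum: the sum of row hashes modulo 2**64."""
    return sum(row_hash(row) for row in rows) % (1 << 64)


def duplicate_keys(rows):
    """Return the primary keys that occur more than once on one side."""
    counts = Counter(pk for pk, _ in rows)
    return sorted(pk for pk, n in counts.items() if n > 1)


# Both sides contain the same rows, dupes included, so the checksums agree
# and a checksum-only comparison treats the segment as equal.
checksums_match = segment_checksum(side_a) == segment_checksum(side_b)

# Counting keys on each side independently still finds the dupes.
dupes_a = duplicate_keys(side_a)
dupes_b = duplicate_keys(side_b)

print(checksums_match, dupes_a, dupes_b)
```

In this sketch the dupes only surface because each side is counted separately; anything that relies on the segments differing from each other would skip them, which is the limitation described above.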

@nolar nolar requested a review from dlawin January 11, 2024 11:21
@nolar nolar force-pushed the detect-duplicates branch from 41d71b0 to 8944e5f Compare January 11, 2024 16:45
@nolar nolar requested a review from vvkh January 11, 2024 16:45
@nolar nolar merged commit f8dd74c into master Jan 11, 2024
@nolar nolar deleted the detect-duplicates branch January 11, 2024 18:13

2 participants