Stop `data-diff` when maximum time or # different records is exceeded

**Is your feature request related to a problem? Please describe.**

We run `data-diff` on many tables. Sometimes the two tables being compared differ in a large number of records, and the diff for that table pair can then take a very long time (multiple hours). I would prefer to abandon such a diff at a certain point, e.g., once a maximum diff time or a maximum number of differing records is exceeded. For such a diff I do not care exactly which records differ; it is enough to know that the table is far out of sync.

**Describe the solution you'd like**

Allow a threshold to be defined as any of:
- a maximum diff time
- a maximum number of differing records
- a maximum percentage of differing records

If the threshold is exceeded, the diff is aborted with a WARNING or ERROR message, and possibly an exception.
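
To make the requested behaviour concrete, here is a minimal sketch of the three thresholds as a wrapper around any iterable of diff rows. Everything in it is hypothetical and not part of `data-diff`: the names `limited_diff` and `DiffLimitExceeded`, and the parameters `max_seconds`, `max_rows`, `max_fraction`, and `total_rows`. It assumes the diff yields one item per differing record, and that the caller already knows the table's row count if a percentage limit is wanted.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple

DiffRow = Tuple[str, tuple]  # assumed shape: (sign, row), e.g. ("+", (1, "a"))


class DiffLimitExceeded(RuntimeError):
    """Hypothetical exception: the diff exceeded a configured limit."""


def limited_diff(
    diff_rows: Iterable[DiffRow],
    max_seconds: Optional[float] = None,
    max_rows: Optional[int] = None,
    max_fraction: Optional[float] = None,
    total_rows: Optional[int] = None,
) -> Iterator[DiffRow]:
    """Yield diff rows until a time, count, or fraction limit is exceeded."""
    if max_fraction is not None and total_rows is None:
        raise ValueError("max_fraction requires total_rows")

    started = time.monotonic()
    seen = 0
    for row in diff_rows:
        seen += 1
        if max_rows is not None and seen > max_rows:
            raise DiffLimitExceeded(f"more than {max_rows} differing records")
        if max_fraction is not None and seen > max_fraction * total_rows:
            raise DiffLimitExceeded(
                f"more than {max_fraction:.0%} of {total_rows} records differ"
            )
        # The clock is only checked when a row arrives, so a long pause
        # inside the underlying diff is not interrupted by this wrapper.
        if max_seconds is not None and time.monotonic() - started > max_seconds:
            raise DiffLimitExceeded(f"diff took longer than {max_seconds}s")
        yield row
```

A wrapper like this only stops the consumer. It does not stop any work the diff has already scheduled in the background, which is why the request is for support inside `data-diff` itself.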

**Describe alternatives you've considered**

I run `data-diff` programmatically and implemented this feature myself in the Python script that calls it. This did not work as I had hoped, because `data-diff` uses a `ThreadPool` that kept running the diff after I broke out of the [`diff_tables`](https://data-diff.readthedocs.io/en/latest/python-api.html#data_diff.diff_tables) iterable.
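
The general problem is that tasks already submitted to a thread pool keep running unless they check a shared stop signal. The sketch below illustrates that idea with the standard library only. It is not how `data-diff` is implemented; `run_segments`, `stop_event`, and `work` are hypothetical names used for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def run_segments(
    segments: Iterable[int],
    stop_event: threading.Event,
    work: Callable[[int], int],
    max_workers: int = 2,
) -> List[Optional[int]]:
    """Run work(segment) on a pool; segments that start after stop_event
    is set are skipped and reported as None."""

    def guarded(segment: int) -> Optional[int]:
        # Cooperative cancellation: each task checks the flag before starting.
        if stop_event.is_set():
            return None
        return work(segment)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(guarded, s) for s in segments]
        return [f.result() for f in futures]
```

With a hook like this, the code that detects an exceeded threshold could call `stop_event.set()` and the remaining queued work would be skipped instead of running to completion.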

**Additional context**



