Aggregate improvements and SQL compatibility #134

Merged
merged 12 commits into from
Feb 7, 2021

Conversation

nils-braun
Collaborator

Triggered by the most recent integration of dask-sql into fugue (or, technically, the other way round), I am adding a large fraction of the fugue project's SQL-compatibility tests here.
To make them pass, I had to implement some additional functions (e.g. IS DISTINCT) and a lot of special handling for NULL cases. This means that, starting with this PR, dask-sql no longer treats NULLs in joins, groupbys and sorting the way dask or pandas do, and instead returns the same results as a "normal" SQL engine would.
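
To illustrate the NULL semantics this refers to, here is a minimal pure-Python sketch. It is not the code in this PR: `None` stands in for SQL NULL, and the helper names (`sql_equals`, `is_distinct_from`, `inner_join_keys`, `group_counts`) are made up for the example. It only shows the behavior a standard SQL engine is expected to have: `=` yields unknown when a NULL is involved, `IS DISTINCT FROM` never does, NULL keys do not match each other in an inner join, and NULL keys form a single group in a GROUP BY.

```python
from typing import Any, Dict, List, Optional, Tuple

NULL = None  # stand-in for SQL NULL in this sketch


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL `a = b`: the result is unknown (None) if either operand is NULL."""
    if a is NULL or b is NULL:
        return None
    return a == b


def is_distinct_from(a: Any, b: Any) -> bool:
    """SQL `a IS DISTINCT FROM b`: always True or False, never unknown."""
    if a is NULL and b is NULL:
        return False
    if a is NULL or b is NULL:
        return True
    return a != b


def inner_join_keys(left: List[Any], right: List[Any]) -> List[Tuple[Any, Any]]:
    """Inner join on `=`: a pair is kept only if the comparison is True."""
    return [(l, r) for l in left for r in right if sql_equals(l, r) is True]


def group_counts(keys: List[Any]) -> Dict[Any, int]:
    """GROUP BY with COUNT(*): all NULL keys end up in one common group."""
    counts: Dict[Any, int] = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For example, `inner_join_keys([1, NULL], [1, NULL])` keeps only the `(1, 1)` pair, while `group_counts([1, NULL, NULL])` reports a NULL group of size 2.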

Also, I am fixing a bug in the sorting with multiple partitions.
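
The commit notes for this PR state that all NaN data needs to go into the same partition, otherwise sorting is not possible. The following sketch shows that idea on plain Python lists and is only an assumption about the general approach, not the actual dask-sql implementation: `partitioned_sort`, `divisions` and `nulls_first` are hypothetical names. Non-NULL values are routed to range partitions, all NULLs go to one dedicated partition, each partition is sorted on its own, and the partitions are concatenated in order.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def partitioned_sort(
    values: Sequence[Optional[float]],
    divisions: Sequence[float],
    nulls_first: bool = False,
) -> List[Optional[float]]:
    """Sort values by range-partitioning them; NULLs share one partition.

    `divisions` must be sorted; it defines len(divisions) + 1 value ranges.
    """
    buckets: List[List[float]] = [[] for _ in range(len(divisions) + 1)]
    nulls: List[Optional[float]] = []

    for value in values:
        if value is None:
            # NULLs cannot be compared to the divisions, so they are
            # collected in their own partition.
            nulls.append(value)
        else:
            buckets[bisect_right(divisions, value)].append(value)

    # Each range partition can be sorted independently of the others.
    sorted_parts: List[List[Optional[float]]] = [sorted(b) for b in buckets]

    # Place the NULL partition at the front or at the end of the result.
    parts = [nulls] + sorted_parts if nulls_first else sorted_parts + [nulls]
    return [value for part in parts for value in part]
```

Whether NULLs come first or last by default differs between SQL engines, which is why the sketch leaves it as a parameter.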

@codecov-io

codecov-io commented Feb 7, 2021

Codecov Report

Merging #134 (13b4318) into main (bdc518e) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #134   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           49        49           
  Lines         1884      1916   +32     
  Branches       250       258    +8     
=========================================
+ Hits          1884      1916   +32     
Impacted Files Coverage Δ
dask_sql/physical/rel/logical/aggregate.py 100.00% <100.00%> (ø)
dask_sql/physical/rel/logical/join.py 100.00% <100.00%> (ø)
dask_sql/physical/rel/logical/sort.py 100.00% <100.00%> (ø)
dask_sql/physical/rex/core/call.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@nils-braun nils-braun merged commit e5fac1a into main Feb 7, 2021
@nils-braun nils-braun deleted the feature/aggregate-improvements branch February 7, 2021 15:20
demianw added a commit to NeuroLang/dask-sql that referenced this pull request Feb 8, 2021
commit e5fac1a
Author: Nils Braun <[email protected]>
Date:   Sun Feb 7 16:20:55 2021 +0100

    Aggregate improvements and SQL compatibility (dask-contrib#134)

    * A lot of refactoring of the groupby, mainly to include both distinct and null-grouping

    * Test for non-dask aggregations

    * All NaN data needs to go into the same partition (otherwise we can not sort)

    * Fix compatibility with SQL on null-joins

    * Distinct is not needed, as it is optimized away by Calcite

    * Implement is not distinct

    * Describe new limitations and remove old ones

    * Added compatibility test from fugue

    * Added a test for sorting with multiple partitions and NaNs

    * Stylefix

commit 7273c2d
Author: Nils Braun <[email protected]>
Date:   Sun Feb 7 15:34:55 2021 +0100

    Docs improvements (dask-contrib#132)

    * Fixed a bug in function references in docs

    * More details on the dask-sql internals

commit bdc518e
Author: Nils Braun <[email protected]>
Date:   Sun Feb 7 14:19:50 2021 +0100

    Fix the fugue dependency (dask-contrib#133)