feat(query): support jsonb format #7522

Merged 2 commits on Sep 16, 2022

b41sh (Member) commented Sep 8, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Support jsonb format.

Part of #6994

mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) Sep 8, 2022

Xuanwo (Member) left a comment

There is no serde-jsonb in the Rust community. How about publishing this as a separate crate so other projects can use it too?

sundy-li (Member) commented Sep 8, 2022

> There is no serde-jsonb in the Rust community. How about publishing this as a separate crate so other projects can use it too?

We can publish this crate once it's mature enough in Databend.

b41sh (Member, Author) commented Sep 8, 2022

> There is no serde-jsonb in the Rust community. How about publishing this as a separate crate so other projects can use it too?

Good idea. It's not finished yet, but I'll do it soon.

b41sh (Member, Author) commented Sep 16, 2022

I ran a simple benchmark on my computer using this dataset.

In most cases, parsing the JSONB binary format takes more than 10% less time than serde_json. Parsing text-format JSON into JSONB, however, is often slower than serde_json, so further optimization is needed.

australia-abc.json

serde_json        [314.91 µs 343.93 µs 379.69 µs]
jsonb binary      [256.66 µs 258.74 µs 261.38 µs]
jsonb text        [249.57 µs 251.44 µs 253.65 µs]

doj-blog.json

serde_json        [402.08 µs 412.45 µs 423.58 µs]
jsonb binary      [167.58 µs 173.28 µs 179.91 µs]
jsonb text        [634.05 µs 667.76 µs 709.04 µs]

movies.json

serde_json        [86.233 ms 89.062 ms 92.274 ms]
jsonb binary      [78.585 ms 80.190 ms 82.067 ms]
jsonb text        [82.802 ms 83.659 ms 84.709 ms]

reddit-scala.json

serde_json         [1.1921 ms 1.2052 ms 1.2215 ms]
jsonb binary       [1.0642 ms 1.0787 ms 1.0972 ms]
jsonb text         [1.2374 ms 1.2535 ms 1.2738 ms]

twitter_api_response.json

serde_json         [150.16 µs 151.75 µs 153.85 µs]
jsonb binary       [130.99 µs 132.34 µs 134.14 µs]
jsonb text         [216.31 µs 222.22 µs 229.92 µs]

turkish.json

serde_json         [10.607 ms 11.274 ms 12.041 ms]
jsonb binary       [10.539 ms 11.232 ms 12.118 ms]
jsonb text         [11.020 ms 12.565 ms 15.161 ms]

eu-lobby-financial.json

serde_json         [738.58 µs 781.96 µs 833.95 µs]
jsonb binary       [1.0208 ms 1.1354 ms 1.2514 ms]
jsonb text         [1.0058 ms 1.0443 ms 1.0903 ms]

eu-lobby-country.json

serde_json         [170.53 µs 177.79 µs 186.52 µs]
jsonb binary       [114.08 µs 123.13 µs 132.79 µs]
jsonb text         [106.03 µs 113.94 µs 122.68 µs]

github-gists.json

serde_json         [647.23 µs 709.34 µs 776.59 µs]
jsonb binary       [563.26 µs 598.57 µs 631.81 µs]
jsonb text         [900.75 µs 954.88 µs 1.0183 ms]

temp-anomaly.json

serde_json         [64.536 µs 66.936 µs 69.440 µs]
jsonb binary       [56.925 µs 59.113 µs 61.409 µs]
jsonb text         [66.104 µs 69.220 µs 73.045 µs]

thai-cinemas.json

serde_json         [211.57 µs 229.67 µs 249.06 µs]
jsonb binary       [191.76 µs 209.23 µs 226.07 µs]
jsonb text         [183.60 µs 197.95 µs 214.07 µs]

twitter_api_compact_response.json

serde_json         [162.75 µs 174.27 µs 186.24 µs]
jsonb binary       [139.11 µs 153.01 µs 167.85 µs]
jsonb text         [268.93 µs 294.29 µs 322.15 µs]

rick-morty.json

serde_json         [316.81 µs 336.57 µs 358.77 µs]
jsonb binary       [253.51 µs 273.20 µs 293.58 µs]
jsonb text         [228.52 µs 249.94 µs 272.85 µs]

australia-abc.json

serde_json         [6.6678 ms 7.1825 ms 7.8494 ms]
jsonb binary       [5.4067 ms 5.6893 ms 6.0002 ms]
jsonb text         [6.1982 ms 6.3259 ms 6.4523 ms]

bitcoin.json

serde_json         [204.13 µs 217.58 µs 234.28 µs]
jsonb binary       [193.53 µs 202.46 µs 213.45 µs]
jsonb text         [236.83 µs 281.24 µs 349.36 µs]

github-events.json

serde_json          [1.6667 ms 1.8837 ms 2.1703 ms]
jsonb binary        [1.0699 ms 1.1535 ms 1.2524 ms]
jsonb text          [1.5197 ms 1.6192 ms 1.7264 ms]

json-generator.json

serde_json          [172.71 µs 181.89 µs 191.81 µs]
jsonb binary        [122.70 µs 129.20 µs 136.55 µs]
jsonb text          [224.50 µs 242.63 µs 267.22 µs]

eu-lobby-repr.json

serde_json          [2.1576 ms 2.3330 ms 2.5405 ms]
jsonb binary        [1.1517 ms 1.2476 ms 1.3603 ms]
jsonb text          [1.5902 ms 1.6757 ms 1.7667 ms]
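
As a rough check of the "more than 10% less" claim, the relative difference can be computed from the middle value of each triple above. The sketch below is only illustrative: the helper name `percent_faster` is hypothetical, the rows are a hand-picked subset of the results, and the eu-lobby-financial.json binary figure (1.1354 ms) has been converted to µs so both columns in a row share a unit.

```rust
/// Relative time saved by `candidate` versus `baseline`, in percent.
/// Positive means the candidate took less time; negative means it took more.
/// (Hypothetical helper, not part of the PR.)
fn percent_faster(baseline: f64, candidate: f64) -> f64 {
    (baseline - candidate) / baseline * 100.0
}

fn main() {
    // (file, serde_json, jsonb binary): middle values copied from the results
    // above; both numbers in a row use the same unit.
    let rows = [
        ("doj-blog.json", 412.45, 173.28),
        ("twitter_api_response.json", 151.75, 132.34),
        ("turkish.json", 11.274, 11.232),
        ("eu-lobby-financial.json", 781.96, 1135.4),
    ];

    for &(file, serde_json, jsonb_binary) in rows.iter() {
        println!(
            "{}: jsonb binary takes {:.1}% less time than serde_json",
            file,
            percent_faster(serde_json, jsonb_binary)
        );
    }
}
```

For these four rows the result ranges from about 58% less time (doj-blog.json) to about 45% more time (eu-lobby-financial.json), which is why the summary says "in most cases" rather than "always".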

b41sh marked this pull request as ready for review September 16, 2022 00:28
mergify bot merged commit 990433f into databendlabs:main Sep 16, 2022

BohuTANG (Member) commented

https://github.com/PSeitz/serde_json_borrow is 2x faster than the original serde_json, cc @b41sh @sundy-li

b41sh (Member, Author) commented Jan 16, 2023

> https://github.com/PSeitz/serde_json_borrow is 2x faster than the original serde_json, cc @b41sh @sundy-li

The JSONB parser can be optimized further; we can learn something from this project.
