feat: try to improve object storage io read #9335

Merged: 20 commits merged into databendlabs:main on Dec 26, 2022

Conversation

BohuTANG
Member

@BohuTANG BohuTANG commented Dec 22, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Try to merge neighboring IO operations using RangeMerger

  • If the distance between two IO ranges to be read in one file is less than storage_io_min_bytes_for_seek, Databend issues a single sequential read over one range of the file that covers both, thus avoiding an extra seek.
  • This applies only to remote reads.
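
The merging rule above can be sketched as follows. This is a minimal Python illustration, not Databend's RangeMerger (which lives in the Rust codebase); the function name merge_ranges and the half-open (start, end) byte ranges are assumptions made for the example.

```python
def merge_ranges(ranges, min_bytes_for_seek):
    """Coalesce byte ranges whose gap is smaller than min_bytes_for_seek.

    ranges: iterable of half-open (start, end) byte offsets in one file.
    Hypothetical helper for illustration only.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] < min_bytes_for_seek:
            # The gap is below the threshold: extend the previous range so
            # both are covered by one sequential read (the gap is read too).
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # The gap is large enough that a separate request is cheaper.
            merged.append((start, end))
    return merged


raw = [(0, 100), (130, 200), (1000, 1100)]
print(merge_ranges(raw, 48))  # the 30-byte gap is below 48, so two ranges remain
print(merge_ranges(raw, 0))   # a threshold of 0 leaves non-overlapping ranges unmerged
```

Under this sketch, a threshold of 0 never merges ranges that do not overlap, which is in line with the "No merged IO" runs below where the seek count is unchanged.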

Introduce two remote read IO settings

  • storage_io_min_bytes_for_seek: If the distance between two IO requests to be read in one file is less than storage_io_min_bytes_for_seek, they are merged into one request. The default value is 512 bytes.
  • storage_io_max_page_bytes_for_read: The maximum number of bytes one IO request may read. The default value is 512 KB.
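
One way the two settings could interact is sketched below. The PR text does not spell out how the page-size cap is enforced, so the rule used here (skip a merge when the combined request would exceed the cap) is an assumption, as are the name plan_reads and the constant names; only the default values come from the description above.

```python
# Defaults as stated in the PR description.
MIN_BYTES_FOR_SEEK = 512
MAX_PAGE_BYTES_FOR_READ = 512 * 1024


def plan_reads(ranges, min_gap=MIN_BYTES_FOR_SEEK, max_page=MAX_PAGE_BYTES_FOR_READ):
    """Hypothetical read planner over half-open (start, end) byte ranges.

    Two neighbors are merged only if the gap is below min_gap AND the
    combined request stays within max_page (assumed rule, see lead-in).
    """
    plan = []
    for start, end in sorted(ranges):
        if plan:
            prev_start, prev_end = plan[-1]
            close_enough = start - prev_end < min_gap
            fits_in_page = max(prev_end, end) - prev_start <= max_page
            if close_enough and fits_in_page:
                plan[-1] = (prev_start, max(prev_end, end))
                continue
        plan.append((start, end))
    return plan


# Small neighbors are merged into one request.
print(plan_reads([(0, 1000), (1100, 2000)]))
# The gap is small, but the merged request would exceed 512 KB, so both stay.
print(plan_reads([(0, 300_000), (300_100, 600_000)]))
```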

Introduce remote IO request perf counters

  • remote_io_seeks: the number of raw ranges before IO merging
  • remote_io_seeks_after_merged: the number of ranges to read after IO merging
  • remote_io_read_bytes: the bytes to read before IO merging
  • remote_io_read_bytes_after_merged: the bytes to read after IO merging
  • remote_io_read_milliseconds: read cost in milliseconds
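
To make the counter definitions concrete, the sketch below derives the four size and count values from a list of raw ranges and the merged ranges that replace them. The function io_counters and the sample ranges are hypothetical; remote_io_read_milliseconds is omitted because it is a timing measured at read time, not something derivable from the ranges.

```python
def io_counters(raw_ranges, merged_ranges):
    """Compute the before/after counters for half-open (start, end) ranges."""
    def total_bytes(ranges):
        return sum(end - start for start, end in ranges)

    return {
        "remote_io_seeks": len(raw_ranges),
        "remote_io_seeks_after_merged": len(merged_ranges),
        "remote_io_read_bytes": total_bytes(raw_ranges),
        "remote_io_read_bytes_after_merged": total_bytes(merged_ranges),
    }


raw = [(0, 100), (130, 200), (1000, 1100)]
merged = [(0, 200), (1000, 1100)]  # the first two ranges share one request
for name, value in io_counters(raw, merged).items():
    print(name, value)
```

Merging lowers the seek count but raises the byte count, because the gap between merged ranges is read as well.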

Case1:

SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10 ignore_result;

-- No merged IO (set storage_io_min_bytes_for_seek=0;)
mysql> select metric, value  from system.metrics where metric like '%remote_io%' order by metric;
+----------------------------------------+---------------+
| metric                                 | value         |
+----------------------------------------+---------------+
| fuse_remote_io_read_bytes              | 15205646411.0 |
| fuse_remote_io_read_bytes_after_merged | 15205646411.0 |
| fuse_remote_io_read_milliseconds       | 546271.0      |
| fuse_remote_io_read_parts              | 1490.0        |
| fuse_remote_io_seeks                   | 77607.0       |
| fuse_remote_io_seeks_after_merged      | 77607.0       |
+----------------------------------------+---------------+

-- Merged IO (set storage_io_min_bytes_for_seek=48)
mysql> select metric, value  from system.metrics where metric like '%remote_io%' order by metric;
+----------------------------------------+---------------+
| metric                                 | value         |
+----------------------------------------+---------------+
| fuse_remote_io_read_bytes              | 15205646411.0 |
| fuse_remote_io_read_bytes_after_merged | 15208201850.0 |
| fuse_remote_io_read_milliseconds       | 446209.0      |
| fuse_remote_io_read_parts              | 1490.0        |
| fuse_remote_io_seeks                   | 77607.0       |
| fuse_remote_io_seeks_after_merged      | 13904.0       |
+----------------------------------------+---------------+

For this query, the total IO time dropped from 546271.0 ms to 446209.0 ms after merging IO, a reduction of roughly 18%.
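
The trade-off in Case 1 can be checked with a few lines of arithmetic. The figures are copied from the two metric tables above; the variable names are only for this illustration.

```python
# Case 1 metrics, copied from the tables above.
seeks_before, seeks_after = 77607, 13904
bytes_before, bytes_after = 15205646411, 15208201850
ms_before, ms_after = 546271, 446209

seeks_saved = seeks_before - seeks_after   # requests eliminated by merging
extra_bytes = bytes_after - bytes_before   # gap bytes read as a side effect
ms_saved = ms_before - ms_after            # reduction in total read time

print(f"seeks saved: {seeks_saved} ({seeks_saved / seeks_before:.1%})")
print(f"extra bytes: {extra_bytes} ({extra_bytes / bytes_before:.4%})")
print(f"read ms saved: {ms_saved} ({ms_saved / ms_before:.1%})")
```

About 82% of the requests disappear at the cost of roughly 2.4 MiB of additional reads out of about 15.2 GB.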

Case2:

-- No merged IO (seeks and seeks_after_merged are equal)
mysql> select * from hits ignore_result;
Empty set (19.74 sec)
Read 99997497 rows, 75.78 GiB in 19.733 sec., 5.07 million rows/sec., 3.84 GiB/sec.

mysql> select metric, value  from system.metrics where metric like '%remote_io%' order by metric;
+----------------------------------------+---------------+
| metric                                 | value         |
+----------------------------------------+---------------+
| fuse_remote_io_read_bytes              | 15412179252.0 |
| fuse_remote_io_read_bytes_after_merged | 15412179252.0 |
| fuse_remote_io_read_milliseconds       | 375970.0      |
| fuse_remote_io_read_parts              | 751.0         |
| fuse_remote_io_seeks                   | 78855.0       |
| fuse_remote_io_seeks_after_merged      | 78855.0       |
+----------------------------------------+---------------+


-- Merged IO (seeks_after_merged is lower than seeks)
mysql> select * from hits ignore_result;
Empty set (19.71 sec)
Read 99997497 rows, 75.78 GiB in 19.701 sec., 5.08 million rows/sec., 3.85 GiB/sec.

mysql> select metric, value  from system.metrics where metric like '%remote_io%' order by metric;
+----------------------------------------+---------------+
| metric                                 | value         |
+----------------------------------------+---------------+
| fuse_remote_io_read_bytes              | 15412179252.0 |
| fuse_remote_io_read_bytes_after_merged | 15414801174.0 |
| fuse_remote_io_read_milliseconds       | 337415.0      |
| fuse_remote_io_read_parts              | 751.0         |
| fuse_remote_io_seeks                   | 78855.0       |
| fuse_remote_io_seeks_after_merged      | 13461.0       |
+----------------------------------------+---------------+
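
For Case 2, a useful derived number is the average padding read per eliminated request. The sketch below uses the figures from the two tables above; the threshold used for the merged run is not stated in the PR, so nothing here depends on it.

```python
# Case 2 metrics, copied from the tables above.
seeks_before, seeks_after = 78855, 13461
bytes_before, bytes_after = 15412179252, 15414801174
ms_before, ms_after = 375970, 337415

seeks_saved = seeks_before - seeks_after
extra_bytes = bytes_after - bytes_before
# Each eliminated request corresponds to one gap that is now read through.
avg_gap_bytes = extra_bytes / seeks_saved

print(f"seeks saved: {seeks_saved}")
print(f"extra bytes: {extra_bytes}")
print(f"average padding per eliminated seek: {avg_gap_bytes:.1f} bytes")
print(f"read time reduction: {(ms_before - ms_after) / ms_before:.1%}")
```

The average works out to about 40 bytes per merged gap, with total read time down by roughly 10%.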

Bench

Nightly-162 vs. This PR

[Image: benchmark results]

Closes #9308

@vercel

vercel bot commented Dec 22, 2022

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) Dec 26, 2022 at 2:48AM (UTC)

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Dec 22, 2022
@BohuTANG
Member Author

@mergify update

@mergify
Contributor

mergify bot commented Dec 23, 2022

update

✅ Branch has been successfully updated

@BohuTANG
Member Author

@mergify update

@mergify
Contributor

mergify bot commented Dec 25, 2022

update

✅ Branch has been successfully updated

@BohuTANG BohuTANG marked this pull request as ready for review December 25, 2022 05:07
@BohuTANG BohuTANG marked this pull request as draft December 25, 2022 06:06
@BohuTANG BohuTANG marked this pull request as ready for review December 25, 2022 10:58
@BohuTANG
Member Author

@mergify update

@mergify
Contributor

mergify bot commented Dec 25, 2022

update

✅ Branch has been successfully updated

@BohuTANG BohuTANG requested a review from zhang2014 December 26, 2022 02:50
@BohuTANG BohuTANG merged commit bb49b0a into databendlabs:main Dec 26, 2022
Labels
pr-feature this PR introduces a new feature to the codebase
Development

Successfully merging this pull request may close these issues.

performance: add pre-fetch for block reader
2 participants