performance: add pre-fetch for block reader #9308

BohuTANG · 2022-12-20T11:06:20Z

Summary

For sync read:
https://github.com/datafuselabs/databend/blob/523e2190c275d481c707f8ee35972fc7290cba38/src/query/storages/fuse/fuse/src/io/read/block_reader.rs#L380-L390

https://github.com/datafuselabs/databend/blob/523e2190c275d481c707f8ee35972fc7290cba38/src/query/storages/fuse/fuse/src/io/read/block_reader.rs#L389-L389

With many column indices this can be wasteful, because their offsets may be adjacent (note: not necessarily contiguous; there may be small gaps, e.g. a gap of <1KB). We can convert the small fragmented reads into one large read by merging them all into a single request:

let (merge_read_offset, merge_read_length) = ...;
let result = Self::sync_read_column(op.object(&location), merge_read_offset, merge_read_length);
// Slice each column's (offset, length) range out of result
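
To make the idea concrete, here is a minimal sketch of the merge step, not the code that landed in Databend. The names `merge_ranges`, `slice_column`, and `MAX_GAP` are hypothetical, and the 1 KB threshold is only taken from the "gap <1KB" example above. It coalesces (offset, length) pairs whose gap is at most the threshold, then slices one column back out of a merged buffer.

```rust
use std::ops::Range;

/// Hypothetical threshold: ranges separated by at most this many bytes are read together.
const MAX_GAP: u64 = 1024;

/// Coalesce (offset, length) pairs into larger byte ranges.
/// Two ranges are merged when the gap between them is at most `max_gap`.
fn merge_ranges(columns: &[(u64, u64)], max_gap: u64) -> Vec<Range<u64>> {
    let mut sorted: Vec<Range<u64>> = columns
        .iter()
        .map(|&(offset, len)| offset..offset + len)
        .collect();
    sorted.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in sorted {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end.saturating_add(max_gap) {
                // Close enough: extend the previous merged range.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Slice one column's bytes out of the buffer that was read for `merged`.
/// Assumes the column lies entirely inside `merged`.
fn slice_column<'a>(merged: &Range<u64>, buf: &'a [u8], offset: u64, len: u64) -> &'a [u8] {
    let start = (offset - merged.start) as usize;
    &buf[start..start + len as usize]
}

fn main() {
    // Three columns: the first two are 50 bytes apart, the third is far away.
    let columns = [(0u64, 100u64), (150, 50), (5000, 10)];
    let merged = merge_ranges(&columns, MAX_GAP);
    println!("merged ranges: {:?}", merged);

    // Stand-in for the bytes returned by one merged read of the first range.
    let buf = vec![0u8; (merged[0].end - merged[0].start) as usize];
    let second_column = slice_column(&merged[0], &buf, 150, 50);
    println!("second column length: {}", second_column.len());
}
```

The trade-off in this sketch is that the gap bytes are read and then discarded, so the threshold should stay small relative to the cost of an extra request.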

BohuTANG · 2022-12-20T11:06:35Z

cc @sundy-li @RinChanNOWWW

sundy-li · 2022-12-20T11:15:00Z

sync_read_column will benefit from the fs cache if the ranges are adjacent.

BohuTANG · 2022-12-21T09:50:59Z

For async reads from a remote object store, like S3, the pre-fetch will be helpful.

The query:

SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;

I printed the read cost for each column of a partition:

Reading 30086 bytes took 114 ms, which is almost the same as reading 890095 bytes, which took 111 ms. The read is bound by network latency, not by I/O throughput.
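
As a rough illustration of that observation, the sketch below only does arithmetic on the two timings quoted above. The constant and function names are hypothetical, and two samples are not a benchmark. It shows that the larger read moved roughly 30 times more data in about the same time, which is what a latency-bound request looks like.

```rust
/// Read timings quoted in the thread: (bytes read, elapsed milliseconds).
const SMALL_READ: (u64, u64) = (30_086, 114);
const LARGE_READ: (u64, u64) = (890_095, 111);

/// Effective throughput of a single read, in bytes per millisecond.
fn throughput(read: (u64, u64)) -> f64 {
    read.0 as f64 / read.1 as f64
}

fn main() {
    let size_ratio = LARGE_READ.0 as f64 / SMALL_READ.0 as f64;
    let time_ratio = LARGE_READ.1 as f64 / SMALL_READ.1 as f64;
    println!("size ratio (large / small): {:.1}", size_ratio);
    println!("time ratio (large / small): {:.2}", time_ratio);
    println!("small read throughput: {:.0} bytes/ms", throughput(SMALL_READ));
    println!("large read throughput: {:.0} bytes/ms", throughput(LARGE_READ));
}
```

If the per-request time is dominated by latency in this way, merging several small column reads into one request should cost about the same as a single small read.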

FYI @Xuanwo

BohuTANG added the C-performance Category: Performance label Dec 20, 2022

BohuTANG mentioned this issue Dec 22, 2022

feat: try to improve object storage io read #9335

Merged

BohuTANG closed this as completed in #9335 Dec 26, 2022
