Feature: Support Arrow Flight SQL protocol #9832
Comments
It's a community feature now, so anyone interested could take this task.
Hi @sundy-li, can you please share some general ideas on how to get started on this?
Paper related to this feature: https://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf
Thanks for the info! I'll take a closer look.
I'm also interested in this feature. Is any hands-on work planned yet? I could help.
We already have a Flight protocol that is used to communicate with other query nodes in the cluster, but it's for internal usage. We can add a similar handler based on Flight.
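A minimal wiring sketch of that idea, under stated assumptions: `DatabendFlightSqlService` is a hypothetical placeholder name, its implementation of the `FlightSqlService` trait (from the arrow-flight crate's Flight SQL feature) is omitted, and the address is arbitrary. This illustrates how a second, external-facing listener could sit next to the internal Flight RPC; it is not Databend's actual code.

```rust
// Sketch only: `DatabendFlightSqlService` is a hypothetical type; the omitted
// part is its implementation of arrow-flight's `FlightSqlService` trait.
use arrow_flight::flight_service_server::FlightServiceServer;
use tonic::transport::Server;

#[derive(Clone, Default)]
struct DatabendFlightSqlService; // assume: impl FlightSqlService for DatabendFlightSqlService { ... }

async fn serve_flight_sql() -> Result<(), Box<dyn std::error::Error>> {
    // Listen on a separate address from the internal cluster Flight RPC so
    // external Flight SQL clients never touch the internal exchange endpoints.
    let addr = "0.0.0.0:8900".parse()?; // address chosen arbitrarily for the sketch

    // A FlightSqlService implementor is served through the generated
    // FlightService gRPC server, which is the endpoint Flight SQL clients speak to.
    Server::builder()
        .add_service(FlightServiceServer::new(DatabendFlightSqlService))
        .serve(addr)
        .await?;
    Ok(())
}
```

The existing internal Flight listener would stay untouched; only the new service would expose the Flight SQL command set to external clients.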
A C++ implementation example, Flight on DuckDB/SQLite: https://github.com/voltrondata/flight-duckdb-example
Thanks for providing those. I just took a closer look at the protocol differences between Arrow Flight and Arrow Flight SQL, and it seems the Databend Flight RPC service would be a really strong start! Just two follow-ups from the investigation:
The existing Flight handler is for internal usage, but within this issue we are going to support Arrow Flight SQL.
Sorry, I just didn't get it, because the RPC listener is already on: https://github.com/datafuselabs/databend/blob/main/src/binaries/query/main.rs#L196-L203 So to finish this issue, it just includes:
I am working on it. If all goes well, the first version that can work with JDBC will be available by next weekend.
Awesome. Curious why anyone would want to use JDBC with Flight, though. The whole point of a Flight server is that we can get Arrow data directly in columnar format; JDBC turns it back into row-oriented data.
The JDBC driver is built on top of Flight SQL. If we support Flight SQL, we can seamlessly connect to the JDBC ecosystem (many third-party tools use JDBC to connect): https://www.dremio.com/blog/jdbc-driver-for-arrow-flight-sql/ But we will not use JDBC inside Databend.
By the way, do you know of any client-side tools that use Flight SQL and the columnar format directly? @kesavkolla
I must apologize to @sundy-li for this, because there were a lot of misunderstandings before I started to investigate Arrow Flight SQL. There are some deep dives from afterward that I could share.
Let me know if I can still be of help, @youngsofun.
Done in #10732
Summary
Currently Databend supports the MySQL protocol; as an alternative, Databend should also support the Arrow Flight SQL protocol.
Databend targets data warehouse/lakehouse use cases where data volumes are high, so when a client queries Databend it would be more performant to return results in the Arrow data format. A lakehouse typically stores data in Parquet files; with the MySQL protocol, Databend has to deserialize Parquet into Arrow and then convert the result back into MySQL data types. On the caller's side, people then build data frames or iterate MySQL result sets, which requires yet another round of type serialization. With Arrow Flight SQL all of these serialization costs can be avoided: Databend converts Parquet to Arrow, performs its query operations, and sends the Arrow data back directly as the result. Clients can take that Arrow data and pass it straight through, all the way to the visualization layer.
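As a rough illustration of that columnar path (not Databend's actual code; it assumes recent versions of the `parquet`, `arrow-flight`, `futures`, and `tokio` crates, and a hypothetical `data.parquet` input file), the sketch below reads Parquet straight into Arrow RecordBatches and encodes those same batches into Flight data frames, the shape a Flight SQL `DoGet` response would stream back, with no row-oriented conversion anywhere on the path.

```rust
use std::fs::File;

use arrow_flight::encode::FlightDataEncoderBuilder;
use futures::{stream, TryStreamExt};
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parquet -> Arrow: the reader yields columnar RecordBatches directly.
    let file = File::open("data.parquet")?; // hypothetical input file
    let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
    let batches: Vec<_> = reader.collect::<Result<_, _>>()?;

    // Arrow -> Flight: frame the same batches as FlightData messages; this is
    // what a Flight SQL server would stream back for DoGet, still columnar.
    let input = stream::iter(batches.into_iter().map(Ok));
    let flight_data: Vec<_> = FlightDataEncoderBuilder::new()
        .build(input)
        .try_collect()
        .await?;

    println!("encoded {} FlightData messages", flight_data.len());
    Ok(())
}
```

A client speaking Flight SQL receives those frames and reassembles them into the same Arrow batches, so data frames and visualization layers can consume the columnar result without going through a row-based driver.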