feat: Add HDFS support #5245
Signed-off-by: Xuanwo <[email protected]>
Thanks for the contribution! Please review the labels and make any necessary changes.
@dantengsky Maybe I need to borrow some work from your PR to get Java set up correctly?
This PR only makes HDFS a normal storage backend like AWS S3, so it's not related to @dantengsky's work on Hive?
Yes. Hive integration should happen in another PR.
// hdfs storage backend config
#[clap(flatten)]
pub hdfs: HdfsConfig,
If we don't add the HdfsConfig item to *.toml, will config deserialization crash?
This is a separate problem: we don't want to have to list unused config items explicitly:
[storage]
# fs|s3
type = "s3"
[storage.fs] -- this config
[storage.s3]
bucket = "databend"
endpoint_url = "https://s3.amazonaws.com"
access_key_id = "<your-key-id>"
secret_access_key = "<your-access-key>"
[storage.azblob] -- this config
Is it possible to configure only the items we actually use, like [storage.s3] here?
> If we don't add the HdfsConfig item to *.toml, the config deserialize looks will crash?
If HdfsConfig is not added in *.toml, we will use the default value instead.
I tested this behavior locally: databend-query is able to start without any HDFS-related settings.
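The fallback described above can be sketched in plain Rust. This is only an illustration of the idea, not the code in this PR: the field names on `HdfsConfig` and `S3Config`, the `Sections` type, and the `load_storage` and `example_sections` helpers are all hypothetical, and a pre-parsed map stands in for the real TOML deserializer. Any section missing from the file simply keeps its `Default` value.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's `HdfsConfig`; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct HdfsConfig {
    pub name_node: String,
    pub root: String,
}

impl Default for HdfsConfig {
    fn default() -> Self {
        Self {
            name_node: String::new(),
            root: "/".to_string(),
        }
    }
}

/// Hypothetical S3 section, mirroring two keys from the example config in this thread.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct S3Config {
    pub bucket: String,
    pub endpoint_url: String,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct StorageConfig {
    pub storage_type: String,
    pub s3: S3Config,
    pub hdfs: HdfsConfig,
}

/// Section name -> (key -> value); a stand-in for an already parsed *.toml file.
pub type Sections = HashMap<&'static str, HashMap<&'static str, &'static str>>;

/// Start from defaults and overwrite only the keys that are present.
pub fn load_storage(sections: &Sections) -> StorageConfig {
    let mut cfg = StorageConfig::default();
    if let Some(v) = sections.get("storage").and_then(|s| s.get("type")) {
        cfg.storage_type = v.to_string();
    }
    if let Some(s3) = sections.get("storage.s3") {
        if let Some(v) = s3.get("bucket") {
            cfg.s3.bucket = v.to_string();
        }
        if let Some(v) = s3.get("endpoint_url") {
            cfg.s3.endpoint_url = v.to_string();
        }
    }
    if let Some(hdfs) = sections.get("storage.hdfs") {
        if let Some(v) = hdfs.get("name_node") {
            cfg.hdfs.name_node = v.to_string();
        }
        if let Some(v) = hdfs.get("root") {
            cfg.hdfs.root = v.to_string();
        }
    }
    cfg
}

/// A config that mentions only [storage] and [storage.s3], with no HDFS section.
pub fn example_sections() -> Sections {
    let mut sections = Sections::new();
    sections.insert("storage", HashMap::from([("type", "s3")]));
    sections.insert(
        "storage.s3",
        HashMap::from([
            ("bucket", "databend"),
            ("endpoint_url", "https://s3.amazonaws.com"),
        ]),
    );
    sections
}
```

Calling `load_storage(&example_sections())` yields a config whose `hdfs` field equals `HdfsConfig::default()`, which is the behavior reported here: a file that never mentions HDFS still loads.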
That's great, I am out, thank you.
I will remove all the unused config items from the documents.
Yeah, currently the Hive PR's unit and integration tests are not integrated with the GitHub workflows yet; they just run against a local Hadoop + Hive cluster. There are no build-time dependencies on the JDK or jar files, so a Docker image seems able to cover it. But for this PR, a Docker image may not be enough. Hope I got it right:
Thanks for the advice! Maybe we can make databend-query compilable in this PR and test it in the next PR.
👍
Signed-off-by: Xuanwo [email protected]
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
This PR will allow databend-query to use HDFS as a storage backend.
Part of #5215
Changelog