feat: Add HDFS support #5245

Xuanwo · 2022-05-09T04:07:51Z

Signed-off-by: Xuanwo [email protected]

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This PR allows databend-query to use HDFS as a storage backend.

Part of #5215

Changelog

New Feature

vercel · 2022-05-09T04:07:56Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment

Name	Status	Preview	Updated
databend	⬜️ Ignored (Inspect)		May 9, 2022 at 4:25AM (UTC)

mergify · 2022-05-09T04:08:18Z

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

Xuanwo · 2022-05-09T04:09:16Z

@dantengsky Maybe I need to borrow some work from your PR to get Java set up correctly?

BohuTANG · 2022-05-09T04:10:55Z

From what I understand, this PR only makes HDFS a normal storage backend like AWS S3; it's not related to @dantengsky's work on Hive?

Xuanwo · 2022-05-09T04:12:03Z

From what I understand, this PR only makes HDFS a normal storage backend like AWS S3; it's not related to @dantengsky's work on Hive?

Yes. Hive's integration should happen in another PR.

BohuTANG · 2022-05-09T05:15:26Z

query/src/configs/config_storage.rs

+
+    // hdfs storage backend config
+    #[clap(flatten)]
+    pub hdfs: HdfsConfig,


If we don't add the HdfsConfig item to *.toml, will config deserialization crash?

This is another problem: we don't want to list unused config items explicitly:

[storage]
# fs|s3
type = "s3"

[storage.fs] -- this config

[storage.s3]
bucket = "databend"
endpoint_url = "https://s3.amazonaws.com"
access_key_id = "<your-key-id>"
secret_access_key = "<your-access-key>"

[storage.azblob] -- this config

Is it possible to configure only the items we use, like [storage.s3] here?

If we don't add the HdfsConfig item to *.toml, will config deserialization crash?

If HdfsConfig is not added in *.toml, we will use the default value instead.

I tested this behavior locally: query is able to start without adding any hdfs-related settings.

That's great, I am out, thank you.
I will remove all the unused config items from the documentation.
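The fallback described in this thread (a section missing from *.toml leaves that backend at its default values) can be modelled with a minimal, std-only sketch. This is not the PR's actual code: the real config derives its behavior from clap and serde, while the version below replaces the TOML parser with a plain map of sections. The field names `name_node` and `root`, the `Section` alias, and the helpers `section` and `load_storage` are all assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's `HdfsConfig`; field names are assumed.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct HdfsConfig {
    pub name_node: String,
    pub root: String,
}

/// Hypothetical stand-in for the S3 section shown in the review thread.
#[derive(Debug, Clone, PartialEq)]
pub struct S3Config {
    pub bucket: String,
    pub endpoint_url: String,
}

impl Default for S3Config {
    fn default() -> Self {
        S3Config {
            bucket: String::new(),
            endpoint_url: "https://s3.amazonaws.com".to_string(),
        }
    }
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct StorageConfig {
    pub storage_type: String,
    pub s3: S3Config,
    pub hdfs: HdfsConfig,
}

/// One `[section]` of a config file, already split into key/value pairs.
pub type Section = HashMap<String, String>;

/// Small helper to build a section from literal pairs.
pub fn section(pairs: &[(&str, &str)]) -> Section {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Start from defaults and overwrite only the keys that are present,
/// so an absent `[storage.hdfs]` section is not an error.
pub fn load_storage(sections: &HashMap<String, Section>) -> StorageConfig {
    let mut cfg = StorageConfig::default();
    if let Some(t) = sections.get("storage").and_then(|s| s.get("type")) {
        cfg.storage_type = t.clone();
    }
    if let Some(s3) = sections.get("storage.s3") {
        if let Some(v) = s3.get("bucket") {
            cfg.s3.bucket = v.clone();
        }
        if let Some(v) = s3.get("endpoint_url") {
            cfg.s3.endpoint_url = v.clone();
        }
    }
    if let Some(hdfs) = sections.get("storage.hdfs") {
        if let Some(v) = hdfs.get("name_node") {
            cfg.hdfs.name_node = v.clone();
        }
        if let Some(v) = hdfs.get("root") {
            cfg.hdfs.root = v.clone();
        }
    }
    cfg
}
```

Calling `load_storage` with only `[storage]` and `[storage.s3]` populated leaves `cfg.hdfs` equal to `HdfsConfig::default()`, which mirrors the behavior reported above: only the section for the backend in use needs to appear in the file.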

dantengsky · 2022-05-09T05:33:49Z

@dantengsky Maybe I need to borrow some work from your PR to get Java set up correctly?

Yeah, currently the Hive PR's ut/it is not integrated with the GitHub workflows yet; it just uses a local Hadoop + Hive cluster for testing. There are no build-time dependencies on the JDK/jar files. A Docker image seems to be able to cover it.

But for this PR, a Docker image may not be enough. Hope I get it right:

To enable feature storage-hdfs at compile time, we need a JDK (for libjvm.so and some JVM header files).
To enable feature storage-hdfs at runtime (or in ut/it), an HDFS cluster (of the specified version) is needed.

Xuanwo · 2022-05-09T05:47:36Z

Thanks for the advice! Maybe we can make databend-query compilable in this PR and test it in the next PR.
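One way to keep databend-query compilable without a JDK is to put the HDFS code path behind the `storage-hdfs` Cargo feature mentioned above. The sketch below only illustrates that gating pattern and assumes nothing about how the PR actually wires it up: the `Backend` enum and `select_backend` function are hypothetical names, and no real HDFS client is involved.

```rust
/// Storage backends known to this sketch (hypothetical type).
#[allow(dead_code)] // `Hdfs` is never constructed when the feature is off
#[derive(Debug, PartialEq)]
pub enum Backend {
    Fs,
    S3,
    Hdfs,
}

/// Map the configured storage type to a backend.
/// The "hdfs" arm is chosen at compile time by the `storage-hdfs` feature,
/// so a default build needs neither a JDK nor libjvm.so.
pub fn select_backend(kind: &str) -> Result<Backend, String> {
    match kind {
        "fs" => Ok(Backend::Fs),
        "s3" => Ok(Backend::S3),
        #[cfg(feature = "storage-hdfs")]
        "hdfs" => Ok(Backend::Hdfs),
        #[cfg(not(feature = "storage-hdfs"))]
        "hdfs" => Err(
            "hdfs support is not compiled in; rebuild with feature `storage-hdfs`".to_string(),
        ),
        other => Err(format!("unknown storage type: {}", other)),
    }
}
```

With this shape, a build without the feature still compiles and reports a clear error if `hdfs` is selected at runtime, while a build with the feature enabled would be the only one that needs the JDK at compile time and an HDFS cluster for tests.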

BohuTANG

👍

feat: Add HDFS support

6b0469e

Signed-off-by: Xuanwo <[email protected]>

Xuanwo requested a review from BohuTANG as a code owner May 9, 2022 04:07

databend-bot added the need-review label May 9, 2022

mergify bot added the pr-feature this PR introduces a new feature to the codebase label May 9, 2022

BohuTANG requested a review from dantengsky May 9, 2022 04:09

BohuTANG reviewed May 9, 2022

Format toml

68004a3

Signed-off-by: Xuanwo <[email protected]>

BohuTANG reviewed May 9, 2022

BohuTANG approved these changes May 9, 2022

BohuTANG merged commit 4fd1b64 into databendlabs:main May 9, 2022

Xuanwo deleted the hdfs branch May 9, 2022 05:57

Xuanwo mentioned this pull request May 9, 2022

Tracking issues of adopting features added in OpenDAL v0.6 #5215

Closed

3 tasks
