feat: save copy into table stage file meta, avoid duplicate copy file #7531
This PR is a bit complex: it mixes meta, catalog, and query changes.
Could you write an RFC first so we can understand how and why the issue is addressed this way?
if let Some(file_info) = resp.file_info.get(file) {
    // No need to copy the file again only if md5 match.
    if file_info.md5 == stage_file.md5 {
We need to check the following things:
- name
- length
- last modified
- etag (use etag instead of md5)
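The checks listed above could be sketched roughly as follows. This is only an illustration: FileSnapshot and already_copied are hypothetical names that do not come from the PR, timestamps are assumed to be seconds since the Unix epoch, and treating a missing ETag as "not matched" is an assumption rather than a decision made in this thread.

```rust
/// Hypothetical snapshot of a stage file's metadata; not the PR's actual type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileSnapshot {
    pub name: String,
    pub length: u64,
    /// Seconds since the Unix epoch (assumed representation).
    pub last_modified: u64,
    /// `None` when the storage backend does not report an ETag.
    pub etag: Option<String>,
}

/// Returns true only when name, length, last-modified time, and ETag all match,
/// meaning the file can be skipped instead of copied again.
pub fn already_copied(recorded: &FileSnapshot, current: &FileSnapshot) -> bool {
    // Assumption: a missing ETag on either side is "unknown" and never counts as a match.
    let etag_matches = match (&recorded.etag, &current.etag) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    };
    recorded.name == current.name
        && recorded.length == current.length
        && recorded.last_modified == current.last_modified
        && etag_matches
}
```

Requiring all four properties to match errs on the side of copying again whenever anything about the file is uncertain.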
If the location is HTTPS (not S3), what is the ETag?
> If the location is HTTPS (not S3), what is the ETag?
ETag in OpenDAL is the same ETag defined in the HTTP standard: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
They share the same semantics, which makes the ETag reliable for detecting content changes.
The type StageFile, which stores stage file meta info, has no etag field:
#[derive(Default, Clone)]
pub struct StageFile {
    pub path: String,
    pub size: u64,
    pub md5: Option<String>,
    pub last_modified: DateTime<Utc>,
    pub creator: Option<UserIdentity>,
}
I have not finished reading all of the code.
My first piece of advice is to keep UpsertKVReq::update() as it is,
and to use UpsertKVReq::update().with(KVMeta { expire_at: Some(now + 2) }) to set the expire time where needed.
This way, the existing usages of update() do not have to be modified.
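A minimal sketch of the builder-style approach suggested above, assuming simplified stand-in types. The structs below only borrow the names UpsertKVReq and KVMeta for illustration; their fields, the update(key, value) signature, and the expire_at helper are hypothetical and are not the real meta-service API.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for `KVMeta`; only the expiry field is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KVMeta {
    /// Absolute expiry time in seconds since the Unix epoch.
    pub expire_at: Option<u64>,
}

/// Hypothetical stand-in for `UpsertKVReq`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UpsertKVReq {
    pub key: String,
    pub value: Vec<u8>,
    pub value_meta: Option<KVMeta>,
}

impl UpsertKVReq {
    /// The constructor keeps its signature, so existing callers are untouched.
    pub fn update(key: &str, value: &[u8]) -> Self {
        Self {
            key: key.to_string(),
            value: value.to_vec(),
            value_meta: None,
        }
    }

    /// Opt-in builder step that attaches metadata such as an expiry time.
    pub fn with(mut self, meta: KVMeta) -> Self {
        self.value_meta = Some(meta);
        self
    }
}

/// Hypothetical helper: the epoch second at which an entry written at `now` expires.
pub fn expire_at(now: SystemTime, ttl: Duration) -> u64 {
    // Times before the epoch are clamped to zero instead of panicking.
    let since_epoch = now.duration_since(UNIX_EPOCH).unwrap_or(Duration::ZERO);
    (since_epoch + ttl).as_secs()
}
```

With this shape, only the call sites that need an expiry chain .with(...), for example with a TTL of Duration::from_secs(64 * 24 * 60 * 60) for the 64-day retention mentioned in the summary; every other caller of update() compiles unchanged.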
let cur_db = mt.get_database(Self::req_get_db(tenant, db_name)).await?;
assert!(old_db.ident.seq < cur_db.ident.seq);
assert!(res.table_id >= 1, "table id >= 1");
let tb_id = res.table_id;

let got = mt.get_table((tenant, db_name, tbl_name).into()).await?;
let seq = got.ident.seq;

let ident = TableIdent::new(tb_id, seq);

let want = TableInfo {
    ident: ident.clone(),
    desc: format!("'{}'.'{}'.'{}'", tenant, db_name, tbl_name),
    name: tbl_name.into(),
    meta: table_meta(created_on),
};
assert_meta_eq_without_updated!(want, got.as_ref().clone(), "get created table");
ident
This does not need to test create-table. Creating a table is already guaranteed to work as expected by other test cases.
@drmingdrmer @BohuTANG @Xuanwo I split the original issue into sub-issues, because I think this PR is too large: it includes both meta and query changes, and it needs some help from @Xuanwo to add the etag field of
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
- TableStageFileInfo in meta to save copy into table stage file info; it expires after 64 days.
- TableStageFileInfo.md5.

Fixes #6338