-
Notifications
You must be signed in to change notification settings - Fork 666
Support for parquet file with type inferring #7734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for parquet file with type inferring #7734
Conversation
⚪ ⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
⚪ ⚪
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
[actorSystem, selfId = SelfId(), request = std::move(request)](NYql::IHTTPGateway::TResult&& result) mutable { | ||
actorSystem->Send(selfId, new TEvS3DownloadResponse(std::move(request), std::move(result))); | ||
}, {}, RetryPolicy_); | ||
return std::move(headers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
а почему тут NRVO не сработает?
@@ -333,7 +333,7 @@ struct TObjectStorageExternalSource : public IExternalSource { | |||
} | |||
for (const auto& entry : entries.Objects) { | |||
if (entry.Size > 0) { | |||
return entry.Path; | |||
return entry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
А как оно раньше работало. Почему на entry заменилось?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
раньше мы скачивали первые 10МБ файла -> размер файла нам был не нужен.
теперь для паркета нам нужно скачать последние N -> нужно знать размер файла, тк S3Fetcher принимает на вход (начало, конец) участка памяти
соотв я добавил в TEvInferFileSchema размер файла
return; | ||
} | ||
case EFileFormat::Undefined: | ||
Y_ABORT("Invalid format should be unreachable"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABORT плохо в этом месте, можешь ошибку возвращать по аналогии с default? Мы легко и удобно можешь донести эту проблему выше. Еще case нужно вынести выше default
@@ -118,12 +119,24 @@ struct TEvArrowFile : public NActors::TEventLocal<TEvArrowFile, EvArrowFile> { | |||
TString Path; | |||
}; | |||
|
|||
struct TEvArrowSchema : public NActors::TEventLocal<TEvArrowSchema, EvArrowSchema> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Может назвать это TEvInferredArrowSchema
по аналогии с TEvInferredFileSchema
?
} | ||
|
||
void HandleFileError(TEvFileError::TPtr& ev, const NActors::TActorContext& ctx) { | ||
Cout << "TArrowInferencinator::HandleFileError" << Endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cout лишний. Либо в логи переносить если это нужно
futureSchema.Apply([actorSystem, sender, request](NThreading::TFuture<NYql::IArrowReader::TSchemaResponse> response) { | ||
if (response.HasException()) { | ||
try { | ||
response.TryRethrow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
А TryRethrow что делает если HasException = false?
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
⚪ ⚪
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
ydb/core/external_sources/object_storage/inference/arrow_fetcher.cpp
Outdated
Show resolved
Hide resolved
ydb/core/external_sources/object_storage/inference/arrow_fetcher.cpp
Outdated
Show resolved
Hide resolved
ydb/core/external_sources/object_storage/inference/arrow_fetcher.cpp
Outdated
Show resolved
Hide resolved
⚪ ⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
|
||
std::shared_ptr<arrow::io::RandomAccessFile> BuildParquetFileFromMetadata(const TString& data, const TRequest& request, const NActors::TActorContext& ctx) { | ||
auto arrowData = std::make_shared<arrow::Buffer>(nullptr, 0); | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
а зачем эти скобочки?
if (buildRes.ok()) { | ||
buildRes = builder.Finish(&arrowData); | ||
} | ||
if (!buildRes.ok()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
это условие вверх можно перенести, а ниже ок часть и убрать if (buildRes.ok()) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
это условие проверяет и builder.Append и builder.Finish, предлагаешь их по-отдельности проверять?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Скорее да, еще бы разные ошибки написать. Чтобы понимать какое конкретно место сломалось, finish или append
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
⚪ ⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
a6a8b05
into
ydb-platform:main
Changelog entry
https://st.yandex-team.ru/YQ-2830
Changelog category