Skip to content

Convert mismatched types of arg/sequence when applying optimization of ShuffleByKeys/PartitionsByKeys to empty sequence #1568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

UgnineSirdis
Copy link
Collaborator

Changelog entry

...

Changelog category

  • Not for changelog (changelog entry is not required)

Additional information

...

@UgnineSirdis UgnineSirdis requested a review from a team as a code owner February 4, 2024 14:50
Copy link

github-actions bot commented Feb 4, 2024

2024-02-04 14:53:15 UTC Pre-commit check for dda612b has started.
2024-02-04 14:53:18 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-02-04 14:56:26 UTC Build successful.
2024-02-04 14:56:43 UTC Tests are running...
🔴 2024-02-04 16:32:14 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
58861 49539 0 1 9261 60

Copy link

github-actions bot commented Feb 4, 2024

2024-02-04 14:55:48 UTC Pre-commit check for dda612b has started.
2024-02-04 14:55:51 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-04 14:58:50 UTC Build successful.
2024-02-04 14:59:04 UTC Tests are running...
🔴 2024-02-04 16:39:43 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14537 14380 0 43 67 47

Copy link
Member

@rvu1024 rvu1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Приведи пожалуйста пример, когда такое может получиться. Сейчас по типизации ForwardList вообще запрещен над списками, а ты его специально зачем-то включаешь. Выглядит, что нужно чинить другое место, где он возникает

@UgnineSirdis
Copy link
Collaborator Author

Приведи пожалуйста пример, когда такое может получиться. Сейчас по типизации ForwardList вообще запрещен над списками, а ты его специально зачем-то включаешь. Выглядит, что нужно чинить другое место, где он возникает

Привет! Я тоже об этом думал.

Я исследовал работу запроса, который описан в добавленном тесте в этом ревью.

В какой-то момент S3-провайдер понимает, что в данном бакете данных нет и вставляет вместо чтения из S3 пустой список в качестве входных данных (с Cons!):

(
(let $1 (StructType '('"l_extendedprice" (AsOptionalType (DataType 'Double))) '('"l_discount" (AsOptionalType (DataType 'Double))) '('"l_partkey" (AsOptionalType (DataType 'Int64))) '('"l_shipdate" (AsOptionalType (DataType 'Date)))))
(let $2 '('"l_extendedprice" '"l_discount" '"l_partkey" '"l_shipdate"))
(let $3 (Left! (Cons! world (AssumeColumnOrder (List (ListType $1)) $2))))
(let $4 (StructType '('"p_partkey" (AsOptionalType (DataType 'Int64))) '('"p_type" (AsOptionalType (DataType 'String)))))
(let $5 '('"p_partkey" '"p_type"))
(let $6 (DataSink 'result))
(let $7 '('Inner '"l" '"p" '('"l" '"l_partkey") '('"p" '"p_partkey") '()))
(let $8 (EquiJoin '((RemoveSystemMembers (Right! (Cons! world (AssumeColumnOrder (List (ListType $1)) $2)))) '"l") '((RemoveSystemMembers (Right! (Cons! $3 (AssumeColumnOrder (List (ListType $4)) $5)))) '"p") $7 '()))
(let $9 (Filter $8 (lambda '($16) (block '(
  (let $17 (SqlColumn $16 '"l_shipdate" '"l"))
  (let $18 (SqlColumn $16 '"l_shipdate" '"l"))
  (return (Coalesce (And (>= (SafeCast $17 (DataType 'Timestamp)) (Date '"8978")) (< (SafeCast $18 (DataType 'Timestamp)) (+MayWarn (Date '"8978") (Interval '"2678400000000")))) (Bool 'false)))
)))))
(let $10 (AggApply 'sum (ListItemType (TypeOf $9)) (lambda '($19) (block '(
  (let $20 (SqlColumn $19 '"p_type" '"p"))
  (let $21 (SqlColumn $19 '"l_extendedprice" '"l"))
  (let $22 (SqlColumn $19 '"l_discount" '"l"))
  (return (PersistableRepr (If (Coalesce (StartsWith $20 (String '"PROMO")) (Bool 'false)) (*MayWarn $21 (-MayWarn (Int32 '"1") $22)) (Int32 '"0"))))
)))))
(let $11 (AggApply 'sum (ListItemType (TypeOf $9)) (lambda '($23) (block '(
  (let $24 (SqlColumn $23 '"l_extendedprice" '"l"))
  (let $25 (SqlColumn $23 '"l_discount" '"l"))
  (return (PersistableRepr (*MayWarn $24 (-MayWarn (Int32 '"1") $25))))
)))))
(let $12 (Aggregate $9 '() '('('Sum0 $10) '('Sum1 $11)) '()))
(let $13 (AssumeColumnOrderPartial $12 '()))
(let $14 '('('type) '('autoref) '('columns '('"promo_revenue"))))
(let $15 (ResWrite! (Left! (Cons! $3 (AssumeColumnOrder (List (ListType $4)) $5))) $6 (Key) (RemovePrefixMembers (PersistableRepr (SqlProject $13 '((SqlProjectItem (TypeOf $13) '"promo_revenue" (lambda '($26) (/MayWarn (*MayWarn (Double '"100.00") (Member $26 'Sum0)) (Member $26 'Sum1))))))) '('_yql_sys_)) $14))
(return (Commit! (Commit! $15 $6) (DataSink '"kikimr" '"db") '('('"mode" '"flush"))))
)

Со временем это переписывается в такой запрос:

(
(let $1 (DataType 'Double))
(let $2 (OptionalType $1))
(let $3 '('"l.l_discount" $2))
(let $4 '"l.l_extendedprice")
(let $5 '($4 $2))
(let $6 (StructType $3 $5 '('"p.p_type" (OptionalType (DataType 'String)))))
(let $7 (Int32 '"1"))
(let $8 (AggApply 'sum $6 (lambda '($11) (If (Coalesce (StartsWith (Member $11 '"p.p_type") (String '"PROMO")) (Bool 'false)) (* (Member $11 $4) (- $7 (Member $11 '"l.l_discount"))) (Just (Convert (Int32 '0) $1))))))
(let $9 (AggApply 'sum (StructType $3 $5) (lambda '($12) (* (Member $12 $4) (- $7 (Member $12 '"l.l_discount"))))))
(let $10 (Aggregate (List (ListType $6)) '() '('('Sum0 $8) '('Sum1 $9)) '()))
(return '('('((FlatMap $10 (lambda '($13) (Just (AsStruct '('"promo_revenue" (/ (* (Double '"100.00") (Member $13 'Sum0)) (Member $13 'Sum1))))))) '('"promo_revenue"))) '()))
)

Затем:

(
(let $1 (OptionalType (DataType 'Double)))
(let $2 '('_yql_agg_0 $1))
(let $3 '('_yql_agg_1 $1))
(let $4 (AggregationTraits (StructType $2) (lambda '($7) (Member $7 '_yql_agg_0)) (lambda '($8 $9) (Void)) (lambda '($10) $10) (lambda '($11) $11) (lambda '($12 $13) (AggrAdd $12 $13)) (lambda '($14) $14) (Null)))
(let $5 (AggregationTraits (StructType $3) (lambda '($15) (Member $15 '_yql_agg_1)) (lambda '($16 $17) (Void)) (lambda '($18) $18) (lambda '($19) $19) (lambda '($20 $21) (AggrAdd $20 $21)) (lambda '($22) $22) (Null)))
(let $6 (AggregateMergeFinalize (List (ListType (StructType $2 $3))) '() '('('Sum0 $4) '('Sum1 $5)) '()))
(return '('('((FlatMap $6 (lambda '($23) (Just (AsStruct '('"promo_revenue" (/ (* (Double '"100.00") (Member $23 'Sum0)) (Member $23 'Sum1))))))) '('"promo_revenue"))) '()))
)

Затем мы заходим сюда:

YQL_CLOG(DEBUG, Core) << "Expand " << Node->Content();

Получаем AST:

(
(let $1 (OptionalType (DataType 'Double)))
(let $2 (lambda '($4) (Uint32 '0)))
(let $3 (ToOptional (ShuffleByKeys (List (ListType (StructType '('_yql_agg_0 $1) '('_yql_agg_1 $1)))) $2 (lambda '($5) (FinalizeByKey $5 (lambda '($6) (Just $6)) $2 (lambda '($7 $8) (AsStruct '('Sum0 (Member $8 '_yql_agg_0)) '('Sum1 (Member $8 '_yql_agg_1)))) (lambda '($9 $10 $11) (AsStruct '('Sum0 (AggrAdd (Member $10 '_yql_agg_0) (Member $11 'Sum0))) '('Sum1 (AggrAdd (Member $10 '_yql_agg_1) (Member $11 'Sum1))))) (lambda '($12 $13) (AsStruct '('Sum0 (Member $13 'Sum0)) '('Sum1 (Member $13 'Sum1)))))))))
(return '('('((FlatMap (AsList (AsStruct '('"Sum0" (Coalesce (Member $3 '"Sum0") (Null))) '('"Sum1" (Coalesce (Member $3 '"Sum1") (Null))))) (lambda '($14) (Just (AsStruct '('"promo_revenue" (/ (* (Double '"100.00") (Member $14 'Sum0)) (Member $14 'Sum1))))))) '('"promo_revenue"))) '()))
)

Затем, в случае, если у нас тип лямбды - стрим, мы добавляем ForwardList вот этим кодом:

lambdaResult = ctx.NewCallable(lambdaResult->Pos(), "ForwardList", { lambdaResult });

Получаем AST:

(
(let $1 (OptionalType (DataType 'Double)))
(let $2 (ToOptional (ForwardList (FinalizeByKey (List (ListType (StructType '('_yql_agg_0 $1) '('_yql_agg_1 $1)))) (lambda '($3) (Just $3)) (lambda '($4) (Uint32 '0)) (lambda '($5 $6) (AsStruct '('Sum0 (Member $6 '_yql_agg_0)) '('Sum1 (Member $6 '_yql_agg_1)))) (lambda '($7 $8 $9) (AsStruct '('Sum0 (AggrAdd (Member $8 '_yql_agg_0) (Member $9 'Sum0))) '('Sum1 (AggrAdd (Member $8 '_yql_agg_1) (Member $9 'Sum1))))) (lambda '($10 $11) $11)))))
(return '('('((FlatMap (AsList (AsStruct '('Sum0 (Member $2 'Sum0)) '('Sum1 (Member $2 'Sum1)))) (lambda '($12) (Just (AsStruct '('"promo_revenue" (/ (* (Double '"100.00") (Member $12 'Sum0)) (Member $12 'Sum1))))))) '('"promo_revenue"))) '()))
)

Тут есть интересный callable, который меняет тип: FinalizeByKey
Будучи в лямбде и принимая аргумент типа stream (это параметр лямбды), он выдаёт аргумент типа stream (поэтому мы вставляем ForwardList). Но затем при раскрытии лямбды и подстановке ему пустого списка в качестве аргумента он становится списком. Его тип определяется вот этим кодом:

IGraphTransformer::TStatus CombineByKeyWrapper(const TExprNode::TPtr& input, TExprNode::TPtr& output, TContext& ctx) {

После этого как раз сразу срабатывает проверка:

if (!EnsureNewSeqType<false, false>(input->Head(), ctx.Expr, &itemType)) {

В целом да, проблема скорее всего в FinalizeByKey, но, с одной стороны, он может by design быть как списком, так и стримом (это видно вот здесь, например:

TExprNode::TPtr ExpandFinalizeByKey(const TExprNode::TPtr& node, TExprContext& ctx) {
), с другой - по идее нехорошо, что он меняет этот тип на ходу.

Может быть, есть какое-то третье место, где настоящая ошибка? :)

@rvu1024
Copy link
Member

rvu1024 commented Feb 6, 2024

В это месте баг core-оптимизатора. Он должен приводить KeepConstraints(node->HeadPtr()...) к типу аргумента лямбды (через Iterator/ToFlow/..)

@UgnineSirdis UgnineSirdis changed the title Remove ForwardList in case if applied to list Convert mismatched types of arg/sequence when applying optimization of ShuffleByKeys/PartitionsByKeys to empty sequence Feb 7, 2024
Copy link

github-actions bot commented Feb 7, 2024

2024-02-07 07:50:27 UTC Pre-commit check for da72f11 has started.
2024-02-07 07:50:30 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-07 07:54:04 UTC Build successful.
2024-02-07 07:54:22 UTC Tests are running...
🔴 2024-02-07 09:29:18 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14559 14457 0 8 52 42

Copy link

github-actions bot commented Feb 7, 2024

2024-02-07 07:50:40 UTC Pre-commit check for da72f11 has started.
2024-02-07 07:50:42 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-02-07 07:54:41 UTC Build successful.
2024-02-07 07:54:54 UTC Tests are running...
🔴 2024-02-07 09:40:25 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
58935 49619 0 1 9278 37

Copy link

github-actions bot commented Feb 7, 2024

2024-02-07 16:46:01 UTC Pre-commit check for 7f4190a has started.
2024-02-07 16:46:04 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-07 16:49:48 UTC Build successful.
2024-02-07 16:50:00 UTC Tests are running...
🔴 2024-02-07 18:25:20 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14584 14462 0 11 70 41

Copy link

github-actions bot commented Feb 8, 2024

2024-02-08 11:05:56 UTC Pre-commit check for baf875f has started.
2024-02-08 11:05:59 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-02-08 11:10:27 UTC Build successful.
2024-02-08 11:10:41 UTC Tests are running...
🟢 2024-02-08 12:53:45 UTC Tests successful.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
67245 56379 0 0 10820 46

Copy link

github-actions bot commented Feb 8, 2024

2024-02-08 11:08:13 UTC Pre-commit check for baf875f has started.
2024-02-08 11:08:16 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-08 11:13:13 UTC Build successful.
2024-02-08 11:13:26 UTC Tests are running...
🔴 2024-02-08 12:52:59 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14589 14475 0 16 73 25

@serbel324 serbel324 mentioned this pull request Feb 13, 2024
@vitstn vitstn mentioned this pull request Feb 16, 2024
@UgnineSirdis UgnineSirdis deleted the query-with-no-data-in-s3 branch February 22, 2024 09:39
EgorkaZ pushed a commit to EgorkaZ/ydb that referenced this pull request Apr 10, 2024
pavelvelikhov pushed a commit to pavelvelikhov/ydb that referenced this pull request May 21, 2024
pavelvelikhov pushed a commit to pavelvelikhov/ydb that referenced this pull request May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants