fix(ibis): fix the function white list of DuckDB #1129
Walkthrough
This pull request updates test expectations in the local file connector by reducing the expected function count and changing the targeted function from "array_length" to "regexp_escape," along with corresponding updates to its description, parameters, and return type. It also expands data type support in core logical planning by modifying the conditions in the
Changes
Sequence Diagram(s)
sequenceDiagram
participant C as Caller
participant M as map_data_type
C->>M: Call map_data_type("input data type")
M->>M: Check if input starts with "array" or "list"
M->>M: Evaluate additional conditions for new types (e.g., "utinyint", "timestamp with time zone")
M->>C: Return corresponding DataType mapping
Actionable comments posted: 0
🧹 Nitpick comments (2)
wren-core/core/src/logical_plan/utils.rs (2)
104-187: Missing support for DuckDB map type.
The PR objectives mentioned that the "map" type is not supported. Consider adding a comment in the code to document this known limitation, potentially with a TODO.
 pub fn map_data_type(data_type: &str) -> Result<DataType> {
     let lower = data_type.to_lowercase();
     let lower_data_type = lower.as_str();
-    // TODO: try parse nested type by arrow
+    // TODO: try parse nested type by arrow
+    // Known limitation: DuckDB "map" type is currently not supported
     // Currently, we don't care about the element type of the array or struct.
     // We only care about the array or struct itself.
170-180: Consider adding support for the "any" type.
The PR objectives mentioned future consideration for the "any" type for return types or parameter types. Consider adding a placeholder mapping for the "any" type.
     "time" => DataType::Time32(TimeUnit::Nanosecond), // chose the smallest time unit
     "null" => DataType::Null,
+    // TODO: Consider proper support for the "any" type in the future
+    "any" => DataType::Null, // Temporary mapping for the "any" type
     // Trino Compatible Types
     "varbinary" => DataType::Binary,
     // DuckDB Compatible Types
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- ibis-server/resources/function_list/duckdb.csv is excluded by !**/*.csv
📒 Files selected for processing (2)
- ibis-server/tests/routers/v3/connector/local_file/test_functions.py (1 hunks)
- wren-core/core/src/logical_plan/utils.rs (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: cargo test (win64)
- GitHub Check: cargo test (macos-aarch64)
- GitHub Check: cargo test (macos)
- GitHub Check: ci
- GitHub Check: test
- GitHub Check: cargo check
🔇 Additional comments (7)
ibis-server/tests/routers/v3/connector/local_file/test_functions.py (2)
56-57: Reduced DuckDB function count confirmed.
The comment and assertion correctly reflect the reduction in DuckDB functions from 437 to 429, which aligns with the PR objective of removing duplicate functions.
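The reduction is attributed to removing duplicate functions. As a minimal sketch of how exact duplicate rows in a function list could be spotted: the helper name `duplicate_rows` and the sample rows below are hypothetical, and the real column layout of `duckdb.csv` is not shown in this review, so the sketch compares whole rows rather than assuming which column holds the name.

```python
import csv
import io
from collections import Counter


def duplicate_rows(csv_text: str) -> dict:
    """Return {row_tuple: occurrences} for rows that appear more than once.

    Whole rows are compared, so no assumption is made about which
    column holds the function name.
    """
    reader = csv.reader(io.StringIO(csv_text))
    counts = Counter(
        tuple(cell.strip() for cell in row)
        for row in reader
        if row  # skip blank lines
    )
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical sample rows, not taken from the real duckdb.csv.
sample = (
    "regexp_escape,Utf8,Utf8\n"
    "array_length,Int64,List\n"
    "array_length,Int64,List\n"
)
```

Calling `duplicate_rows(sample)` reports the repeated `array_length` row with a count of 2. Rows that share a name but differ in signature are deliberately not flagged, since they may be legitimate overloads.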
58-66: ❓ Verification inconclusive
Function test updated properly from array_length to regexp_escape.
The test now verifies the correct function signature for regexp_escape instead of array_length, with appropriate parameter and return types. This change matches the PR objective of correcting function return types.
🏁 Script executed:
#!/bin/bash
# Verify that the regexp_escape function is properly defined in the DuckDB functions list

# Check if the regexp_escape function exists in the DuckDB function list file
if [ -f "resources/function_list/duckdb.csv" ]; then
  grep -E "regexp_escape.*Utf8.*Utf8" resources/function_list/duckdb.csv
else
  echo "DuckDB function list file not found at expected location."
  # Try to find it elsewhere
  fd -g "*duckdb.csv" -x grep -E "regexp_escape.*Utf8.*Utf8" {}
fi

Length of output: 238
Action Needed: Confirm DuckDB function list file location and content
The test update in ibis-server/tests/routers/v3/connector/local_file/test_functions.py (lines 58–66) correctly changes the verification from array_length to regexp_escape with the intended parameter (Utf8) and return (Utf8) types. However, the automated check for the function signature in the DuckDB functions file did not find the expected file at resources/function_list/duckdb.csv.
Please manually verify the following:
- That the DuckDB function list file exists in the repository (or has been relocated/renamed).
- That the file contains the correct signature for regexp_escape (matching "regexp_escape.*Utf8.*Utf8").
Once confirmed, please update this review comment accordingly if any changes are needed.
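For the manual verification, a rough Python equivalent of the shell check may help. It is only a sketch: the helper name `find_signature_lines` is hypothetical, and it simply searches any file matching `*duckdb.csv` under a given root for the same pattern the script used. The ignored-files list in this review gives the path as ibis-server/resources/function_list/duckdb.csv, so starting the search from the repository root is likely the safer choice.

```python
import re
from pathlib import Path

# Same pattern the review script passed to grep -E.
SIGNATURE = re.compile(r"regexp_escape.*Utf8.*Utf8")


def find_signature_lines(root: Path, glob: str = "*duckdb.csv") -> list:
    """Return (path, line) pairs for every matching line under `root`.

    Hypothetical helper: it walks `root` recursively instead of relying
    on one hard-coded location for the function list file.
    """
    hits = []
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            if SIGNATURE.search(line):
                hits.append((path, line))
    return hits
```

An empty result means either that the file was not found under the chosen root or that no line matches the pattern, so the two cases still need to be told apart by checking whether the file exists.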
wren-core/core/src/logical_plan/utils.rs (5)
30-30: Enhanced list type recognition to handle both "array" and "list" types.
Adding support for "list" as an equivalent to "array" improves DuckDB compatibility, as DuckDB uses both terms for similar concepts.
110-110: Expanded data type recognition to include both "array" and "list" prefixes.
This change consistently implements list type recognition across the codebase, ensuring that types starting with either "array" or "list" are properly handled.
120-131: Added unsigned integer type mappings for DuckDB compatibility.
These additions properly support unsigned integer types in DuckDB, improving type compatibility for the connector.
145-153: Expanded timestamp with timezone type handling.
The implementation now correctly handles various ways timestamp with timezone types can be expressed in DuckDB, including "timestamp with time zone" (with spaces) and "time with time zone". The comment properly explains why "time with time zone" is mapped to timestamp.
174-179: Added additional DuckDB-specific type mappings.
The implementation adds proper support for DuckDB-specific types:
- blob → Binary
- hugeint → Int64 (with appropriate comment about lack of direct support)
- uhugeint → UInt64 (with appropriate comment about lack of direct support)
- bit → Boolean (with appropriate comment about lack of direct support)
- timestamp_ns → Timestamp(TimeUnit::Nanosecond, None)
These mappings improve compatibility with DuckDB data sources.
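The lookup described in these comments can be sketched as follows. This is an illustration in Python, not the Rust `map_data_type` from utils.rs: the names `DUCKDB_TYPE_NAMES` and `map_type_name` are hypothetical, Arrow types are represented as plain strings, and the `"List"` label for array/list types is an assumption, since the review only says such types are handled without regard to their element type.

```python
from typing import Optional

# Only the mappings named in the review comments are listed here.
DUCKDB_TYPE_NAMES = {
    "blob": "Binary",
    "hugeint": "Int64",      # no direct support, per the review
    "uhugeint": "UInt64",    # no direct support, per the review
    "bit": "Boolean",        # no direct support, per the review
    "timestamp_ns": "Timestamp(Nanosecond, None)",
}


def map_type_name(data_type: str) -> Optional[str]:
    """Map a DuckDB type name to an Arrow type label, or None if unknown."""
    lower = data_type.lower()
    # Types starting with "array" or "list" are treated alike;
    # the element type is ignored.
    if lower.startswith(("array", "list")):
        return "List"
    return DUCKDB_TYPE_NAMES.get(lower)
```

With this sketch, `map_type_name("BLOB")` yields `"Binary"`, while an unsupported name such as `"map"` yields `None`, which mirrors the known limitation noted in the PR description.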
Thanks @goldmedal, it is a heavy job.
closed #1126
Description
Known issues
- map type isn't supported now.
- any for the return type or parameter type.
Summary by CodeRabbit