Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ibis): added primary key management on postgres and updated mssql and mysql to manage composite keys #1106

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

yalexbadan
Copy link

@yalexbadan yalexbadan commented Mar 25, 2025

In this PR I added development proposal related to issue 1105 (only about PostgreSql and composite key management).

Further investigating on wren-ui, I fear that the same fix should be popagated to table ER diagram page, but I don't have enought competence in font-end technologies to determine which classes should be updated.
I suppose that at least in the types.ts, primaryKey?: InputMaybe<Scalars['String']> fields should become primaryKey?: InputMaybe<Array<Scalars['String']>>, and in ModelForm.tsx, the selection should accempt multiple fields, but I wasn't able to figure out other updates.

Summary by CodeRabbit

  • New Features

    • Enhanced table metadata functionality to support composite primary keys, allowing for multiple key columns.
    • Improved PostgreSQL metadata retrieval with the inclusion of constraint type details for more comprehensive table structure insights.
    • Updated primary key representation in models and tests to utilize a list format for increased flexibility across various databases.
  • Bug Fixes

    • Adjusted handling of primary key attributes across multiple database metadata classes to ensure consistency.

Copy link

coderabbitai bot commented Mar 25, 2025

Walkthrough

This pull request updates the handling of primary keys in various metadata modules. The Table.primaryKey attribute type is changed from a single string (or nullable string) to a list of strings (or nullable list) to support composite keys. Corresponding changes in the get_table_list methods for MSSQL, MySQL, Oracle, and Postgres now initialize primaryKey as an empty list and append keys instead of replacing the value. The Postgres module also enhances its SQL query to retrieve constraint type information.

Changes

File(s) Change Summary
ibis-server/.../metadata/dto.py Updated Table.primaryKey type from `str
ibis-server/.../metadata/{mssql.py, mysql.py, oracle.py, clickhouse.py, trino.py} Modified get_table_list methods to initialize primaryKey as an empty list and append primary key columns instead of using a single string. (Oracle also received an inline comment for future improvements.)
ibis-server/.../metadata/postgres.py Enhanced SQL query with additional LEFT JOINs to fetch constraint types and updated primary key handling to support multiple keys.
ibis-server/tests/routers/v2/connector/{test_clickhouse.py, test_trino.py} Changed primaryKey representation in manifest dictionaries from a single string to a list format for Orders and Customer models.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant M as Metadata Service
    participant DB as Database

    C->>M: Call get_table_list()
    M->>DB: Execute SQL query (joins for constraint types in Postgres)
    DB-->>M: Return table rows
    loop For each row
        M->>M: Check if column qualifies as primary key
        alt Is primary key
            M->>M: Append column name to primaryKey list
        end
    end
    M-->>C: Return list of Table objects with primaryKey as list
Loading

Possibly related PRs

  • feat(ibis): introduce S3 File connector #1038: The changes in the main PR, which modify the primaryKey attribute in the Table class to support multiple primary keys, are related to the changes in the retrieved PR, which also involve handling primary keys in a list format across various data sources, including S3. Both PRs focus on enhancing the flexibility of primary key representations in their respective contexts.
  • feat(ibis): introduce Local file connector #1029: The changes in the main PR, which modify the primaryKey attribute in the Table class to support multiple primary keys, are directly related to the changes in the retrieved PR, where the handling of primary keys is also updated to accommodate a list format in various metadata classes.

Suggested labels

core

Suggested reviewers

  • onlyjackfrost
  • wwwy3y3

Poem

I'm a happy rabbit hopping with glee,
Keys once single now multiply like a tree.
Lines of code transformed in a whimsical spree,
Composite keys dance in a joyful decree.
Cheers from my burrow—coding magic set free! 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added ibis python Pull requests that update Python code labels Mar 25, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
ibis-server/app/model/metadata/postgres.py (1)

90-93: Good implementation for handling composite primary keys.

The code now correctly appends each primary key column to the primaryKey list, allowing for proper representation of composite keys. This implementation addresses the core requirement of issue #1105.

Consider adding a comment explaining that this supports composite keys, for future maintainers:

            if row["constraint_type"] == 'PRIMARY KEY':
+               # Support for composite primary keys - collect all PK columns
                unique_tables[schema_table].primaryKey.append(row["column_name"])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e55e8f5 and e159f5a.

📒 Files selected for processing (5)
  • ibis-server/app/model/metadata/dto.py (1 hunks)
  • ibis-server/app/model/metadata/mssql.py (2 hunks)
  • ibis-server/app/model/metadata/mysql.py (2 hunks)
  • ibis-server/app/model/metadata/oracle.py (2 hunks)
  • ibis-server/app/model/metadata/postgres.py (4 hunks)
🔇 Additional comments (9)
ibis-server/app/model/metadata/dto.py (1)

79-79: Type change to support composite primary keys.

The change from a single string to a list of strings enables the representation of composite primary keys, which aligns with the PR objective to address composite key management in PostgreSQL mentioned in the PR description.

This change is consistent with the database schema modeling best practices where tables can have multiple columns forming a primary key.

ibis-server/app/model/metadata/mysql.py (2)

59-59: Initialize primaryKey as empty list to support composite keys.

Changed from an empty string to an empty list, consistent with the type change in dto.py.


74-74: Append to primaryKey list instead of replacing value.

This change enables the proper collection of multiple primary key columns for composite keys. The method now appends each primary key column to the list rather than overwriting it with the last one found.

ibis-server/app/model/metadata/oracle.py (1)

91-91: Initialize primaryKey as empty list to support composite keys.

Changed from an empty string to an empty list, consistent with the type change in dto.py.

ibis-server/app/model/metadata/mssql.py (2)

84-84: Initialize primaryKey as empty list to support composite keys.

Changed from an empty string to an empty list, consistent with the type change in dto.py.


99-100: Append to primaryKey list instead of replacing value.

This change enables the proper collection of multiple primary key columns for composite keys. The method now appends each primary key column to the list rather than overwriting it with the last one found.

ibis-server/app/model/metadata/postgres.py (3)

29-29: Excellent enhancement to capture constraint information.

Adding the constraint_type column to the query results is essential for identifying primary key columns, which will enable proper handling of composite keys.


48-53: Well-structured JOIN operations to retrieve constraint metadata.

The LEFT JOINs to key_column_usage and table_constraints tables are properly implemented to retrieve constraint information while preserving all columns in the result set, which is exactly what we need for identifying primary key columns.


77-77: Correct initialization of primaryKey as an array.

Changed primaryKey initialization from a string to an empty array, which is necessary to support composite primary keys. This aligns with the PR objective to modify primaryKey from a single string to an array of strings.

Comment on lines 103 to 104

# TODO: manage primary key
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

❓ Verification inconclusive

Implementation needed for primary key collection.

The TODO comment indicates that the implementation for collecting primary key information is still pending. Unlike the MySQL and MSSQL implementations, there's no code to populate the primaryKey list yet.

The Oracle metadata class needs to be updated to collect primary key information similar to the MySQL and MSSQL implementations. Without this, composite primary keys won't be properly represented for Oracle tables.

Check if there is a related SQL query that can be added to identify primary keys:


🏁 Script executed:

#!/bin/bash
# Search the codebase for examples of Oracle primary key queries
rg -A 5 -B 5 "PRIMARY KEY" --glob "*.py" | grep -i oracle

Length of output: 59


Update Oracle Metadata to Properly Handle Primary Keys [Critical]

The Oracle metadata implementation in ibis-server/app/model/metadata/oracle.py (lines 103–104) still contains a TODO for primary key handling. Unlike the MySQL and MSSQL implementations, this class does not currently populate the primaryKey list. This omission means that composite primary keys for Oracle tables may not be properly represented.

Suggestions:

  • Investigate Oracle system views (e.g., ALL_CONSTRAINTS and ALL_CONS_COLUMNS) to construct an SQL query that retrieves primary key columns.
  • Update the Oracle metadata extraction logic to populate the primaryKey list in a manner consistent with the other implementations.
  • Manually verify that the proposed SQL query and logic integration correctly capture both single and composite primary keys for Oracle tables.

@goldmedal
Copy link
Contributor

Thanks for contributing, @yalexbadan. I trigger the CI flow. I'll take a look at it after all workflows have passed.

@goldmedal
Copy link
Contributor

For the PR title, you need to follow the format of conventional commits.

@yalexbadan yalexbadan changed the title Fix proposal for issue 1105 fix(ibis): added primary key management on postgres and updated mssql and mysql to manage composite keys Mar 25, 2025
@goldmedal
Copy link
Contributor

@yalexbadan It seems to be a code style issue. You can install just, then run the following command in the ibis-server folder to fix it.

just fmt 

@goldmedal
Copy link
Contributor

It seems some CI is queued and will never be released 🤔 . I'll try to close it and reopen the PR.

@goldmedal goldmedal closed this Mar 28, 2025
@goldmedal goldmedal reopened this Mar 28, 2025
@goldmedal
Copy link
Contributor

goldmedal commented Mar 28, 2025

The testing for trino and clikchouse failed. I guess it's related to the change in dto.py. You can see the detail in the CI log.

tests/routers/v2/connector/test_clickhouse.py:514
tests/routers/v2/connector/test_trino.py:362

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
ibis-server/app/model/metadata/clickhouse.py (1)

38-68: 💡 Verification agent

🧩 Analysis chain

Consider verifying how primary keys are identified for ClickHouse.

While the initialization of primaryKey as an empty list is correct, I notice there's no code that populates this list with actual primary key columns. If ClickHouse supports primary keys, consider updating the code to identify and add them to the primaryKey list, similar to the changes made for other database connectors in this PR.


🏁 Script executed:

#!/bin/bash
# Check if there's any primary key identification in other connectors
echo "Checking how primary keys are handled in other connectors..."
rg -A 5 "primaryKey\." --type py ibis-server/app/model/metadata/

# Check if ClickHouse supports primary keys and how they might be identified
echo "Checking if ClickHouse has primary key information in system tables..."
rg -i "primary.?key" --type py ibis-server/app/model/metadata/clickhouse.py

Length of output: 2012


Action: Add or Document Primary Key Extraction for ClickHouse

The current implementation initializes primaryKey as an empty list while other connectors (Postgres, MySQL, MSSQL) actively populate it by appending primary key columns (e.g., using unique_tables[schema_table].primaryKey.append(row["column_name"])). This suggests that if ClickHouse supports primary keys, similar logic should be implemented here. If not, please add an inline comment or documentation clarifying why primary keys are not extracted for ClickHouse.

  • Option 1: If ClickHouse supports primary keys, update the code to identify and append the appropriate column names to the primaryKey list as done in the other connectors.
  • Option 2: If ClickHouse does not support primary keys, add a comment explaining the rationale for leaving primaryKey empty.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8076da8 and 7bcce3b.

📒 Files selected for processing (2)
  • ibis-server/app/model/metadata/clickhouse.py (1 hunks)
  • ibis-server/app/model/metadata/trino.py (1 hunks)
🔇 Additional comments (1)
ibis-server/app/model/metadata/clickhouse.py (1)

55-55: LGTM: Updated primaryKey initialization to support composite keys.

The change from initializing primaryKey as an empty string to an empty list is consistent with the broader PR changes to support composite keys across different database connectors.

@@ -62,7 +62,7 @@ def get_table_list(self) -> list[Table]:
catalog=row["table_catalog"],
table=row["table_name"],
),
primaryKey="",
primaryKey=[],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Initialization changed but no logic to populate primary keys

The change from initializing primaryKey as a string to a list is consistent with the modifications in other database metadata classes to support composite keys. However, unlike the MSSQL, MySQL, and PostgreSQL implementations mentioned in the PR objectives, there's no logic in this file to populate the primaryKey list with actual primary key column names.

This change may be incomplete without corresponding logic to populate the primary key list. Please verify if Trino supports retrieving primary key information and, if so, update either the SQL query in get_table_list or implement the get_constraints method to populate the primaryKey list. This might be related to the test failures mentioned in the PR comments for test_trino.py.


🏁 Script executed:

#!/bin/bash
# Check how primary keys are handled in other metadata modules

echo "Looking at how other metadata modules handle primary keys..."
echo "==============================================="

# Look for primaryKey initialization and population in other metadata modules
echo "Checking primaryKey initialization in other metadata modules:"
rg -A 1 "primaryKey\s*=\s*\[\]" ibis-server/app/model/metadata/

echo "==============================================="

# Look for code that populates primary keys in other modules 
echo "Checking how primary keys are populated in other modules:"
rg -A 10 "\.primaryKey\.append" ibis-server/app/model/metadata/

echo "==============================================="

# Check the Table class definition in dto.py to verify the type change
echo "Examining Table class definition in dto.py:"
rg -A 20 "class Table" ibis-server/app/model/metadata/dto.py

echo "==============================================="

# Check the failing test to understand requirements
echo "Looking for trino test file mentioned in PR comments:"
fd "test_trino.py" ibis-server/ --exec cat {}

Length of output: 20170


Action Required: Address Missing Primary Key Population for TRINO

The TRINO metadata module now initializes primaryKey as an empty list—which aligns with the recent changes in other drivers—but, unlike the MSSQL, MySQL, and PostgreSQL implementations, there is no logic to populate this list with actual primary key column names. The search confirms that while other modules append to primaryKey (e.g., via their respective get_constraints methods), the TRINO module lacks any similar population strategy. This may be contributing to the test failures in test_trino.py.

  • Verify if Trino supports retrieving primary key details.
  • If supported, update the SQL query in get_table_list or implement a get_constraints method to append the primary key column data.
  • If not supported, please document this behavior to clarify the intentional absence of key population.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bcce3b and dbff7d8.

📒 Files selected for processing (18)
  • ibis-server/app/model/metadata/bigquery.py (1 hunks)
  • ibis-server/app/model/metadata/canner.py (1 hunks)
  • ibis-server/app/model/metadata/snowflake.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_canner.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_gcs_file.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_local_file.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_minio_file.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_mssql.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_mysql.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_oracle.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_s3_file.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_snowflake.py (1 hunks)
  • ibis-server/tests/routers/v2/test_analysis.py (2 hunks)
  • ibis-server/tests/routers/v2/test_relationship_valid.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/test_query.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/local_file/test_query.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (2 hunks)
🔇 Additional comments (26)
ibis-server/app/model/metadata/snowflake.py (1)

61-61: Updated primary key to support composite keys

The change updates the primaryKey parameter initialization from a string to a list, supporting the new composite key structure as described in the PR objectives.

ibis-server/app/model/metadata/canner.py (1)

169-169: Updated primary key to support composite keys

The change updates the primaryKey parameter initialization from a string to a list, supporting composite keys.

ibis-server/tests/routers/v2/connector/test_oracle.py (1)

74-74: Updated test manifest to reflect composite key support

The test manifest has been correctly updated to use the new array-based primary key format, ensuring test consistency with the implementation changes.

ibis-server/tests/routers/v2/connector/test_canner.py (1)

69-69: Updated test manifest to reflect composite key support

The test manifest has been correctly updated to use the new array-based primary key format, which aligns with the implementation changes for composite key support.

ibis-server/tests/routers/v2/connector/test_postgres.py (1)

69-69: Primary key representation updated to support composite keys

The primary key representation has been changed from a string to a list, which is consistent with the overall PR objective of supporting composite primary keys.

ibis-server/tests/routers/v2/test_relationship_valid.py (1)

20-20: Primary key format standardized to list representation

The primary key has been updated from a string value to a list format, aligning with the changes made across the codebase to support composite keys.

ibis-server/tests/routers/v2/connector/test_snowflake.py (1)

64-64: Primary key representation consistent with PR changes

The change from a string primary key to a list-based representation maintains consistency across all connector tests and supports the new composite key functionality.

ibis-server/app/model/metadata/bigquery.py (1)

79-79: Primary key initialization updated to use list type

The primary key field initialization has been updated from an empty string to an empty list, properly implementing the type change required for composite primary key support.

ibis-server/tests/routers/v3/connector/bigquery/test_query.py (1)

56-56: Primary key format updated correctly to support composite keys.

The change from a string ("o_orderkey") to an array format (["o_orderkey"]) aligns with the PR objective of supporting composite primary keys. This modification is consistent with similar changes across other connector tests.

ibis-server/tests/routers/v2/connector/test_mysql.py (2)

67-67: Primary key format updated correctly to support composite keys.

The change from a string to an array format for the Orders model's primary key is consistent with the PR's goal of supporting composite primary keys.


76-76: Primary key format updated correctly to support composite keys.

The change from a string to an array format for the Customer model's primary key is consistent with the PR's goal of supporting composite primary keys.

ibis-server/tests/routers/v2/connector/test_mssql.py (1)

65-65: Primary key format updated correctly to support composite keys.

The change from a string ("orderkey") to an array format (["orderkey"]) aligns with the PR objective of supporting composite primary keys in the MSSQL connector tests.

ibis-server/tests/routers/v2/connector/test_gcs_file.py (2)

44-44: Primary key format updated correctly to support composite keys.

The change from a string to an array format for the Orders model's primary key is part of the broader effort to support composite primary keys across all connectors.


69-69: Primary key format updated correctly to support composite keys.

The change from a string to an array format for the Customer model's primary key is part of the broader effort to support composite primary keys across all connectors.

ibis-server/tests/routers/v2/test_analysis.py (2)

23-23: Change from string to list representation for primary key is appropriate.

This change updates the primaryKey from a string value to a list of strings, aligning with the PR's goal of supporting composite primary keys.


47-47: Change from string to list representation for primary key is appropriate.

Similar to the customer model, this change updates primaryKey from a string value to a list of strings, maintaining consistency in the primary key representation across models.

ibis-server/tests/routers/v2/connector/test_local_file.py (2)

39-39: Consistent primary key representation as an array is appropriate.

The change from "primaryKey": "orderkey" to "primaryKey": ["orderkey"] aligns with the PR objective to support composite primary keys while maintaining backward compatibility for single-column keys.


64-64: Consistent primary key representation as an array is appropriate.

The change from "primaryKey": "custkey" to "primaryKey": ["custkey"] follows the same pattern and ensures consistent representation of primary keys throughout the codebase.

ibis-server/tests/routers/v2/connector/test_s3_file.py (2)

44-44: Appropriate change to list representation for primary key.

This change maintains consistency with the updated data model for primary keys, preparing the code for composite primary key support while preserving backward compatibility.


69-69: Appropriate change to list representation for primary key.

Same pattern applied consistently to the Customer model's primary key, ensuring uniformity across the codebase in how primary keys are represented.

ibis-server/tests/routers/v3/connector/local_file/test_query.py (2)

32-32: Primary key structure updated correctly to array format.

This change ensures that the test fixtures properly reflect the new data model for primary keys, which now supports composite keys through array representation.


57-57: Primary key structure updated correctly to array format.

The update to array format for the Customer model's primary key completes the consistent representation of primary keys throughout the test fixtures.

ibis-server/tests/routers/v3/connector/postgres/test_query.py (2)

62-62: Properly updated primary key representation for composite key support.

The change from string to array format for primary keys correctly implements the requirements for supporting composite primary keys while maintaining compatibility with single-column primary keys.


81-81: Correctly updated customer model's primary key to array format.

This change is consistent with the modification pattern throughout the PR, converting the primary key representation from a string to an array to support composite keys.

ibis-server/tests/routers/v2/connector/test_minio_file.py (2)

42-42: Properly updated Orders model primary key format.

The change from a string to an array format for the orderkey primary key aligns with the PR's goal of supporting composite primary keys across all connectors.


67-67: Correctly updated Customer model primary key format.

The primary key representation has been properly changed from a string to an array, maintaining consistency with the changes throughout the codebase.

@@ -41,7 +41,7 @@
"type": "varchar",
},
],
"primaryKey": "orderkey",
"primaryKey": ["orderkey"],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify all test assertions are compatible with the new primary key format.

While the primary key format changes look good, it's important to ensure that any assertions or validations in the test suite that might reference these primary keys are also updated to expect lists instead of strings.

Also applies to: 69-69


🏁 Script executed:

#!/bin/bash
# Scan for assertions or validations that might expect primaryKey to be a string
rg -A 5 -B 5 "primaryKey" --glob "*.py" | grep -E "assert|test" | grep -v "\[\".*\"\]"

Length of output: 30078


Attention: Update Test Assertions for Primary Key Format Change

The primary key update in ibis-server/tests/routers/v2/connector/test_gcs_file.py (lines 44 and 69) now uses a list format ("primaryKey": ["orderkey"]). However, our verification indicates that several tests—such as in wren-core-py/tests/test_modeling_core.py—still reference the primary key as a string (e.g., "primaryKey": "c_custkey", "primaryKey": "o_orderkey", etc.).

  • ibis-server/tests/routers/v2/connector/test_gcs_file.py (lines 44 and 69): Confirm that any assertions or validations reflect the new list format.
  • Other test files (e.g., wren-core-py/tests/test_modeling_core.py): Review and update any test assertions or validations that still expect the primary key as a string to ensure consistency with the updated format.

Please update the tests accordingly to prevent potential mismatches and unexpected behavior during test execution.

@github-actions github-actions bot added core v1-engine-changed rust Pull requests that update Rust code labels Mar 31, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
wren-base/src/main/java/io/wren/base/dto/Model.java (1)

84-84: Consider adding validation for primary key list.

While the implementation looks good, consider adding validation to ensure the primary key list is not empty when required, similar to how other fields are validated.

- this.primaryKey = primaryKey;
+ this.primaryKey = primaryKey == null ? null : primaryKey.isEmpty() ? null : primaryKey;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbff7d8 and 592ed28.

📒 Files selected for processing (21)
  • wren-base/src/main/java/io/wren/base/dto/Model.java (4 hunks)
  • wren-base/src/test/java/io/wren/base/dto/TestMacro.java (3 hunks)
  • wren-base/src/test/java/io/wren/base/dto/TestManifestSerDe.java (2 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/AbstractTestModel.java (1 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestAllRulesRewrite.java (2 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestCumulativeMetric.java (1 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestEnumRewrite.java (1 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestMetric.java (4 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestModelRefSql.java (2 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestModelTableReference.java (1 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestView.java (1 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestWrenDataLineage.java (5 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/analyzer/TestDecisionPointAnalyzer.java (1 hunks)
  • wren-base/src/test/java/io/wren/base/sqlrewrite/analyzer/TestStatementAnalyzer.java (1 hunks)
  • wren-base/src/test/resources/tpch_mdl.json (6 hunks)
  • wren-core-base/tests/data/mdl.json (3 hunks)
  • wren-core-py/tests/test_modeling_core.py (3 hunks)
  • wren-core/core/tests/data/mdl.json (3 hunks)
  • wren-tests/src/test/java/io/wren/testing/TestAnalysisResource.java (1 hunks)
  • wren-tests/src/test/java/io/wren/testing/TestMDLResource.java (1 hunks)
  • wren-tests/src/test/java/io/wren/testing/TestMDLResourceV2.java (3 hunks)
✅ Files skipped from review due to trivial changes (2)
  • wren-tests/src/test/java/io/wren/testing/TestMDLResource.java
  • wren-base/src/test/java/io/wren/base/sqlrewrite/TestWrenDataLineage.java
🧰 Additional context used
🧬 Code Definitions (1)
wren-base/src/test/java/io/wren/base/sqlrewrite/TestModelTableReference.java (1)
wren-base/src/main/java/io/wren/base/dto/Model.java (1)
  • Model (29-191)
🔇 Additional comments (39)
wren-base/src/test/java/io/wren/base/sqlrewrite/TestEnumRewrite.java (1)

53-53: Good update to support composite primary keys.

The change from a single string to a list of strings for the primary key parameter correctly implements the needed update for composite primary key support.

wren-base/src/test/java/io/wren/base/sqlrewrite/TestView.java (1)

55-55: Good update to support composite primary keys.

The change from a single string to a list of strings for the primary key parameter correctly implements the needed update for composite primary key support.

wren-base/src/test/java/io/wren/base/sqlrewrite/analyzer/TestStatementAnalyzer.java (1)

253-253: Good update to support composite primary keys.

The change from a single string to a list of strings for the primary key parameter correctly implements the needed update for composite primary key support.

wren-tests/src/test/java/io/wren/testing/TestAnalysisResource.java (1)

114-115: Good update to support composite primary keys.

The changes from single strings to lists of strings for the primary key parameters correctly implement the needed update for composite primary key support. This modification ensures consistent handling of primary keys across the codebase and enables future support for tables with multiple-column primary keys.

wren-base/src/test/java/io/wren/base/sqlrewrite/TestCumulativeMetric.java (1)

118-118: Refactor approved: Changed refColumn parameter to support composite keys

This change modifies the refColumn parameter from a single string to a list, supporting the broader effort to enable composite key management. This is consistent with similar changes across the codebase.

wren-base/src/test/java/io/wren/base/dto/TestManifestSerDe.java (1)

62-62: Primary key representation updated to support composite keys

The primary key representation has been changed from a single string to a list of strings, which is consistent with the PR objective of supporting composite key management. This change prepares the test data model for scenarios where multiple fields might form a composite key.

Also applies to: 92-92

wren-core-py/tests/test_modeling_core.py (1)

28-28: Python test model updated to represent primary keys as lists

Modified primary key representation from single strings to lists across all models in the test manifest. This change:

  1. Ensures consistency between the Java and Python codebases
  2. Enables future support for composite keys
  3. Aligns with the PR objective of improving primary key management

Also applies to: 46-46, 59-59

wren-base/src/test/java/io/wren/base/sqlrewrite/TestModelRefSql.java (2)

17-18: Added import for List

Added the required import for java.util.List to support the List.of() method used for primary key definition.


27-29: Updated model initialization to use list-based primary keys

Changed the primary key representation in the model initialization from single strings to lists. This:

  1. Maintains consistency with other changes in the codebase
  2. Enables support for composite primary keys
  3. Follows the new approach for primary key management

Note that the orderkey_linenumber primary key name suggests this could actually be represented as a true composite key List.of("orderkey", "linenumber") in the future.

wren-tests/src/test/java/io/wren/testing/TestMDLResourceV2.java (3)

80-80: Primary key representation updated to support composite keys

The primary key representation has been changed from a single string to a list of strings using List.of("orderkey"). This change is consistent with the overall project update to support composite primary keys.


210-210: Primary key representation updated to support composite keys

The primary key is now properly represented as a list to support composite keys rather than a single string value.


265-265: Primary key representation updated to support composite keys

The primary key is now properly represented as a list to support composite keys rather than a single string value.

wren-base/src/test/java/io/wren/base/sqlrewrite/TestModelTableReference.java (2)

20-20: Added necessary List import for primary key conversion

The addition of the java.util.List import is required to support the conversion from single string primary keys to lists of strings.


28-30: Updated Model.onTableReference calls to use List for primary keys

All Model.onTableReference calls have been properly updated to use List.of() to wrap the primary key values, which aligns with the method signature change to support composite primary keys.

wren-core/core/tests/data/mdl.json (3)

37-37: Updated primaryKey representation to array format

The primaryKey for the customer model has been updated from a string to an array format to support composite primary keys.


73-73: Updated primaryKey representation to array format

The primaryKey for the profile model has been updated from a string to an array format to support composite primary keys.


119-119: Updated primaryKey representation to array format

The primaryKey for the orders model has been updated from a string to an array format to support composite primary keys.

wren-base/src/test/java/io/wren/base/sqlrewrite/analyzer/TestDecisionPointAnalyzer.java (1)

101-102: Updated onTableReference calls to use List for primary keys

The onTableReference calls for customer and orders models have been correctly updated to use List.of() to wrap primary key values, supporting the broader change to handle composite primary keys.

wren-base/src/test/java/io/wren/base/sqlrewrite/AbstractTestModel.java (1)

262-262: Primary key representation updated to support composite keys.

The change from a single string to a List of strings aligns with the broader codebase changes to support composite primary keys. This modification enables Model.onBaseObject to handle multiple columns as primary keys.

wren-base/src/test/java/io/wren/base/dto/TestMacro.java (3)

83-83: Updated primary key format to support composite keys.

Changed from a single string to a List of strings to align with the new composite primary key support.


118-118: Updated primary key format to support composite keys.

Changed from a single string to a List of strings to align with the new composite primary key support.


148-148: Updated primary key format to support composite keys.

Changed from a single string to a List of strings to align with the new composite primary key support.

wren-core-base/tests/data/mdl.json (3)

37-37: Updated primary key format in customer model.

Changed the representation of the primary key from a string to an array of strings, enhancing support for composite keys while maintaining backward compatibility.


73-73: Updated primary key format in profile model.

Changed the representation of the primary key from a string to an array of strings, enhancing support for composite keys while maintaining backward compatibility.


138-138: Updated primary key format in orders model.

Changed the representation of the primary key from a string to an array of strings, enhancing support for composite keys while maintaining backward compatibility.

wren-base/src/test/java/io/wren/base/sqlrewrite/TestAllRulesRewrite.java (3)

63-63: Updated primary key specification for Album model.

Changed from a single string to a List of strings to align with the new composite primary key support.


72-72: Updated primary key specification for Band model.

Changed from a single string to a List of strings to align with the new composite primary key support.


77-77: Updated primary key specification for Order model.

Changed from a single string to a List of strings to align with the new composite primary key support.

wren-base/src/test/java/io/wren/base/sqlrewrite/TestMetric.java (4)

67-67: Primary key format updated to support composite keys.

The change from a string literal to a List is consistent with the PR objective to add primary key management for composite keys.


80-80: Primary key format updated to support composite keys.

Good conversion from string primary key to a List format for the "Customer" model.


102-102: Primary key format updated to support composite keys.

Correctly updated the "Lineitem" model's primary key to use the List format.


282-282: Primary key format updated to support composite keys.

The test model's primary key format has been correctly updated to use a List.

wren-base/src/test/resources/tpch_mdl.json (1)

45-45: Primary key format updated to support composite keys.

All model primary keys in the JSON configuration have been correctly converted from string format to array format. This change enables support for composite keys while maintaining backward compatibility with existing single-column primary keys.

Also applies to: 77-77, 129-129, 146-146, 188-188, 215-215

wren-base/src/main/java/io/wren/base/dto/Model.java (6)

37-37: Updated primary key field type to support composite keys.

Changed from String to List<String> to enable support for composite primary keys.


51-54: Updated method signature to support composite keys.

The model method now correctly accepts a List<String> for the primary key parameter.


56-59: Updated method signature to support composite keys.

The onBaseObject method now correctly accepts a List<String> for the primary key parameter.


61-64: Updated method signature to support composite keys.

The onTableReference method now correctly accepts a List<String> for the primary key parameter.


73-73: Updated constructor parameter type to support composite keys.

Constructor parameter for primaryKey now accepts a List<String> instead of a String.


133-136: Updated getter return type to support composite keys.

The getPrimaryKey() method now returns a List<String> instead of a String.

@goldmedal
Copy link
Contributor

goldmedal commented Mar 31, 2025

@yalexbadan I think you shouldn't chagne the primary key in MDL to a list. The rewriter doesn't support a model with multiple primary key.

For the composite key case, the right way is creating a column to concat the key. Then, use the new column as the primiary key.
For example:

model: {
   name: "model_1",
   columns: [
      {
         "name": "col1",
        ...
      },
      {
           "name": "col2",
          ...
      },
      {
            "name": "col3",
             "exrpession": "concat(col1, col2)",
              ...
       }
   ],
    "primaryKey": "col3"
}

@yalexbadan
Copy link
Author

yalexbadan commented Mar 31, 2025

@yalexbadan I think you shouldn't chagne the primary key in MDL to a list. The rewriter doesn't support a model with multiple primary key.

For the composite key case, the right way is creating a column to concat the key. Then, use the new column as the primiary key. For example:

model: {
   name: "model_1",
   columns: [
      {
         "name": "col1",
        ...
      },
      {
           "name": "col2",
          ...
      },
      {
            "name": "col3",
             "exrpession": "concat(col1, col2)",
              ...
       }
   ],
    "primaryKey": "col3"
}

@goldmedal I'm trying to figure out from the code what I should change, but I'm not sure what you're referring to.
Are you referring to mdl.schema.json or model.metadata.*?

@goldmedal
Copy link
Contributor

The code of the java engine and rust engine shouldn't be changed:
wren-base, wren-core, wren-core-base, wren-core-py, wren-tests.

This change is only related to test_metadata_* in tests/routers/v2/connector

@yalexbadan
Copy link
Author

I re-read the code and re-analysed the solution I am trying to develop, but I am still not clear about your proposal.
Basically, it would be to remove what I have done completely, and simply create a "custom" field at metadata level, and then add that custom field in place of the primaryKey. It's not clear to me why intervene so massively on all components at this point?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery core ibis python Pull requests that update Python code rust Pull requests that update Rust code v1-engine-changed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants