Add migration script for Azure Cosmos DB, old container to new container #2442

Open
wants to merge 4 commits into
base: main
from

Conversation

Collaborator

@pamelafox pamelafox commented Mar 21, 2025

Purpose

This adds a migration script, as requested by a developer in the PR that introduced the new schema:
https://github.com/Azure-Samples/azure-search-openai-demo/pull/2312/files

Thanks @madebygps for pairing on this.

We structured it as a simple script, as opposed to an Azure Function, since developers should only need to run it once or perhaps twice. A developer can run it once while the old schema is running, switch over to the new schema, and then run it again. It should be idempotent.
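
The run-twice workflow depends on upserts being keyed by item id, so a second pass overwrites existing items rather than duplicating them. A minimal sketch of that property, using a hypothetical in-memory `FakeContainer` in place of the real Cosmos DB container (the `migrate` helper and the field names are illustrative assumptions, not the script's actual code):

```python
class FakeContainer:
    """Hypothetical in-memory stand-in for a container; not the Azure SDK."""

    def __init__(self):
        self.items = {}

    def upsert_item(self, item):
        # Upsert: insert if the id is new, otherwise overwrite in place.
        self.items[item["id"]] = dict(item)


def migrate(old_items, new_container):
    """Copy every old item into the new container via upsert."""
    for item in old_items:
        new_container.upsert_item(item)
    return len(old_items)


old_items = [{"id": "s1", "title": "first"}, {"id": "s2", "title": "second"}]
new_container = FakeContainer()

migrate(old_items, new_container)  # first run, while the old schema is live
old_items.append({"id": "s3", "title": "third"})  # written after the first run
migrate(old_items, new_container)  # second run, after switching over

print(len(new_container.items))  # 3: s1 and s2 were overwritten, not duplicated
```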

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[X] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

@pamelafox pamelafox marked this pull request as ready for review March 21, 2025 22:15
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a migration script to transform CosmosDB data from an old schema to a new one and includes thorough tests to verify the migration logic.

  • Introduces CosmosDBMigrator to connect to CosmosDB and migrate items by transforming each document to include both session and message_pair types.
  • Adds tests that simulate paginated query results and verify the batch upsert operations using mocks.
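
The transformation described in the first bullet can be sketched roughly as follows. Every field name here (`title`, `answers`, `question`, `response`) and the `{session_id}-{index}` id scheme are assumptions made for illustration; this page does not show the real schema.

```python
def transform(old_item):
    """Split one old-schema document into a session item plus message_pair items.

    All field names are assumptions made for illustration.
    """
    session_id = old_item["id"]
    entra_oid = old_item["entra_oid"]
    session = {
        "id": session_id,
        "session_id": session_id,
        "entra_oid": entra_oid,
        "type": "session",
        "title": old_item.get("title", ""),
    }
    pairs = [
        {
            "id": f"{session_id}-{index}",
            "session_id": session_id,
            "entra_oid": entra_oid,
            "type": "message_pair",
            "question": question,
            "response": response,
        }
        for index, (question, response) in enumerate(old_item.get("answers", []))
    ]
    return [session] + pairs


old = {
    "id": "abc",
    "entra_oid": "user-1",
    "title": "Chat",
    "answers": [["Hi", "Hello"], ["Q2", "A2"]],
}
new_items = transform(old)
print([item["type"] for item in new_items])
```

Because every derived id is a pure function of the old document, running the transformation again yields the same ids, which is what lets an upsert-based migration be repeated safely.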

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
scripts/cosmosdb_migration.py | Implements migration logic, connecting to both old and new containers, and performing transformation and batching.
tests/test_cosmosdb_migration.py | Provides tests with comprehensive mocks to verify migration functionality.
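
As a rough idea of what mocking paginated query results can look like, here is a minimal sketch. `FakePagedResults` and `copy_all` are hypothetical names, and the loop upserts items one at a time for brevity rather than batching them as the PR does; it assumes the code under test walks the results page by page with `async for`.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class FakePagedResults:
    """Hypothetical stand-in for a paged query result."""

    def __init__(self, pages):
        self._pages = pages

    async def by_page(self):
        # Yield one async iterator per page.
        for page in self._pages:
            yield self._iterate(page)

    @staticmethod
    async def _iterate(page):
        for item in page:
            yield item


async def copy_all(old_container, new_container):
    """Toy migration loop: upsert every item from every page."""
    count = 0
    async for page in old_container.query_items(query="SELECT * FROM c").by_page():
        async for item in page:
            await new_container.upsert_item(item)
            count += 1
    return count


old_container = MagicMock()
old_container.query_items.return_value = FakePagedResults(
    [[{"id": "1"}, {"id": "2"}], [{"id": "3"}]]
)
new_container = MagicMock()
new_container.upsert_item = AsyncMock()

migrated = asyncio.run(copy_all(old_container, new_container))
print(migrated, new_container.upsert_item.await_count)
```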
Comments suppressed due to low confidence (1)

scripts/cosmosdb_migration.py:129

  • [nitpick] Consider replacing the print statement with a logging solution (e.g., using the logging module) for better production practice and control over logging output.
print(f"Total items migrated: {item_migration_count}")
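
A minimal sketch of that suggestion using the standard `logging` module. The logger name and the `report_total` wrapper are illustrative assumptions, not names from the script.

```python
import logging

# Logger name is an arbitrary choice for this sketch.
logger = logging.getLogger("cosmosdb_migration")


def report_total(item_migration_count):
    """Log the migration total instead of printing it."""
    logger.info("Total items migrated: %d", item_migration_count)


if __name__ == "__main__":
    # Configure output once at the entry point; library code only calls logger.*.
    logging.basicConfig(level=logging.INFO)
    report_total(42)
```

Keeping `basicConfig` at the entry point lets whoever runs the script choose the level and destination without touching the migration code.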

try:
    await self.new_container.read()
except Exception:
    raise ValueError(f"New container {self.new_container.id} does not exist")
Collaborator

Does the migration script need to create the container?

if not self.old_container or not self.new_container:
    raise ValueError("Containers do not exist")

query_results = self.old_container.query_items(query="SELECT * FROM c")
Collaborator

We select * but we are only copying specific fields. IMO either:

  1. Fetch all columns and copy all columns
  2. Only select the columns you want to copy
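
Option 2 could look something like the sketch below, which builds a projection query from an explicit field list. `build_projection_query` is a hypothetical helper and the field names are assumptions; the fields the script actually copies are not shown in this excerpt.

```python
def build_projection_query(fields, alias="c"):
    """Build a query that selects only the named top-level fields."""
    for field in fields:
        # Reject anything that is not a plain identifier to keep the query safe.
        if not field.isidentifier():
            raise ValueError(f"Unsupported field name: {field!r}")
    columns = ", ".join(f"{alias}.{field}" for field in fields)
    return f"SELECT {columns} FROM {alias}"


# Field names are assumptions for illustration.
query = build_projection_query(["id", "entra_oid", "title", "answers"])
print(query)  # SELECT c.id, c.entra_oid, c.title, c.answers FROM c
```

Keeping the field list in one place means the query and the copy logic cannot drift apart silently.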

}
batch_operations.append(("upsert", (message_pair,)))

# Execute the batch using partition key [entra_oid, session_id]
Collaborator

comment is wrong? we are using [entra_oid, id]
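
One way to keep the comment and the code from drifting apart is to derive the partition key from the batched items themselves, since a Cosmos DB transactional batch must target a single logical partition. A minimal sketch, assuming the new container is partitioned on `entra_oid` and `session_id` as the code comment states (this excerpt does not confirm it); `shared_partition_key` is a hypothetical helper:

```python
def shared_partition_key(items, paths=("entra_oid", "session_id")):
    """Return the partition key value shared by all items, or raise if they differ."""
    keys = {tuple(item[path] for path in paths) for item in items}
    if len(keys) != 1:
        raise ValueError(f"Batch spans {len(keys)} partition keys: {sorted(keys)}")
    return list(keys.pop())


# Item shapes are assumptions for illustration.
batch = [
    {"id": "abc", "session_id": "abc", "entra_oid": "user-1", "type": "session"},
    {"id": "abc-0", "session_id": "abc", "entra_oid": "user-1", "type": "message_pair"},
]
print(shared_partition_key(batch))  # ['user-1', 'abc']
```

If the old item's `id` is reused as the new `session_id`, then `[entra_oid, id]` and `[entra_oid, session_id]` name the same value, and the existing comment would be accurate but worth clarifying.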

2 participants