Fix metadata deserialization in async mode for PGVector #125

Open: wants to merge 20 commits into base: main
Changes from 7 commits
Commits (20)
9cab442
Fix metadata deserialization in async mode for PGVector
shamspias Oct 8, 2024
512728f
Update existing asynchronous tests to include metadata assertions
shamspias Oct 16, 2024
015a086
Add tests for metadata deserialization in async PGVector operations
shamspias Oct 16, 2024
0413300
Merge branch 'main' into fix/metadata/deserialization
shamspias Nov 1, 2024
d8e11ef
Merge branch 'main' into fix/metadata/deserialization
shamspias Nov 18, 2024
a56c921
fix test conflict with #149
shamspias Dec 15, 2024
33a18b9
Merge branch 'main' into fix/metadata/deserialization
shamspias Dec 15, 2024
75f254b
Merge branch 'main' into fix/metadata/deserialization
shamspias Feb 8, 2025
ad890bb
fix: add missing import for json in vectorstores.py
shamspias Feb 8, 2025
ecb7e8a
test: add unit test for metadata deserialization in PGVector
shamspias Feb 8, 2025
9227acd
test: add unit tests for metadata deserialization in PGVector
shamspias Feb 8, 2025
16dfbaf
Merge branch 'main' into fix/metadata/deserialization
shamspias Mar 11, 2025
f4cdf73
test: move PGVector metadata test from unit to integration using asyncpg
shamspias Mar 11, 2025
27fe274
test: add integration test for PGVector metadata deserialization usin…
shamspias Mar 11, 2025
4db7c59
test: add integration test for PGVector metadata deserialization and …
shamspias Mar 11, 2025
9038c3d
chore: move asyncpg dependency from main to test group
shamspias Mar 11, 2025
5989544
feat(tests): add temp PGVector test DB fixture for metadata deseriali…
shamspias Mar 11, 2025
317c39a
test: add integration tests for PGVector retriever and similarity sea…
shamspias Mar 12, 2025
be50ffa
chore: remove asyncpg
shamspias Mar 12, 2025
7650cdc
refactor: update default_db_url and test_db_url from asyncpg to psyco…
shamspias Mar 12, 2025
127 changes: 20 additions & 107 deletions examples/vectorstore.ipynb
@@ -45,10 +45,10 @@
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install --quiet -U langchain_cohere"
]
],
"outputs": []
},
{
"cell_type": "markdown",
@@ -65,7 +65,6 @@
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_cohere import CohereEmbeddings\n",
"from langchain_postgres import PGVector\n",
@@ -83,7 +82,8 @@
" connection=connection,\n",
" use_jsonb=True,\n",
")"
]
],
"outputs": []
},
{
"cell_type": "markdown",
@@ -124,7 +124,6 @@
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"docs = [\n",
" Document(page_content='there are cats in the pond', metadata={\"id\": 1, \"location\": \"pond\", \"topic\": \"animals\"}),\n",
@@ -138,7 +137,8 @@
" Document(page_content='the library hosts a weekly story time for kids', metadata={\"id\": 9, \"location\": \"library\", \"topic\": \"reading\"}),\n",
" Document(page_content='a cooking class for beginners is offered at the community center', metadata={\"id\": 10, \"location\": \"community center\", \"topic\": \"classes\"})\n",
"]\n"
]
],
"outputs": []
},
{
"cell_type": "code",
@@ -147,21 +147,10 @@
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])"
]
],
"outputs": []
},
{
"cell_type": "code",
@@ -170,30 +159,10 @@
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),\n",
" Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),\n",
" Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),\n",
" Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),\n",
" Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),\n",
" Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),\n",
" Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),\n",
" Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'}),\n",
" Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),\n",
" Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'})]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectorstore.similarity_search('kitty', k=10)"
]
],
"outputs": []
},
{
"cell_type": "markdown",
@@ -210,7 +179,6 @@
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"docs = [\n",
" Document(page_content='there are cats in the pond', metadata={\"id\": 1, \"location\": \"pond\", \"topic\": \"animals\"}),\n",
@@ -224,7 +192,8 @@
" Document(page_content='the library hosts a weekly story time for kids', metadata={\"id\": 9, \"location\": \"library\", \"topic\": \"reading\"}),\n",
" Document(page_content='a cooking class for beginners is offered at the community center', metadata={\"id\": 10, \"location\": \"community center\", \"topic\": \"classes\"})\n",
"]\n"
]
],
"outputs": []
},
{
"cell_type": "markdown",
@@ -260,26 +229,12 @@
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),\n",
" Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),\n",
" Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),\n",
" Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'})]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectorstore.similarity_search('kitty', k=10, filter={\n",
" 'id': {'$in': [1, 5, 2, 9]}\n",
"})"
]
],
"outputs": []
},
{
"cell_type": "markdown",
@@ -296,25 +251,13 @@
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),\n",
" Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectorstore.similarity_search('ducks', k=10, filter={\n",
" 'id': {'$in': [1, 5, 2, 9]},\n",
" 'location': {'$in': [\"pond\", \"market\"]}\n",
"})"
]
],
"outputs": []
},
{
"cell_type": "code",
@@ -323,19 +266,6 @@
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),\n",
" Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectorstore.similarity_search('ducks', k=10, filter={\n",
" '$and': [\n",
@@ -344,7 +274,8 @@
" ]\n",
"}\n",
")"
]
],
"outputs": []
},
{
"cell_type": "code",
@@ -353,30 +284,12 @@
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),\n",
" Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),\n",
" Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),\n",
" Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),\n",
" Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),\n",
" Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),\n",
" Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'}),\n",
" Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'})]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectorstore.similarity_search('bird', k=10, filter={\n",
" 'location': { \"$ne\": 'pond'}\n",
"})"
]
],
"outputs": []
}
],
"metadata": {
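The filter cells in this notebook combine field conditions with operators such as `$in`, `$ne`, and `$and`. As a rough illustration of how such filters select documents (similarity ranking is ignored), the plain-Python sketch below evaluates a filter against the notebook's metadata. `matches` and `matching_ids` are hypothetical helpers written for this example and are not part of `langchain_postgres`; treating several top-level keys as an implicit AND is an assumption inferred from the cell outputs this diff removes.

```python
from typing import Any, Dict, List

# Metadata of the ten notebook documents, as shown in the removed cell outputs.
METADATA: List[Dict[str, Any]] = [
    {"id": 1, "location": "pond", "topic": "animals"},
    {"id": 2, "location": "pond", "topic": "animals"},
    {"id": 3, "location": "market", "topic": "food"},
    {"id": 4, "location": "market", "topic": "food"},
    {"id": 5, "location": "museum", "topic": "art"},
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
    {"id": 8, "location": "library", "topic": "reading"},
    {"id": 9, "location": "library", "topic": "reading"},
    {"id": 10, "location": "community center", "topic": "classes"},
]


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every condition in `flt` (hypothetical helper)."""
    for key, cond in flt.items():
        if key == "$and":
            # `$and` takes a list of sub-filters that must all hold.
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = metadata.get(key)
            for op, operand in cond.items():
                if op == "$in":
                    ok = value in operand
                elif op == "$ne":
                    ok = value != operand
                elif op == "$eq":
                    ok = value == operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif metadata.get(key) != cond:
            # A bare value is treated as an equality check (assumption).
            return False
    return True


def matching_ids(flt: Dict[str, Any]) -> List[int]:
    """Return the sorted ids of all documents whose metadata passes the filter."""
    return sorted(m["id"] for m in METADATA if matches(m, flt))


print(matching_ids({"id": {"$in": [1, 5, 2, 9]}}))
print(matching_ids({"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}))
print(matching_ids({"location": {"$ne": "pond"}}))
```

The three calls mirror the `$in`, combined `$in`, and `$ne` cells visible in this diff. The `$and` branch is included for completeness only, since that cell's clauses are collapsed in this view.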
41 changes: 31 additions & 10 deletions langchain_postgres/vectorstores.py
@@ -1058,17 +1058,38 @@ async def asimilarity_search_with_score_by_vector(

     def _results_to_docs_and_scores(self, results: Any) -> List[Tuple[Document, float]]:
         """Return docs and scores from results."""
-        docs = [
-            (
-                Document(
-                    id=str(result.EmbeddingStore.id),
-                    page_content=result.EmbeddingStore.document,
-                    metadata=result.EmbeddingStore.cmetadata,
-                ),
-                result.distance if self.embeddings is not None else None,
+        docs = []
+        for result in results:
+            metadata = result.EmbeddingStore.cmetadata
+
+            # Attempt to convert metadata to a dict
+            try:
+                if isinstance(metadata, dict):
+                    pass  # Already a dict
+                elif isinstance(metadata, str):
+                    metadata = json.loads(metadata)
+                elif hasattr(metadata, 'buf'):
+                    # For Fragment types (e.g., from asyncpg)
Collaborator

only psycopg3 is supported

Author

Hi @eyurtsev, thanks for the review! I understand that only psycopg3 is officially supported. However, I’ve received reports of issues in async mode that suggest some users might be encountering non‑dict metadata (perhaps inadvertently using asyncpg or similar drivers).

This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment‑like object with a buf attribute, or an object with a decode() method) into a proper dict. This conversion will only trigger in cases where the metadata isn’t already a dict—so for psycopg3 users nothing changes.

I’ve also added unit tests to simulate these scenarios and ensure the conversion works as expected. Please let me know if you’d like any adjustments or if you think we should further restrict this behavior given our psycopg3-only support.
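For reference, the conversion described in this comment can be sketched as a standalone helper. This is a minimal illustration under stated assumptions, not the code in the patch: `normalize_metadata` and `FakeFragment` are hypothetical names, `FakeFragment` only imitates an object with a `buf` attribute, and unlike the patch the sketch also returns `{}` when the JSON decodes to something other than a dict and does not log a warning.

```python
import json
from typing import Any, Dict


class FakeFragment:
    """Hypothetical stand-in for a driver object that exposes raw bytes via `buf`."""

    def __init__(self, buf: bytes) -> None:
        self.buf = buf


def normalize_metadata(metadata: Any) -> Dict[str, Any]:
    """Coerce a metadata value into a dict, falling back to {} on failure."""
    try:
        if isinstance(metadata, dict):
            return metadata  # Already a dict: nothing to convert.
        if isinstance(metadata, str):
            loaded = json.loads(metadata)
        elif hasattr(metadata, "buf"):
            # Fragment-like object: decode its byte buffer, then parse the JSON.
            loaded = json.loads(metadata.buf.decode("utf-8"))
        elif hasattr(metadata, "decode"):
            # Other byte-like values such as bytes or bytearray.
            loaded = json.loads(metadata.decode("utf-8"))
        else:
            return {}  # Unknown type.
    except (ValueError, AttributeError):
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        return {}
    # Assumption of this sketch: non-dict JSON (a list, a number) is discarded.
    return loaded if isinstance(loaded, dict) else {}


print(normalize_metadata('{"id": 1, "topic": "animals"}'))
print(normalize_metadata(FakeFragment(b'{"id": 2}')))
print(normalize_metadata(b"not json"))
```

Because the dict branch returns immediately, values that already arrive as dicts pass through unchanged, which matches the comment's claim that nothing changes for psycopg3 users.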

Collaborator

@shamspias feel free to @ me if I don't respond quickly enough.

Are you able to create a minimal reproduction against the actual vectorstore? If so, you can send me the code snippet and I'm happy to update the tests myself.

> This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment‑like object with a buf attribute, or an object with a decode() method) into a proper dict. This conversion will only trigger in cases where the metadata isn’t already a dict—so for psycopg3 users nothing changes.

Can you confirm that this is specifically from asyncpg where you're seeing the failures?

We definitely don't want to mock the results from asyncpg. If we want to support asyncpg, the way to do it is to run the full suite of tests with that driver.
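If a second driver were ever to be supported, one way to follow this suggestion is to run the same tests once per driver instead of mocking driver results. The sketch below only builds one connection URL per driver so that a test runner can parametrize over them. `build_test_urls` is a hypothetical helper, the credentials are placeholders, and the `postgresql+<driver>://` form assumes SQLAlchemy-style URLs.

```python
from typing import Dict, Sequence

# Assumed driver names; per the review, only psycopg (psycopg3) is supported today.
DRIVERS: Sequence[str] = ("psycopg", "asyncpg")


def build_test_urls(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    drivers: Sequence[str] = DRIVERS,
) -> Dict[str, str]:
    """Return one SQLAlchemy-style connection URL per driver (hypothetical helper)."""
    return {
        driver: f"postgresql+{driver}://{user}:{password}@{host}:{port}/{database}"
        for driver in drivers
    }


# Placeholder credentials for a local test database.
urls = build_test_urls("user", "secret", "localhost", 5432, "testdb")
for name, url in urls.items():
    print(name, url)

# With pytest, the same mapping could drive every test in the suite, for example:
#   @pytest.mark.parametrize("connection", list(urls.values()), ids=list(urls))
#   def test_metadata_roundtrip(connection): ...
```

Running the whole suite this way exercises the real driver behaviour, which is the point of the comment above.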

Author

Hi @eyurtsev, I’ve put together a minimal reproduction that runs against a real Postgres instance (with pgvector) using asyncpg. The test confirms that the defensive logic for non-dict metadata triggers correctly without mocking. Let me know if you’d like the code snippet or any adjustments!

+                    metadata_bytes = metadata.buf
+                    metadata_str = metadata_bytes.decode('utf-8')
+                    metadata = json.loads(metadata_str)
+                elif hasattr(metadata, 'decode'):
+                    # For other byte-like types
+                    metadata_str = metadata.decode('utf-8')
+                    metadata = json.loads(metadata_str)
+                else:
+                    metadata = {}  # Default to empty dict if unknown type
+            except Exception as e:
+                self.logger.warning(f"Failed to deserialize metadata: {e}")
+                metadata = {}
+
+            doc = Document(
+                id=str(result.EmbeddingStore.id),
+                page_content=result.EmbeddingStore.document,
+                metadata=metadata,
             )
-            for result in results
-        ]
+            score = result.distance if self.embeddings is not None else None
+            docs.append((doc, score))
         return docs

     def _handle_field_filter(