Fix metadata deserialization in async mode for PGVector #125

shamspias · 2024-10-08T16:15:11Z

Problem

When using asynchronous methods with PGVector (async_mode=True), the metadata field retrieved from the database may be of type Fragment (from asyncpg) or other non-dict types. This causes a ValidationError when the Document class expects metadata to be a dictionary.
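
The failure mode can be illustrated without a database. The sketch below is only an analogy: `FragmentLike` and `StrictDocument` are hypothetical stand-ins, not asyncpg or LangChain classes, and the stand-in raises `TypeError` where the real `Document` reports a `ValidationError`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


class FragmentLike:
    """Hypothetical stand-in for a driver object that wraps raw bytes."""

    def __init__(self, buf: bytes) -> None:
        self.buf = buf


@dataclass
class StrictDocument:
    """Hypothetical stand-in for a document model whose metadata must be a dict."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything that is not already a dict, as a strict model would.
        if not isinstance(self.metadata, dict):
            raise TypeError(
                f"metadata must be a dict, got {type(self.metadata).__name__}"
            )


def accepts(metadata: Any) -> bool:
    """Return True if a StrictDocument can be built with this metadata."""
    try:
        StrictDocument(page_content="hello", metadata=metadata)
    except TypeError:
        return False
    return True
```

With these stand-ins, a plain dict is accepted, while a wrapped byte buffer or a raw JSON string is rejected until it has been converted.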

Solution

This pull request modifies the _results_to_docs_and_scores method to ensure that metadata is correctly converted into a dictionary before creating Document instances. The method now handles different possible types of metadata and attempts to deserialize it into a dict.

Changes

Updated _results_to_docs_and_scores method in PGVector class to handle metadata deserialization for different types (e.g., dict, str, Fragment).
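
A minimal sketch of the kind of conversion described above, not the code in this PR. The helper name `coerce_metadata` is hypothetical, and it assumes that a Fragment-like object exposes its raw JSON bytes through a `buf` attribute. Unlike a silent fallback to an empty dict, this sketch raises when conversion fails so the problem stays visible.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict


def coerce_metadata(metadata: Any) -> Dict[str, Any]:
    """Best-effort conversion of a metadata value to a dict (hypothetical helper)."""
    if isinstance(metadata, dict):
        return metadata  # Already in the expected form; nothing to do.
    if hasattr(metadata, "buf"):
        # Assumption: a Fragment-like object keeps its raw payload in `.buf`.
        metadata = metadata.buf
    if isinstance(metadata, memoryview):
        metadata = metadata.tobytes()
    if hasattr(metadata, "decode"):
        # bytes, bytearray, or any object offering decode(); str has no decode().
        metadata = metadata.decode()
    if isinstance(metadata, str):
        parsed = json.loads(metadata)  # Raises ValueError on invalid JSON.
        if isinstance(parsed, dict):
            return parsed
        raise ValueError("metadata JSON did not decode to an object")
    raise TypeError(f"cannot convert {type(metadata).__name__} to dict")


if __name__ == "__main__":
    # SimpleNamespace stands in for a Fragment-like object with a `buf` attribute.
    fragment = SimpleNamespace(buf=b'{"source": "a.txt"}')
    for value in ({"source": "a.txt"}, '{"source": "a.txt"}', fragment):
        print(coerce_metadata(value))
```

Because the dict check comes first, values that are already dicts pass through unchanged, which matches the stated goal of leaving the non-async path unaffected.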

Testing

Tested with async_mode=True and confirmed that the metadata field is correctly deserialized and no longer causes validation errors.
Ensured that the change does not affect the behavior when async_mode=False.

Related Issues

Metadata field not properly deserialized when using async_mode=True with PGVector #124

jaimeescano · 2024-10-15T06:45:28Z

Facing this very same issue. Wondering when this commit could be merged.
Big credits to @shamspias for providing the fix.

Regards

langchain_postgres/vectorstores.py

shamspias · 2024-10-23T03:19:13Z

@eyurtsev, please let me know if there is anything else I need to do. If not, please merge it.

simadimonyan · 2024-11-11T18:25:25Z

Did you fix the issue? I have related: langchain-ai/langchain#28029

shamspias · 2024-11-18T12:48:50Z

> Did you fix the issue? I have related: langchain-ai/langchain#28029

Yes, I fixed the issue. You can use the branch if you'd like.

samsiuatpurple · 2025-01-06T23:50:45Z

Good work! It has fixed an issue in my langgraph app. Can a maintainer merge this? It's a blocker for us. Thanks!

lucelsbyleverageAI · 2025-01-11T19:29:18Z

Hi @shamspias @samsiuatpurple - I just ran into the same issue and used your PR code. It prevented the error but now it's just returning empty metadata. Assume you aren't having the same?

shamspias · 2025-01-13T12:45:42Z

> Hi @shamspias @samsiuatpurple - I just ran into the same issue and used your PR code. It prevented the error but now it's just returning empty metadata. Assume you aren't having the same?

It works fine; I run several projects with this PR.

fritzebner · 2025-01-24T13:31:02Z

Can someone tell us when this change will be in a release? When is the next release for langchain-postgres?

shamspias · 2025-01-25T12:13:37Z

> Can someone tell us when this change will be in a release? When is the next release for langchain-postgres?

Same question here.

ccurme · 2025-01-27

Hi @shamspias, are you interested in iterating on this? It still needs some work:

The code breaks out of the gate (json is not imported).
Most importantly, there are no tests.

The most useful thing would be a unit test or reproducible example demonstrating how to generate non-dict metadata.

eyurtsev · 2025-02-07T17:29:27Z

langchain_postgres/vectorstores.py

+                elif isinstance(metadata, str):
+                    metadata = json.loads(metadata)
+                elif hasattr(metadata, 'buf'):
+                    # For Fragment types (e.g., from asyncpg)


only psycopg3 is supported

shamspias · 2025-02-08

Hi @eyurtsev, thanks for the review! I understand that only psycopg3 is officially supported. However, I’ve received reports of issues in async mode that suggest some users might be encountering non‑dict metadata (perhaps inadvertently using asyncpg or similar drivers).

This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment‑like object with a buf attribute, or an object with a decode() method) into a proper dict. This conversion will only trigger in cases where the metadata isn’t already a dict—so for psycopg3 users nothing changes.

I’ve also added unit tests to simulate these scenarios and ensure the conversion works as expected. Please let me know if you’d like any adjustments or if you think we should further restrict this behavior given our psycopg3-only support.

eyurtsev · 2025-02-20

@shamspias feel free to @ me if I don't respond quickly enough.

Are you able to create a minimal reproduction against the actual vectorstore? If so, you can send me the code snippet and I'm happy to update the tests myself.

> This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment‑like object with a buf attribute, or an object with a decode() method) into a proper dict. This conversion will only trigger in cases where the metadata isn’t already a dict, so for psycopg3 users nothing changes.

Can you confirm that this is specifically from asyncpg where you're seeing the failures?

We definitely don't want to mock the results from asyncpg. If we want to support asyncpg, the way to do it is to run the full suite of tests with that driver.

shamspias · 2025-03-11

Hi @eyurtsev, I’ve put together a minimal reproduction that runs against a real Postgres instance (with pgvector) using asyncpg. The test confirms that the defensive logic for non-dict metadata triggers correctly without mocking. Let me know if you’d like the code snippet or any adjustments!

eyurtsev · 2025-02-20T16:54:09Z

tests/unit_tests/test_pgvector_metadata.py

@@ -0,0 +1,99 @@
+import sqlalchemy


We'll want to update the unit tests to reproduce behavior against the actual driver / database rather than rely on mocks. The code in question is designed specifically to work with a database, so requires more systematic testing.

shamspias · 2025-03-11

Hi @eyurtsev! I’ve replaced the mocked unit tests with an integration test that runs against a live Postgres instance (with pgvector). It ensures the metadata deserialization logic is tested with the actual driver and database, just as you requested. Let me know if there’s anything else you’d like changed!

shamspias · 2025-03-12

@eyurtsev After testing with a new version of psycopg3, I can confirm that the issue has been resolved. I no longer encounter any errors during metadata deserialization, and the problem is no longer reproducible. Thanks for your support! However, it might still be beneficial to add an extra condition to the code as a safeguard against similar issues in the future.

Fix metadata deserialization in async mode for PGVector

9cab442

eyurtsev self-assigned this Oct 15, 2024

eyurtsev reviewed Oct 15, 2024

langchain_postgres/vectorstores.py

shamspias added 2 commits October 16, 2024 11:40

Update existing asynchronous tests to include metadata assertions

512728f

Add tests for metadata deserialization in async PGVector operations

015a086

shamspias requested a review from eyurtsev October 16, 2024 07:09

Merge branch 'main' into fix/metadata/deserialization

0413300

Merge branch 'main' into fix/metadata/deserialization

d8e11ef

This was referenced Nov 18, 2024

Document metadata returns Fragment object instead of Dict in _results_to_docs_and_scores langchain-ai/langchain#28029

Closed

Document metadata returns Fragment object instead of Dict simadimonyan/feedconveyor#1

Closed

shamspias mentioned this pull request Nov 28, 2024

Metadata field not properly deserialized when using async_mode=True with PGVector #124

Open

shamspias and others added 2 commits December 15, 2024 21:16

fix test conflict with langchain-ai#149

a56c921

Merge branch 'main' into fix/metadata/deserialization

33a18b9

ccurme reviewed Jan 27, 2025

eyurtsev reviewed Feb 7, 2025

shamspias and others added 4 commits February 8, 2025 12:49

Merge branch 'main' into fix/metadata/deserialization

75f254b

fix: add missing import for json in vectorstores.py

ad890bb

test: add unit test for metadata deserialization in PGVector

ecb7e8a

test: add unit tests for metadata deserialization in PGVector

9227acd

eyurtsev reviewed Feb 20, 2025

Merge branch 'main' into fix/metadata/deserialization

16dfbaf

shamspias added 8 commits March 11, 2025 15:18

test: move PGVector metadata test from unit to integration using asyncpg

f4cdf73

test: add integration test for PGVector metadata deserialization using FakeEmbeddings and manual extension creation

27fe274

test: add integration test for PGVector metadata deserialization and add asyncpg dependency

4db7c59

chore: move asyncpg dependency from main to test group

9038c3d

feat(tests): add temp PGVector test DB fixture for metadata deserialization

5989544

test: add integration tests for PGVector retriever and similarity search methods with score and vector queries

317c39a

chore: remove asyncpg

be50ffa

refactor: update default_db_url and test_db_url from asyncpg to psycopg in connection strings

7650cdc
