-
Notifications
You must be signed in to change notification settings - Fork 72
Fix metadata deserialization in async mode for PGVector #125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix metadata deserialization in async mode for PGVector #125
Conversation
Facing this very same issue. Wondering when this commit could be merged. Regards |
@eyurtsev, please let me know if there is anything else I need to do. If not, please merge it. |
Did you fix the issue? I have related: langchain-ai/langchain#28029 |
yes I did fixed the issue you can use the branch if you like to |
Good work! It has fixed an issue in my langgraph app. Can a maintainer merge this as it's a blocker for us. Thanks! |
Hi @shamspias @samsiuatpurple - I just ran into the same issue and used your PR code. It prevented the error but now it's just returning empty metadata. Assume you aren't having the same? |
It works fine, I run several project with this PR. |
Can someone tell us when this change will be in a release? When is the next release for langchain-postgres? |
same questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @shamspias, are you interested in iterating on this? It still needs some work:
- The code breaks out the gate (
json
is not imported) - Most importantly, there are no tests.
The most useful thing would be a unit test or reproducible example demonstrating how to generate non-dict metadata.
elif isinstance(metadata, str): | ||
metadata = json.loads(metadata) | ||
elif hasattr(metadata, 'buf'): | ||
# For Fragment types (e.g., from asyncpg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only psycopg3 is supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eyurtsev, thanks for the review! I understand that only psycopg3 is officially supported. However, I’ve received reports of issues in async mode that suggest some users might be encountering non‑dict metadata (perhaps inadvertently using asyncpg or similar drivers).
This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment‑like object with a buf
attribute, or an object with a decode()
method) into a proper dict. This conversion will only trigger in cases where the metadata isn’t already a dict—so for psycopg3 users nothing changes.
I’ve also added unit tests to simulate these scenarios and ensure the conversion works as expected. Please let me know if you’d like any adjustments or if you think we should further restrict this behavior given our psycopg3-only support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamspias feel free to @ me if I don't respond quickly enough.
Are you able to create a minimal reproduction against the actual vectorstore? If so, you can send me the code snippet and I'm happy to update the tests myself.
This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment‑like object with a buf attribute, or an object with a decode() method) into a proper dict. This conversion will only trigger in cases where the metadata isn’t already a dict—so for psycopg3 users nothing changes.
Can you confirm that this is specifically from asyncpg where you're seeing the failures?
We definitely don't want to mock the results from asyncpg. If we want to support asynpcg, the way to do it is to run the full suite of tests with that driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eyurtsev, I’ve put together a minimal reproduction that runs against a real Postgres instance (with pgvector) using asyncpg. The test confirms that the defensive logic for non-dict metadata triggers correctly without mocking. Let me know if you’d like the code snippet or any adjustments!
@@ -0,0 +1,99 @@ | |||
import sqlalchemy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to update the unit tests to reproduce behavior against the actual driver / database rather than rely on mocks. The code in question is designed specifically to work with a database, so requires more systematic testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eyurtsev! I’ve replaced the mocked unit tests with an integration test that runs against a live Postgres instance (with pgvector). It ensures the metadata deserialization logic is tested with the actual driver and database, just as you requested. Let me know if there’s anything else you’d like changed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eyurtsev After testing with new version of psycopg3, I can confirm that the issue has been resolved. I no longer encounter any errors during metadata deserialization, and the problem is no longer reproducible. Thanks for your support! However, it might be beneficial to update the code to include an extra condition as a safeguard for the future, ensuring similar issues are avoided.
…g FakeEmbeddings and manual extension creation
…add asyncpg dependency
…rch methods with score and vector queries
…pg in connection strings
Problem
When using asynchronous methods with
PGVector
(async_mode=True
), themetadata
field retrieved from the database may be of typeFragment
(fromasyncpg
) or other non-dict types. This causes aValidationError
when theDocument
class expectsmetadata
to be a dictionary.Solution
This pull request modifies the
_results_to_docs_and_scores
method to ensure thatmetadata
is correctly converted into a dictionary before creatingDocument
instances. The method now handles different possible types ofmetadata
and attempts to deserialize it into a dict.Changes
_results_to_docs_and_scores
method inPGVector
class to handlemetadata
deserialization for different types (e.g.,dict
,str
,Fragment
).Testing
async_mode=True
and confirmed that themetadata
field is correctly deserialized and no longer causes validation errors.async_mode=False
.Related Issues