Fix metadata deserialization in async mode for PGVector #125
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -1,6 +1,7 @@ | ||
| # pylint: disable=too-many-lines | ||
| from __future__ import annotations | ||
|  | ||
| + import json | ||
| import contextlib | ||
| import enum | ||
| import logging | ||
|  | @@ -1057,17 +1058,38 @@ async def asimilarity_search_with_score_by_vector( | |
|  | ||
| def _results_to_docs_and_scores(self, results: Any) -> List[Tuple[Document, float]]: | ||
| """Return docs and scores from results.""" | ||
| - docs = [ | ||
| - ( | ||
| - Document( | ||
| - id=str(result.EmbeddingStore.id), | ||
| - page_content=result.EmbeddingStore.document, | ||
| - metadata=result.EmbeddingStore.cmetadata, | ||
| - ), | ||
| - result.distance if self.embeddings is not None else None, | ||
| + docs = [] | ||
| + for result in results: | ||
| + metadata = result.EmbeddingStore.cmetadata | ||
| + | ||
| + # Attempt to convert metadata to a dict | ||
| + try: | ||
| + if isinstance(metadata, dict): | ||
| + pass # Already a dict | ||
| + elif isinstance(metadata, str): | ||
| + metadata = json.loads(metadata) | ||
| + elif hasattr(metadata, 'buf'): | ||
| + # For Fragment types (e.g., from asyncpg) | ||
| Review comment: only psycopg3 is supported | ||
| Review comment: Hi @eyurtsev, thanks for the review! I understand that only psycopg3 is officially supported. However, I’ve received reports of issues in async mode that suggest some users might be encountering non-dict metadata (perhaps inadvertently using asyncpg or similar drivers). This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment-like object with a [...]). I’ve also added unit tests to simulate these scenarios and ensure the conversion works as expected. Please let me know if you’d like any adjustments or if you think we should further restrict this behavior given our psycopg3-only support. | ||
| Review comment: @shamspias feel free to @ me if I don't respond quickly enough. Are you able to create a minimal reproduction against the actual vectorstore? If so, you can send me the code snippet and I'm happy to update the tests myself. Can you confirm that this is specifically from asyncpg where you're seeing the failures? We definitely don't want to mock the results from asyncpg. If we want to support asyncpg, the way to do it is to run the full suite of tests with that driver. | ||
| Review comment: Hi @eyurtsev, I’ve put together a minimal reproduction that runs against a real Postgres instance (with pgvector) using asyncpg. The test confirms that the defensive logic for non-dict metadata triggers correctly without mocking. Let me know if you’d like the code snippet or any adjustments! | ||
| + metadata_bytes = metadata.buf | ||
| + metadata_str = metadata_bytes.decode('utf-8') | ||
| + metadata = json.loads(metadata_str) | ||
| + elif hasattr(metadata, 'decode'): | ||
| + # For other byte-like types | ||
| + metadata_str = metadata.decode('utf-8') | ||
| + metadata = json.loads(metadata_str) | ||
| + else: | ||
| + metadata = {} # Default to empty dict if unknown type | ||
| + except Exception as e: | ||
| + self.logger.warning(f"Failed to deserialize metadata: {e}") | ||
| + metadata = {} | ||
| + | ||
| + doc = Document( | ||
| + id=str(result.EmbeddingStore.id), | ||
| + page_content=result.EmbeddingStore.document, | ||
| + metadata=metadata, | ||
| ) | ||
| - for result in results | ||
| - ] | ||
| + score = result.distance if self.embeddings is not None else None | ||
| + docs.append((doc, score)) | ||
| return docs | ||
|  | ||
| def _handle_field_filter( | ||
|  | ||
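
For readers following the diff without its indentation, the branching added to `_results_to_docs_and_scores` can be sketched as a standalone helper. This is only an illustration under assumptions, not the code in the PR: `coerce_metadata` and `FakeFragment` are hypothetical names, `FakeFragment` merely stands in for an object exposing a `buf` attribute, and nothing here confirms which driver (if any) returns such objects. It also deviates from the patch in two places: it catches a narrower set of exceptions, and it falls back to `{}` when the decoded JSON is not an object.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def coerce_metadata(raw: Any) -> Dict[str, Any]:
    """Best-effort conversion of a driver-returned JSON value into a dict.

    Hypothetical helper mirroring the branch order in the diff:
    dict -> str -> object with ``buf`` -> object with ``decode`` -> fallback.
    """
    try:
        if isinstance(raw, dict):
            return raw  # Already deserialized by the driver
        if isinstance(raw, str):
            parsed = json.loads(raw)
        elif hasattr(raw, "buf"):
            # Fragment-like wrapper: decode the underlying buffer
            parsed = json.loads(bytes(raw.buf).decode("utf-8"))
        elif hasattr(raw, "decode"):
            # bytes / bytearray
            parsed = json.loads(raw.decode("utf-8"))
        else:
            return {}  # Unknown type (including None)
    except (ValueError, TypeError, AttributeError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError
        logger.warning("Failed to deserialize metadata: %s", exc)
        return {}
    # Assumption beyond the PR: only JSON objects are valid metadata
    return parsed if isinstance(parsed, dict) else {}


class FakeFragment:
    """Hypothetical stand-in for a wrapper object carrying raw bytes in ``buf``."""

    def __init__(self, buf: bytes) -> None:
        self.buf = buf


if __name__ == "__main__":
    for value in ({"a": 1}, '{"a": 1}', b'{"a": 1}', FakeFragment(b'{"a": 1}')):
        print(type(value).__name__, coerce_metadata(value))
```

Pulling the logic into one function like this would make it testable without a database, but it does not address the reviewer's point: if a second driver is to be supported, the full test suite should run against that driver rather than against stand-ins such as `FakeFragment`.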