Pass Standard Tests #35
This PR makes the changes needed for our integrations to pass the standard tests provided by `langchain-tests`. Changes include:
Previously, inserting documents with duplicate IDs could raise a unique constraint error and fail the entire batch. We now pass `batcherrors=True` to `executemany` (https://python-oracledb.readthedocs.io/en/latest/user_guide/batch_statement.html#handling-data-errors ), so a per-row error no longer invalidates the other inserts. Only the IDs of successfully inserted rows are returned.
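A minimal sketch of the batch-error pattern, assuming a python-oracledb cursor. `executemany(..., batcherrors=True)` and `getbatcherrors()` (whose entries expose the failing row's `offset`) are documented python-oracledb APIs. `insert_rows` and `FakeCursor` are hypothetical names: the fake only exists so the sketch runs without a database, and is not the PR's code.

```python
from typing import Any, List, Sequence


def insert_rows(cursor: Any, sql: str, rows: Sequence[Sequence[Any]], ids: Sequence[str]) -> List[str]:
    """Insert rows in one batch and return only the IDs of rows that succeeded.

    ids[i] must correspond to rows[i]; it may differ from the stored key
    (for example, an unhashed ID returned to the caller).
    """
    # With batcherrors=True, per-row failures (e.g. ORA-00001, unique
    # constraint violated) are collected instead of aborting the batch.
    cursor.executemany(sql, rows, batcherrors=True)
    # Each batch error records the offset of the row that failed.
    failed = {err.offset for err in cursor.getbatcherrors()}
    return [doc_id for i, doc_id in enumerate(ids) if i not in failed]


class BatchError:
    def __init__(self, offset: int) -> None:
        self.offset = offset


class FakeCursor:
    """Hypothetical in-memory stand-in for oracledb.Cursor (testing only)."""

    def __init__(self, existing_ids: Sequence[str]) -> None:
        self.table = set(existing_ids)
        self._errors: List[BatchError] = []

    def executemany(self, sql: str, rows: Sequence[Sequence[Any]], batcherrors: bool = False) -> None:
        self._errors = []
        for offset, (doc_id, _text) in enumerate(rows):
            if doc_id in self.table:
                if not batcherrors:
                    raise RuntimeError("unique constraint violated")
                self._errors.append(BatchError(offset))
            else:
                self.table.add(doc_id)

    def getbatcherrors(self) -> List[BatchError]:
        return self._errors
```

Usage: with `FakeCursor(existing_ids={"a"})`, inserting rows keyed `"a"`, `"b"`, `"c"` returns `["b", "c"]`; the duplicate is reported as a batch error instead of failing the whole call.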
Optional upsert behavior: The standard tests expect rows with duplicate IDs to be updated rather than raising an error. To preserve backward compatibility, we introduced a constructor parameter `mutate_on_duplicate`:
- `False` (default): preserve the previous behavior (no updates on duplicate IDs).
- `True`: update existing rows (texts, metadata, etc.) when duplicate IDs are provided.
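The two modes of `mutate_on_duplicate` can be modeled in memory as below. This is a sketch of the described semantics only. `add_texts_model` is a hypothetical helper, and returning the IDs of updated rows in the `True` case is an assumption. In SQL, the `True` branch could be expressed with an Oracle `MERGE` statement, but this description does not say which statement the PR uses.

```python
from typing import Dict, List, Tuple


def add_texts_model(
    table: Dict[str, str],
    rows: List[Tuple[str, str]],
    mutate_on_duplicate: bool = False,
) -> List[str]:
    """Hypothetical in-memory model of insert vs. upsert on duplicate IDs."""
    written: List[str] = []
    for doc_id, text in rows:
        if doc_id in table and not mutate_on_duplicate:
            # Default: the duplicate row is skipped (in the real store it
            # surfaces as a per-row batch error) and its ID is not returned.
            continue
        # New ID: insert. Duplicate ID with mutate_on_duplicate=True: update.
        table[doc_id] = text
        written.append(doc_id)
    return written
```

For example, with `{"a": "old"}` already stored, adding `("a", "new")` leaves `"old"` in place by default, and overwrites it with `"new"` when `mutate_on_duplicate=True`.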
New methods: Added `get_by_ids` and `aget_by_ids`.
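A minimal in-memory sketch of the two methods. As we understand LangChain's `VectorStore.get_by_ids` contract, missing IDs are skipped rather than raising, and callers should not rely on result order. `Doc` is a hypothetical stand-in for `langchain_core.documents.Document`. Running the sync lookup in a worker thread with `asyncio.to_thread` is one possible async strategy, not necessarily the one used in the PR.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


class InMemoryStore:
    def __init__(self) -> None:
        self._rows: Dict[str, Doc] = {}

    def add(self, doc: Doc) -> None:
        self._rows[doc.id] = doc

    def get_by_ids(self, ids: Sequence[str]) -> List[Doc]:
        # Unknown IDs are skipped, so the result may be shorter than `ids`.
        return [self._rows[i] for i in ids if i in self._rows]

    async def aget_by_ids(self, ids: Sequence[str]) -> List[Doc]:
        # Offload the blocking lookup so the event loop is not stalled.
        return await asyncio.to_thread(self.get_by_ids, ids)
```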
ID handling and hashing: When IDs are not passed to `add_texts`, we generate them via `uuid.uuid4()` and store a hashed version in a RAW column. Users need these generated IDs to call `delete` or `get_by_ids`, so `add_texts` is expected to return them. Previously, the hashed IDs were returned, and they could not be used with `delete` or `get_by_ids` because those methods hash the IDs again before searching the documents. This behavior is fixed: `add_texts` now returns the unhashed IDs.
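The sketch below shows why the unhashed IDs must be returned: lookups hash whatever they receive, so passing back an already-hashed ID hashes it twice and finds nothing. The PR does not say which hash function fills the RAW column. SHA-256 via `hashlib` is an assumption here, and `hash_id` and `Table` are hypothetical names.

```python
import hashlib
import uuid
from typing import Dict, List, Optional, Sequence


def hash_id(doc_id: str) -> bytes:
    # Hypothetical: the hash used for the RAW column is not stated in the PR.
    return hashlib.sha256(doc_id.encode("utf-8")).digest()


class Table:
    def __init__(self) -> None:
        self.rows: Dict[bytes, str] = {}  # hashed ID (RAW) -> text

    def add_texts(self, texts: Sequence[str], ids: Optional[Sequence[str]] = None) -> List[str]:
        if ids is None:
            ids = [str(uuid.uuid4()) for _ in texts]
        for doc_id, text in zip(ids, texts):
            self.rows[hash_id(doc_id)] = text
        # Return the unhashed IDs so callers can pass them back unchanged.
        return list(ids)

    def get_by_ids(self, ids: Sequence[str]) -> List[str]:
        # Incoming IDs are hashed before lookup.
        return [self.rows[h] for h in map(hash_id, ids) if h in self.rows]

    def delete(self, ids: Sequence[str]) -> None:
        for doc_id in ids:
            self.rows.pop(hash_id(doc_id), None)
```

Usage: `ids = Table().add_texts(["a", "b"])` yields two UUID strings that round-trip through `get_by_ids` and `delete`. Their hex-encoded hashes would not, because they would be hashed a second time.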
Previously, the `Document`s returned by the `similarity_search` functions did not have the `id` field set, because the original unhashed IDs were not saved to the DB. To keep the table structure unchanged for users with existing tables, the original IDs are now added to the `metadata` under the key `"__orcl_internal_doc_id"`, which is then used to return `Document`s with the `id` field populated.
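A sketch of the metadata round trip, using the key named above. `Doc`, `metadata_for_storage`, and `row_to_document` are hypothetical. Stripping the internal key from the returned metadata is an assumption of this sketch, and so is returning `id=None` for rows written before this change.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

INTERNAL_ID_KEY = "__orcl_internal_doc_id"  # key named in this PR


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


def metadata_for_storage(doc_id: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    # Copy so the caller's dict is not mutated, then stash the unhashed ID.
    return {**metadata, INTERNAL_ID_KEY: doc_id}


def row_to_document(text: str, stored_metadata: Dict[str, Any]) -> Doc:
    metadata = dict(stored_metadata)
    # Assumption: the internal key is removed from user-facing metadata;
    # rows written before this change have no key, so id stays None.
    doc_id = metadata.pop(INTERNAL_ID_KEY, None)
    return Doc(page_content=text, metadata=metadata, id=doc_id)
```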