@timfdev timfdev commented Aug 12, 2025

pydantic-ai and ag-ui-protocol require pydantic >= 2.10 and >= 2.11.2 respectively; this breaks some of the unit tests.

…s allowed token count. Make conflicting libraries pydantic-ai and ag-ui optional; disabling agent route if not installed. Make search routes async and fix small bugs in query building.

codspeed-hq bot commented Aug 16, 2025

CodSpeed Performance Report

Merging #1028 will not alter performance

Comparing llm-integration (ed8e3ea) with main (0d6b31c)

Summary

✅ 13 untouched

codecov bot commented Aug 18, 2025

Codecov Report

❌ Patch coverage is 49.34243% with 1040 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.26%. Comparing base (0d6b31c) to head (ed8e3ea).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| orchestrator/search/indexing/indexer.py | 22.72% | 136 Missing ⚠️ |
| orchestrator/search/retrieval/retriever.py | 33.11% | 101 Missing ⚠️ |
| orchestrator/api/api_v1/endpoints/search.py | 32.03% | 70 Missing ⚠️ |
| orchestrator/cli/speedtest.py | 26.74% | 62 Missing and 1 partial ⚠️ |
| orchestrator/search/filters/base.py | 42.05% | 62 Missing ⚠️ |
| orchestrator/search/core/types.py | 69.19% | 56 Missing and 5 partials ⚠️ |
| orchestrator/cli/resize_embedding.py | 21.21% | 51 Missing and 1 partial ⚠️ |
| orchestrator/search/retrieval/utils.py | 22.72% | 51 Missing ⚠️ |
| orchestrator/search/retrieval/validation.py | 25.00% | 45 Missing ⚠️ |
| orchestrator/search/retrieval/engine.py | 26.66% | 44 Missing ⚠️ |
| ... and 22 more | | |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1028      +/-   ##
==========================================
- Coverage   85.14%   79.26%   -5.89%     
==========================================
  Files         217      254      +37     
  Lines       10496    12543    +2047     
  Branches     1004     1232     +228     
==========================================
+ Hits         8937     9942    +1005     
- Misses       1305     2330    +1025     
- Partials      254      271      +17     


except Exception as e:
logger.warning(f"Failed to load schema for prompt: {e}")
schema_info = " Schema temporarily unavailable"
logger.error(f"Generated schema for agent prompt:\n{schema_info}")

I don't think this is supposed to be an error log?

def _extract_matching_field_from_filters(filters: FilterTree) -> MatchingField | None:
"""Extract the first path filter to use as matching field for structured searches.
TODO: Should we allow a list of matched fields in the MatchingField model?

What should we do with this? Open a new issue?

@Mark90 Mark90 self-requested a review September 16, 2025 12:52
@Mark90 Mark90 left a comment

Nice, that's a lot of work 🔥

Overall structure of the code is good, that's why I was able to leave a lot of questions and small remarks. I mean this as a good thing :)



def build_agent_app() -> ASGIApp:
if not app_settings.AGENT_MODEL or not app_settings.OPENAI_API_KEY:

These settings are strings that can't be None, so by default the agent will be enabled. Since users need to configure the LLM setup, it should IMO be disabled by default, controlled by a bool variable like AGENT_ENABLED.
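A rough sketch of what such an opt-in flag could look like. Everything here is an assumption for illustration: `AgentSettings` is a hypothetical stand-in for the project's real settings object, the default values are placeholders, and `agent_route_enabled` is not a function from this PR.

```python
from dataclasses import dataclass


@dataclass
class AgentSettings:
    """Hypothetical stand-in for app_settings; defaults are placeholders."""

    AGENT_ENABLED: bool = False  # opt-in: the agent stays off unless explicitly enabled
    AGENT_MODEL: str = "example-model"
    OPENAI_API_KEY: str = "placeholder-key"


def agent_route_enabled(settings: AgentSettings) -> bool:
    """Return True only when the flag is set and the string settings are non-empty."""
    return (
        settings.AGENT_ENABLED
        and bool(settings.AGENT_MODEL)
        and bool(settings.OPENAI_API_KEY)
    )
```

With this shape, non-empty string defaults no longer switch the agent route on by themselves; a deployment has to set the flag deliberately.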


What kind of failures has this shown?

entity_scores.join(entity_highlights, entity_scores.c.entity_id == entity_highlights.c.entity_id)
)
).cte("ranked_results")


Could we split this function into one that handles the DB interaction and produces an output, and another that performs the computations below based on that output? Preferably with some unit tests for the latter.
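To illustrate the kind of split meant here, a minimal sketch. The names `RankedRow`, `summarize_rows`, and `search` are hypothetical, and the post-processing shown is a placeholder rather than the computations in this PR; the point is only that the second function takes plain data and can be tested without a database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class RankedRow:
    """Hypothetical plain-data output of the DB-facing function."""

    entity_id: str
    score: float
    highlight: Optional[str]


def summarize_rows(rows: Iterable[RankedRow]) -> list:
    """Pure post-processing step: no session, no query, just data in and data out."""
    return [
        {
            "entity_id": row.entity_id,
            "score": row.score,
            "has_highlight": row.highlight is not None,
        }
        for row in rows
    ]


def search(fetch_rows: Callable[[], Iterable[RankedRow]]) -> list:
    """Thin wrapper: the injected fetch_rows callable does the DB work."""
    return summarize_rows(fetch_rows())
```

A unit test can then call `summarize_rows` with hand-built rows, or pass a lambda returning canned rows to `search`.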

@@ -0,0 +1,447 @@
from abc import ABC, abstractmethod

Maybe split this up into a package with a module for each retriever type; it's a lot of scrolling now :)


def _quantize_score_for_pagination(self, score_value: float) -> BindParameter[Decimal]:
"""Convert score value to properly quantized Decimal parameter for pagination."""
pas_dec = Decimal(str(score_value)).quantize(Decimal("0.000000000001"))

Should this change along with SCORE_PRECISION if that ever changes?

If so, maybe do something like f'{1 / 10**precision:.{precision}f}'


if not matches:
substring_pattern = re.escape(word)
matches = list(re.finditer(substring_pattern, text, re.IGNORECASE))

If a resulting text has both word and substring matches, wouldn't we want to highlight the substring matches as well?
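A small sketch of the difference in question. The whole-word pattern `\b...\b` is an assumption, since the snippet above only shows the substring fallback, and both function names are hypothetical.

```python
import re


def spans_word_then_substring(text: str, word: str) -> list:
    """Fallback behaviour: substring matches are used only when no whole-word match exists."""
    escaped = re.escape(word)
    # Assumed whole-word pattern; the actual pattern is not shown in the snippet.
    matches = list(re.finditer(rf"\b{escaped}\b", text, re.IGNORECASE))
    if not matches:
        matches = list(re.finditer(escaped, text, re.IGNORECASE))
    return [m.span() for m in matches]


def spans_all_occurrences(text: str, word: str) -> list:
    """Alternative: every occurrence, whole-word or embedded in a longer word."""
    return [m.span() for m in re.finditer(re.escape(word), text, re.IGNORECASE)]
```

For the text `"net and network"` and the word `"net"`, the fallback version returns only the span of the standalone word, while the second version also returns the span inside `network`.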


class TypeDefinition(BaseModel):
operators: list[FilterOp]
valueSchema: dict[FilterOp, ValueSchema]

Is camelCase needed here?

Contributor Author

Sharp one. No, I think this is left over from something I tried with pydantic aliases to use camelCase in the response, but it was hard to keep that consistent for deeply nested data.

timfdev and others added 8 commits September 18, 2025 08:26
#1069)

* Refactor traverse.py to use model based traversal with typing introspection. Included with full unittest coverage

* some fixes

* move type mapping to types file and fix linting errors.
* Make the LLM module more configurable and do not install all deps straight away

* Fix linting problems

* Agentic app

* Fixes

* Simplify start up

* lint issue

* Added some initial documentation for the LLM module

---------

Co-authored-by: Tim Frohlich <[email protected]>
@pboers1988 pboers1988 merged commit 89561a4 into main Sep 18, 2025
15 checks passed
@pboers1988 pboers1988 deleted the llm-integration branch September 18, 2025 14:08