Conversation

@MilesHolland (Contributor)

Add new service based groundedness evaluator, which uses the rai service to determine groundedness.

This has a few extra adaptations compared to a normal rai service evaluator, including:

  • A new column-remapping function in the evaluate function to rename the evaluator's output label to a passing score when aggregated into a metric.
  • Custom column renaming within the evaluator itself, since the desired output column prefix (groundedness_pro) differs from the rai service name for this evaluation. It also needs some post-processing to convert the numeric groundedness score into a true/false label. (Note: this will be further adapted once the binarization PR is merged.)
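The post-processing described above amounts to thresholding the numeric score into a boolean label. A minimal standalone sketch (the prefix and threshold names mirror the PR, but this is an illustration, not the actual evaluator code):

```python
# Illustrative sketch of the score-to-label post-processing described above.
# Names mirror the PR, but this is not the actual evaluator implementation.
OUTPUT_PREFIX = "groundedness_pro"
PASSING_SCORE = 3  # placeholder threshold, pending the binarization PR


def to_label(service_result: dict) -> dict:
    """Rename the service's numeric score into a <prefix>_label boolean."""
    score = service_result["groundedness_score"]
    return {OUTPUT_PREFIX + "_label": score >= PASSING_SCORE}


print(to_label({"groundedness_score": 4}))  # {'groundedness_pro_label': True}
```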

@MilesHolland MilesHolland requested a review from a team as a code owner October 23, 2024 21:28
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Oct 23, 2024
@azure-sdk (Collaborator)

API change check

APIView has identified API-level changes in this PR and created the following API reviews.

azure-ai-evaluation

@MilesHolland MilesHolland changed the title Eval/feature/groundedness pro Add groundedness pro eval Oct 24, 2024
@MilesHolland MilesHolland merged commit 578b16c into Azure:main Oct 25, 2024
21 checks passed
result = await super()._do_eval(eval_input)
real_result = {}
real_result[self._output_prefix + "_label"] = (
    result[EvaluationMetrics.GROUNDEDNESS + "_score"] >= self._passing_score
)
Member
@MilesHolland, why do we not return the binary output as part of the AACS API? Is it because it is not part of the service call?

    azure_ai_project,
    **kwargs,
):
    self._passing_score = 3  # TODO update once the binarization PR is merged
@changliu2 (Member) Oct 28, 2024

@posaninagendra, to reach parity with AACS groundedness, any detected ungrounded content sets the AACS binary output ungroundedDetected to True.

  1. Isn't ungroundedDetected part of the service call output?
  2. If not (meaning the SDK only receives ungroundedPercentage as the output), then to match the logic for ungroundedDetected, this self._passing_score should be 5, right?
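The reviewer's point can be illustrated with a small sketch. Assumption (not taken from the service docs): groundedness is scored 1-5, where 5 means no ungrounded content was found at all.

```python
# Sketch of the threshold discussion above. Assumption: groundedness is
# scored 1-5, with 5 meaning no ungrounded content detected at all.
def ungrounded_detected(groundedness_score: int, passing_score: int) -> bool:
    """Mirror AACS ungroundedDetected: True when the score fails the bar."""
    return groundedness_score < passing_score


# With passing_score = 3, a score of 4 (some ungrounded content) still passes:
print(ungrounded_detected(4, 3))  # False
# With passing_score = 5, any score below perfect flags ungrounded content,
# matching the "any ungrounded content detected" semantics:
print(ungrounded_detected(4, 5))  # True
```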

Member

For reference, here is the sample output on the AACS doc:

{
  "ungroundedDetected": true,
  "ungroundedPercentage": 1,
  "ungroundedDetails": [
    {
      "text": "12/hour.",
      "offset": { "utf8": 0, "utf16": 0, "codePoint": 0 },
      "length": { "utf8": 8, "utf16": 8, "codePoint": 8 },
      "reason": "None. The premise mentions a pay of \"10/hour\" but does not mention \"12/hour.\" It's neutral. "
    }
  ]
}
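For illustration, the sample payload above can be parsed with the standard library to pull out the binary signal and the percentage discussed in this thread (a sketch, not SDK code):

```python
import json

# The sample AACS response from the doc, as a JSON string.
sample = (
    '{ "ungroundedDetected": true, "ungroundedPercentage": 1, '
    '"ungroundedDetails": [ { "text": "12/hour.", '
    '"offset": { "utf8": 0, "utf16": 0, "codePoint": 0 }, '
    '"length": { "utf8": 8, "utf16": 8, "codePoint": 8 }, '
    '"reason": "None." } ] }'
)

result = json.loads(sample)
# The top-level boolean already carries the binary signal discussed above.
print(result["ungroundedDetected"])            # True
print(result["ungroundedPercentage"])          # 1
print(result["ungroundedDetails"][0]["text"])  # 12/hour.
```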

l0lawrence pushed a commit to l0lawrence/azure-sdk-for-python that referenced this pull request Feb 19, 2025
* Adding service based groundedness

* groundedness pro eval

* remove groundedness and fix unit tests

* run black

* change evaluate label

* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py

Co-authored-by: Neehar Duvvuri <[email protected]>

* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py

Co-authored-by: Neehar Duvvuri <[email protected]>

* comments and CL

* re record tests

* black and pylint

* comments

* nits

* analysis

* re cast

* more mypy appeasement

---------

Co-authored-by: Ankit Singhal <[email protected]>
Co-authored-by: Neehar Duvvuri <[email protected]>
