Add groundedness pro eval #38063
Conversation
Resolved review threads (outdated) on:
- ...ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py (3 threads)
API change check: APIView has identified API-level changes in this PR and created the following API reviews.
…tors/_service_groundedness/_service_groundedness.py Co-authored-by: Neehar Duvvuri <[email protected]>
…tors/_service_groundedness/_service_groundedness.py Co-authored-by: Neehar Duvvuri <[email protected]>
Resolved review threads on:
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_common/rai_service.py (outdated)
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
- ...ation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_content_safety.py
- ...valuation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_retrieval/_retrieval.py
- ...ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py
```python
result = await super()._do_eval(eval_input)
real_result = {}
real_result[self._output_prefix + "_label"] = (
    result[EvaluationMetrics.GROUNDEDNESS + "_score"] >= self._passing_score
)
```
@MilesHolland, why do we not output the binary output as part of the AACS API? Is it because it is not part of the service call?
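The diff above thresholds the numeric groundedness score into a boolean label. A minimal standalone sketch of that binarization step (function and key names here are illustrative, not the actual azure-ai-evaluation implementation):

```python
# Hypothetical sketch of score binarization, mirroring the diff above.
# The threshold of 3 matches the self._passing_score under discussion
# in this thread; names are illustrative only.
PASSING_SCORE = 3

def binarize(result: dict, prefix: str = "groundedness_pro") -> dict:
    """Derive a boolean pass/fail label from the raw groundedness score."""
    score = result["groundedness_score"]
    return {
        prefix + "_label": score >= PASSING_SCORE,
        prefix + "_score": score,
    }

print(binarize({"groundedness_score": 4}))
print(binarize({"groundedness_score": 2}))
```

A score at or above the threshold yields `True` for the `_label` key; the raw score is carried alongside so consumers can still see the un-binarized value.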
```python
    azure_ai_project,
    **kwargs,
):
    self._passing_score = 3  # TODO update once the binarization PR is merged
```
@posaninagendra, in order to reach parity with AACS groundedness, any ungrounded content detected should make the AACS binary output `ungroundedDetected` True.
- Isn't `ungroundedDetected` part of the service call output?
- If not (meaning the SDK only receives `ungroundedPercentage` as the output), then to match the logic for `ungroundedDetected`, shouldn't this `self._passing_score` be 5?
For reference, here is the sample output on the AACS doc:
```json
{
  "ungroundedDetected": true,
  "ungroundedPercentage": 1,
  "ungroundedDetails": [
    {
      "text": "12/hour.",
      "offset": { "utf8": 0, "utf16": 0, "codePoint": 0 },
      "length": { "utf8": 8, "utf16": 8, "codePoint": 8 },
      "reason": "None. The premise mentions a pay of \"10/hour\" but does not mention \"12/hour.\" It's neutral. "
    }
  ]
}
```
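The reviewer's parity argument can be checked against this sample: if the SDK only receives `ungroundedPercentage`, matching `ungroundedDetected` means flagging any nonzero percentage of ungrounded content. A small sketch (the derivation rule is the reviewer's proposal, not confirmed SDK behavior):

```python
import json

# Trimmed-down version of the AACS sample response quoted above.
sample = json.loads(
    '{"ungroundedDetected": true, "ungroundedPercentage": 1}'
)

# Reviewer's proposed rule: parity with ungroundedDetected means any
# ungrounded content at all fails, i.e. percentage > 0. This is why a
# passing score of 5 (fully grounded) was suggested rather than 3.
derived_detected = sample["ungroundedPercentage"] > 0

print(derived_detected == sample["ungroundedDetected"])
```

On this sample the derived flag agrees with the service's own `ungroundedDetected` field.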
* Adding service based groundedness
* groundedness pro eval
* remove groundedness and fix unit tests
* run black
* change evaluate label
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py (Co-authored-by: Neehar Duvvuri <[email protected]>)
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py (Co-authored-by: Neehar Duvvuri <[email protected]>)
* comments and CL
* re record tests
* black and pylint
* comments
* nits
* analysis
* re cast
* more mypy appeasement

Co-authored-by: Ankit Singhal <[email protected]>
Co-authored-by: Neehar Duvvuri <[email protected]>
Add a new service-based groundedness evaluator, which uses the RAI service to determine groundedness.
This has a few extra adaptations compared to a normal RAI service evaluator, including:
- a change to the `evaluate` function that renames the evaluator's output label to a passing score when it is aggregated into a metric.
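The label-renaming adaptation described above can be sketched as follows: per-row boolean labels are collapsed into a single pass-rate metric under a renamed key. All names here are hypothetical, not the SDK's actual column or metric names:

```python
# Hypothetical sketch of aggregating per-row boolean labels into a
# renamed "passing score" metric. Illustrative names only; not the
# actual azure-ai-evaluation evaluate() implementation.
rows = [
    {"groundedness_pro_label": True},
    {"groundedness_pro_label": False},
    {"groundedness_pro_label": True},
]

def aggregate(
    rows,
    label_key="groundedness_pro_label",
    metric_name="groundedness_pro_passing_rate",
):
    """Collapse boolean labels into a fraction-passing metric."""
    labels = [r[label_key] for r in rows]
    return {metric_name: sum(labels) / len(labels)}

print(aggregate(rows))
```

The key point is that the aggregated output is published under a different name than the per-row label, so the metric reads as a rate rather than a boolean.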