[MLOB-4457] Update links/images for the new evaluations experience (#32741)
* update links
* update image
* draft
* applying edits
* restore old image and just make a new one
---------
Co-authored-by: cecilia saixue watt <[email protected]>
content/en/llm_observability/evaluations/_index.md (2 additions, 2 deletions)
@@ -8,7 +8,7 @@ aliases:
## Overview

-LLM Observability offers several ways to support evaluations. They can be configured by navigating to [**AI Observability > Settings > Evaluations**][8].
+LLM Observability offers several ways to support evaluations. They can be configured by navigating to [**AI Observability > Evaluations**][8].

### Custom LLM-as-a-judge evaluations
@@ -47,4 +47,4 @@ In addition to evaluating the input and output of LLM requests, agents, workflow
content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations.md (5 additions, 3 deletions)
@@ -30,7 +30,7 @@ Learn more about the [compatibility requirements][6].
### Configure the prompt

1. In Datadog, navigate to the LLM Observability [Evaluations page][1]. Select **Create Evaluation**, then select **Create your own**.
-   {{< img src="llm_observability/evaluations/custom_llm_judge_1.png" alt="The LLM Observability Evaluations page with the Create Evaluation side panel opened. The first item, 'Create your own,' is selected." style="width:100%;" >}}
+   {{< img src="llm_observability/evaluations/custom_llm_judge_1-2.png" alt="The LLM Observability Evaluations page with the Create Evaluation side panel opened. The first item, 'Create your own,' is selected." style="width:100%;" >}}

2. Provide a clear, descriptive **evaluation name** (for example, `factuality-check` or `tone-eval`). You can use this name when querying evaluation results. The name must be unique within your application.
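For readers configuring this flow, here is a minimal sketch of what a judge prompt and its structured verdict could look like for an evaluation named `factuality-check`. The prompt wording, the verdict fields, and the name are illustrative assumptions only; they are not defined by this PR or by a Datadog schema.

```python
# Illustrative assumption: a possible judge prompt and the structured verdict
# you would refine it toward. Field names and wording are not a Datadog schema.
JUDGE_PROMPT = """\
You are judging whether the assistant's answer is supported by the provided context.
Respond with a JSON object containing:
  - "verdict": "pass" or "fail"
  - "reasoning": one short sentence explaining the verdict
"""

# The kind of consistent, interpretable output the refinement loop aims for.
EXAMPLE_VERDICT = {
    "verdict": "fail",
    "reasoning": "The answer cites a figure that does not appear in the context.",
}
```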
@@ -234,7 +234,9 @@ Refine your prompt and schema until outputs are consistent and interpretable.
## Viewing and using results

-After you save your evaluation, Datadog automatically runs your evaluation on targeted spans. Results are available across LLM Observability in near-real-time. You can find your custom LLM-as-a-judge results for a specific span in the **Evaluations** tab, alongside other evaluations.
+After you **Save and Publish** your evaluation, Datadog automatically runs your evaluation on targeted spans. Alternatively, you can **Save as Draft** and edit or enable your evaluation later.
+
+Results are available across LLM Observability in near-real-time for published evaluations. You can find your custom LLM-as-a-judge results for a specific span in the **Evaluations** tab, alongside other evaluations.

{{< img src="llm_observability/evaluations/custom_llm_judge_3-2.png" alt="The Evaluations tab of a trace, displaying custom evaluation results alongside managed evaluations." style="width:100%;" >}}
content/en/llm_observability/evaluations/managed_evaluations/_index.md (6 additions, 8 deletions)
@@ -17,7 +17,7 @@ aliases:
## Overview

-Managed evaluations are built-in tools to assess your LLM application on dimensions like quality, security, and safety. By enabling them, you can assess the effectiveness of your application's responses, including detection of negative sentiment, topic relevancy, toxicity, failure to answer and hallucination.
+Managed evaluations are built-in tools to assess your LLM application on dimensions like quality, security, and safety. By creating them, you can assess the effectiveness of your application's responses, including detection of sentiment, topic relevancy, toxicity, failure to answer, and hallucination.

LLM Observability associates evaluations with individual spans so you can view the inputs and outputs that led to a specific evaluation.
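Because evaluations attach to individual spans, they presuppose that the application is already emitting LLM Observability spans. As a rough, non-authoritative sketch (assuming a recent ddtrace release where `LLMObs.enable()`, the `@llm` decorator, and `LLMObs.annotate()` are available; the app name, model, and key below are placeholders):

```python
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

# Placeholder configuration; real values come from your environment.
LLMObs.enable(
    ml_app="my-llm-app",
    api_key="<DATADOG_API_KEY>",
    site="datadoghq.com",
    agentless_enabled=True,
)

@llm(model_name="gpt-4o", model_provider="openai", name="answer_question")
def answer_question(question: str) -> str:
    answer = "placeholder answer"  # replace with your actual model call
    # Record the span's input and output, which is what evaluations judge.
    LLMObs.annotate(input_data=question, output_data=answer)
    return answer
```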
@@ -98,7 +98,7 @@ If your LLM provider restricts IP addresses, you can obtain the required IP rang
## Create new evaluations

-1. Navigate to [**AI Observability > Settings > Evaluations**][2].
+1. Navigate to [**AI Observability > Evaluations**][2].
1. Click the **Create Evaluation** button in the top right corner.
1. Select a specific managed evaluation. This opens the evaluation editor window.
1. Select the LLM application(s) you want to configure your evaluation for.
@@ -109,14 +109,12 @@ If your LLM provider restricts IP addresses, you can obtain the required IP rang
   - (Optional) Select what percentage of spans you would like this evaluation to run on by configuring the **sampling percentage**. This number must be greater than `0` and less than or equal to `100` (sampling all spans).
1. (Optional) Configure evaluation options by selecting which subcategories should be flagged. Only available for some evaluations.

-After you click **Save**, LLM Observability uses the LLM account you connected to power the evaluation you enabled.
+After you click **Save and Publish**, LLM Observability uses the LLM account you connected to power the evaluation you enabled. Alternatively, you can **Save as Draft** and edit or enable them later.

## Edit existing evaluations

-1. Navigate to [**AI Observability > Settings > Evaluations**][2].
-1. Find on the evaluation you want to edit and toggle the **Enabled Applications** button.
-1. Select the edit icon to configure the evaluation for an individual LLM application or click on the application name.
-1. Evaluations can be disabled by selecting the disable icon for an individual LLM application.
+1. Navigate to [**AI Observability > Evaluations**][2].
+1. Hover over the evaluation you want to edit and click the **Edit** button.

### Estimated token usage
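To make the sampling percentage constraint in the hunk above concrete: the value must satisfy 0 < p <= 100, where 25 evaluates roughly one in four targeted spans and 100 evaluates every targeted span. A minimal illustration of that rule (not Datadog's actual implementation):

```python
import random

def should_evaluate(sampling_percentage: float) -> bool:
    """Illustration only: decide whether one targeted span gets evaluated."""
    if not (0 < sampling_percentage <= 100):
        raise ValueError("sampling percentage must be greater than 0 and at most 100")
    # With 25, about a quarter of targeted spans are evaluated; 100 samples them all.
    return random.uniform(0, 100) < sampling_percentage
```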
@@ -335,7 +333,7 @@ This check ensures that sensitive information is handled appropriately and secur