
Commit 4bec3ad

Author: Christoph Büscher (committed)
Add ERR to ranking evaluation documentation (#32314)
This change adds a section about the Expected Reciprocal Rank metric (ERR) to the Ranking Evaluation documentation.
1 parent f1d1ff2 commit 4bec3ad

File tree

1 file changed: +50 −0 lines changed

docs/reference/search/rank-eval.asciidoc

Lines changed: 50 additions & 0 deletions
@@ -263,6 +263,56 @@ in the query. Defaults to 10.
|`normalize` | If set to `true`, this metric will calculate the https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG[Normalized DCG].
|=======================================================================

[float]
==== Expected Reciprocal Rank (ERR)

Expected Reciprocal Rank (ERR) is an extension of the classical reciprocal rank for the graded relevance case
(Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. http://olivier.chapelle.cc/pub/err.pdf[Expected reciprocal rank for graded relevance].)

It is based on the assumption of a cascade model of search, in which a user scans through ranked search
results in order and stops at the first document that satisfies the information need. For this reason, it
is a good metric for question answering and navigational queries, but less so for survey-oriented information
needs where the user is interested in finding many relevant documents in the top k results.

The metric models the expectation of the reciprocal of the position at which a user stops reading through
the result list. This means that a relevant document in one of the top ranking positions will contribute much
to the overall score. However, the same document will contribute much less to the score if it appears at a lower rank,
even more so if there are some relevant (but maybe less relevant) documents preceding it.
In this way, the ERR metric discounts documents that are shown after very relevant documents. This introduces
a notion of dependency in the ordering of relevant documents that e.g. Precision or DCG don't account for.
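The expectation described above can be sketched in a few lines. This is a minimal illustration following the Chapelle et al. formulation, not the Elasticsearch implementation: the probability that a document with grade `g` satisfies the user is taken as `(2^g - 1) / 2^max_relevance`, and the function name and signature are hypothetical.

```python
def expected_reciprocal_rank(relevance_grades, max_relevance, k=10):
    """Compute ERR for a ranked list of integer relevance grades."""
    err = 0.0
    p_reached = 1.0  # probability the user scanned down to this rank
    for rank, grade in enumerate(relevance_grades[:k], start=1):
        # Probability that a document with this grade satisfies the user.
        r = (2 ** grade - 1) / 2 ** max_relevance
        err += p_reached * r / rank
        p_reached *= 1 - r  # user continues only if not yet satisfied
    return err

# A highly relevant document at rank 1 dominates the score.
print(expected_reciprocal_rank([3, 2, 0], max_relevance=3))  # → 0.8984375
```

Moving the grade-3 document from rank 1 to rank 3 in this example lowers the score sharply, which is the rank dependency that Precision or DCG do not capture.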
[source,js]
--------------------------------
GET /twitter/_rank_eval
{
    "requests": [
    {
        "id": "JFK query",
        "request": { "query": { "match_all": {}}},
        "ratings": []
    }],
    "metric": {
       "expected_reciprocal_rank": {
           "maximum_relevance" : 3,
           "k" : 20
       }
    }
}
--------------------------------
// CONSOLE
// TEST[setup:twitter]

The `expected_reciprocal_rank` metric takes the following parameters:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
| `maximum_relevance` | Mandatory parameter. The highest relevance grade used in the user-supplied
relevance judgments.
|`k` | Sets the maximum number of documents retrieved per query. This value will act in place of the usual `size` parameter
in the query. Defaults to 10.
|=======================================================================

[float]
=== Response format