@@ -203,20 +203,21 @@ will be used. The following metrics are supported:
 [[k-precision]]
 ===== Precision at K (P@k)
 
-This metric measures the number of relevant results in the top k search results.
-It's a form of the well-known
-https://en.wikipedia.org/wiki/Information_retrieval#Precision[Precision] metric
-that only looks at the top k documents. It is the fraction of relevant documents
-in those first k results. A precision at 10 (P@10) value of 0.6 then means six
-out of the 10 top hits are relevant with respect to the user's information need.
-
-P@k works well as a simple evaluation metric that has the benefit of being easy
-to understand and explain. Documents in the collection need to be rated as either
-relevant or irrelevant with respect to the current query. P@k does not take
-into account the position of the relevant documents within the top k results,
-so a ranking of ten results that contains one relevant result in position 10 is
-equally as good as a ranking of ten results that contains one relevant result
-in position 1.
+This metric measures the proportion of relevant results in the top k search results.
+It's a form of the well-known
+https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Precision[Precision]
+metric that only looks at the top k documents. It is the fraction of relevant
+documents in those first k results. A precision at 10 (P@10) value of 0.6 then
+means 6 out of the 10 top hits are relevant with respect to the user's
+information need.
+
+P@k works well as a simple evaluation metric that has the benefit of being easy
+to understand and explain. Documents in the collection need to be rated as either
+relevant or irrelevant with respect to the current query. P@k is a set-based
+metric and does not take into account the position of the relevant documents
+within the top k results, so a ranking of ten results that contains one
+relevant result in position 10 is just as good as a ranking of ten results
+that contains one relevant result in position 1.
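+
+As a side note, here is a minimal sketch of the P@k calculation itself (plain
+Python, separate from the API; the list of binary relevance judgments is a
+made-up example):
+
+[source,python]
+--------------------------------
+# Sketch of P@k: the fraction of the top k hits that are relevant, assuming
+# the query returned at least k hits.
+# `judgments` holds one binary label per ranked hit (1 = relevant, 0 = not).
+def precision_at_k(judgments, k):
+    return sum(judgments[:k]) / k
+
+# One relevant hit in the top 10 yields P@10 = 0.1, whether it is ranked
+# first or last, since the metric ignores positions within the top k.
+print(precision_at_k([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], k=10))  # 0.1
+print(precision_at_k([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], k=10))  # 0.1
+--------------------------------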
 
 [source,console]
 --------------------------------
@@ -253,6 +254,58 @@ If set to 'true', unlabeled documents are ignored and neither count as relevant
 |=======================================================================
 
 
+[float]
+[[k-recall]]
+===== Recall at K (R@k)
+
+This metric measures the fraction of all relevant results that are retrieved in
+the top k search results. It's a form of the well-known
+https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Recall[Recall]
+metric that only looks at the top k documents. It is the number of relevant
+documents in those first k results relative to the total number of relevant
+documents for the query. A recall at 10 (R@10) value of 0.5 then means 4 out of
+8 relevant documents, with respect to the user's information need, were
+retrieved in the 10 top hits.
+
+R@k works well as a simple evaluation metric that has the benefit of being easy
+to understand and explain. Documents in the collection need to be rated as either
+relevant or irrelevant with respect to the current query. R@k is a set-based
+metric and does not take into account the position of the relevant documents
+within the top k results, so a ranking of ten results that contains one
+relevant result in position 10 is just as good as a ranking of ten results
+that contains one relevant result in position 1.
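+
+Analogously, a minimal sketch of the R@k calculation (plain Python, separate
+from the API; the relevance labels and the total count are made-up values):
+
+[source,python]
+--------------------------------
+# Sketch of R@k: relevant hits within the top k, divided by the total number
+# of relevant documents that exist for the query.
+# `judgments` holds one binary label per ranked hit (1 = relevant, 0 = not).
+def recall_at_k(judgments, total_relevant, k):
+    return sum(judgments[:k]) / total_relevant
+
+# 4 of the 8 relevant documents show up in the top 10 hits: R@10 = 0.5.
+print(recall_at_k([1, 1, 0, 1, 0, 0, 1, 0, 0, 0], total_relevant=8, k=10))  # 0.5
+--------------------------------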
+
+[source,console]
+--------------------------------
+GET /twitter/_rank_eval
+{
+    "requests": [
+    {
+        "id": "JFK query",
+        "request": { "query": { "match_all": {}}},
+        "ratings": []
+    }],
+    "metric": {
+      "recall": {
+        "k" : 20,
+        "relevant_rating_threshold": 1
+      }
+    }
+}
+--------------------------------
+// TEST[setup:twitter]
+
+The `recall` metric takes the following optional parameters:
+
+[cols="<,<",options="header",]
+|=======================================================================
+|Parameter |Description
+|`k` |sets the maximum number of documents retrieved per query. This value will act in place of the usual `size` parameter
+in the query. Defaults to 10.
+|`relevant_rating_threshold` |sets the rating threshold at or above which documents are considered to be
+"relevant". Defaults to `1`.
+|=======================================================================
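+
+To illustrate the threshold, here is a small sketch (plain Python; the graded
+ratings are made-up values, and it assumes that ratings at or above the
+threshold count as relevant):
+
+[source,python]
+--------------------------------
+# Hypothetical graded ratings, e.g. 0 = irrelevant, 1 = related, 2 = good,
+# 3 = excellent.
+ratings = [0, 2, 1, 3, 0]
+
+# Assumption: a document counts as "relevant" when its rating is at or above
+# the threshold. With the default threshold of 1, any nonzero grade qualifies.
+def to_binary(ratings, threshold=1):
+    return [1 if rating >= threshold else 0 for rating in ratings]
+
+print(to_binary(ratings))               # [0, 1, 1, 1, 0]
+print(to_binary(ratings, threshold=3))  # [0, 0, 0, 1, 0]
+--------------------------------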
+
+
 [float]
 ===== Mean reciprocal rank
 