@@ -203,20 +203,21 @@ will be used. The following metrics are supported:
[[k-precision]]
===== Precision at K (P@k)

- This metric measures the number of relevant results in the top k search results.
- It's a form of the well-known
- https://en.wikipedia.org/wiki/Information_retrieval#Precision[Precision] metric
- that only looks at the top k documents. It is the fraction of relevant documents
- in those first k results. A precision at 10 (P@10) value of 0.6 then means six
- out of the 10 top hits are relevant with respect to the user's information need.
-
- P@k works well as a simple evaluation metric that has the benefit of being easy
- to understand and explain. Documents in the collection need to be rated as either
- relevant or irrelevant with respect to the current query. P@k does not take
- into account the position of the relevant documents within the top k results,
- so a ranking of ten results that contains one relevant result in position 10 is
- equally as good as a ranking of ten results that contains one relevant result
- in position 1.
+ This metric measures the proportion of relevant results in the top k search results.
+ It's a form of the well-known
+ https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Precision[Precision]
+ metric that only looks at the top k documents. It is the fraction of relevant
+ documents in those first k results. A precision at 10 (P@10) value of 0.6 then
+ means 6 out of the 10 top hits are relevant with respect to the user's
+ information need.
+
+ P@k works well as a simple evaluation metric that has the benefit of being easy
+ to understand and explain. Documents in the collection need to be rated as either
+ relevant or irrelevant with respect to the current query. P@k is a set-based
+ metric and does not take into account the position of the relevant documents
+ within the top k results, so a ranking of ten results that contains one
+ relevant result in position 10 is equally as good as a ranking of ten results
+ that contains one relevant result in position 1.
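+
+ As a quick sketch (our notation, not part of the original docs), P@k can be
+ written as:
+
+ [source,latex]
+ --------------------------------
+ % Precision at k: the fraction of the top k results that are relevant
+ P@k = \frac{|\{\text{relevant docs}\} \cap \{\text{top } k \text{ docs}\}|}{k}
+ % e.g. 6 relevant hits among the top 10 gives P@10 = 6/10 = 0.6
+ --------------------------------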

[source,console]
--------------------------------
@@ -253,6 +254,58 @@ If set to 'true', unlabeled documents are ignored and neither count as relevant
|=======================================================================


+ [float]
+ [[k-recall]]
+ ===== Recall at K (R@k)
+
+ This metric measures the fraction of all relevant results that are retrieved
+ in the top k search results. It's a form of the well-known
+ https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Recall[Recall]
+ metric that only looks at the top k documents: the number of relevant
+ documents in those first k results divided by the number of all possible
+ relevant results. A recall at 10 (R@10) value of 0.5 then means 4 out of 8
+ relevant documents, with respect to the user's information need, were
+ retrieved in the 10 top hits.
+
+ R@k works well as a simple evaluation metric that has the benefit of being easy
+ to understand and explain. Documents in the collection need to be rated as either
+ relevant or irrelevant with respect to the current query. R@k is a set-based
+ metric and does not take into account the position of the relevant documents
+ within the top k results, so a ranking of ten results that contains one
+ relevant result in position 10 is equally as good as a ranking of ten results
+ that contains one relevant result in position 1.
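+
+ As with P@k, here is a sketch of the formula (our notation, not part of the
+ original docs):
+
+ [source,latex]
+ --------------------------------
+ % Recall at k: the fraction of all relevant documents that appear in the top k
+ R@k = \frac{|\{\text{relevant docs}\} \cap \{\text{top } k \text{ docs}\}|}
+            {|\{\text{relevant docs}\}|}
+ % e.g. 4 of 8 relevant docs retrieved in the top 10 gives R@10 = 4/8 = 0.5
+ --------------------------------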
+
+ [source,console]
+ --------------------------------
+ GET /twitter/_rank_eval
+ {
+     "requests": [
+     {
+         "id": "JFK query",
+         "request": { "query": { "match_all": {}}},
+         "ratings": []
+     }],
+     "metric": {
+       "recall": {
+         "k" : 20,
+         "relevant_rating_threshold": 1
+       }
+     }
+ }
+ --------------------------------
+ // TEST[setup:twitter]
+
+ The `recall` metric takes the following optional parameters:
+
+ [cols="<,<",options="header",]
+ |=======================================================================
+ |Parameter |Description
+ |`k` |sets the maximum number of documents retrieved per query. This value will act in place of the usual `size` parameter
+ in the query. Defaults to 10.
+ |`relevant_rating_threshold` |sets the rating threshold at or above which documents are considered to be
+ "relevant". Defaults to `1`.
+ |=======================================================================
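+
+ As an illustration of the threshold (a sketch, not from the original docs; the
+ document IDs are made up, and the `_index`/`_id`/`rating` entries assume the
+ standard rank evaluation rating format), raising `relevant_rating_threshold`
+ to `2` means only the document rated `2` counts as relevant here:
+
+ [source,console]
+ --------------------------------
+ GET /twitter/_rank_eval
+ {
+     "requests": [
+     {
+         "id": "JFK query",
+         "request": { "query": { "match_all": {}}},
+         "ratings": [
+             { "_index": "twitter", "_id": "0", "rating": 2 },
+             { "_index": "twitter", "_id": "1", "rating": 1 },
+             { "_index": "twitter", "_id": "2", "rating": 0 }
+         ]
+     }],
+     "metric": {
+       "recall": {
+         "k" : 20,
+         "relevant_rating_threshold": 2
+       }
+     }
+ }
+ --------------------------------
+
+ With the default threshold of `1`, plain binary ratings (`0` for irrelevant,
+ `1` for relevant) work as-is.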
+
+
[float]
===== Mean reciprocal rank