5
5
6
6
experimental[]
7
7
8
- These functions are used for
9
- for <<dense-vector,`dense_vector`>> and
10
- <<sparse-vector,`sparse_vector`>> fields.
11
-
12
8
NOTE: During vector functions' calculation, all matched documents are
13
- linearly scanned. Thus, expect the query time grow linearly
9
+ linearly scanned. Thus, expect the query time to grow linearly
14
10
with the number of matched documents. For this reason, we recommend
15
11
to limit the number of matched documents with a `query` parameter.
16
12
17
- Let's create an index with the following mapping and index a couple
13
+ ====== `dense_vector` functions
14
+
15
+ Let's create an index with a `dense_vector` mapping and index a couple
18
16
of documents into it.
19
17
20
18
[source,console]
@@ -27,9 +25,6 @@ PUT my_index
27
25
"type": "dense_vector",
28
26
"dims": 3
29
27
},
30
- "my_sparse_vector" : {
31
- "type" : "sparse_vector"
32
- },
33
28
"status" : {
34
29
"type" : "keyword"
35
30
}
@@ -40,21 +35,21 @@ PUT my_index
40
35
PUT my_index/_doc/1
41
36
{
42
37
"my_dense_vector": [0.5, 10, 6],
43
- "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
44
38
"status" : "published"
45
39
}
46
40
47
41
PUT my_index/_doc/2
48
42
{
49
43
"my_dense_vector": [-0.5, 10, 10],
50
- "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
51
44
"status" : "published"
52
45
}
53
46
47
+ POST my_index/_refresh
48
+
54
49
--------------------------------------------------
55
50
// TESTSETUP
56
51
57
- For dense_vector fields, `cosineSimilarity` calculates the measure of
52
+ The `cosineSimilarity` function calculates the measure of
58
53
cosine similarity between a given query vector and document vectors.
59
54
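For intuition, the value that `cosineSimilarity` returns can be sketched outside Elasticsearch. The Python below is a minimal illustration, not the Elasticsearch implementation: the helper name `cosine_similarity` is hypothetical, and the query vector `[4, 3.4, -0.2]` is borrowed from the other examples in this section.

```python
import math

def cosine_similarity(query_vector, doc_vector):
    """Cosine of the angle between two equal-length dense vectors."""
    if len(query_vector) != len(doc_vector):
        raise ValueError("query and document vectors must have the same number of dimensions")
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    query_norm = math.sqrt(sum(q * q for q in query_vector))
    doc_norm = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (query_norm * doc_norm)

query = [4, 3.4, -0.2]        # assumed query vector, taken from the later examples
documents = {
    "1": [0.5, 10, 6],        # my_dense_vector of document 1
    "2": [-0.5, 10, 10],      # my_dense_vector of document 2
}
for doc_id, doc_vector in documents.items():
    print(doc_id, cosine_similarity(query, doc_vector))
```

The result always lies between -1 and 1, which is why the `cosineSimilaritySparse` example in this section adds `1.0` to keep the score non-negative.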
60
55
[source,console]
@@ -90,8 +85,8 @@ GET my_index/_search
90
85
NOTE: If a document's dense vector field has a number of dimensions
91
86
different from the query's vector, an error will be thrown.
92
87
93
- Similarly, for sparse_vector fields, `cosineSimilaritySparse` calculates cosine similarity
94
- between a given query vector and document vectors.
88
+ The `dotProduct` function calculates the measure of
89
+ dot product between a given query vector and document vectors.
95
90
96
91
[source,console]
97
92
--------------------------------------------------
@@ -109,18 +104,24 @@ GET my_index/_search
109
104
}
110
105
},
111
106
"script": {
112
- "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
107
+ "source": """
108
+ double value = dotProduct(params.query_vector, doc['my_dense_vector']);
109
+ return sigmoid(1, Math.E, -value); <1>
110
+ """,
113
111
"params": {
114
- "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
112
+ "query_vector": [4, 3.4, -0.2]
115
113
}
116
114
}
117
115
}
118
116
}
119
117
}
120
118
--------------------------------------------------
121
119
122
- For dense_vector fields, `dotProduct` calculates the measure of
123
- dot product between a given query vector and document vectors.
120
+ <1> Using the standard sigmoid function prevents scores from being negative.
121
+
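As a check on the callout above: assuming the helper follows the formula `sigmoid(value, k, a) = value^a / (k^a + value^a)` given in the Elasticsearch script score documentation, `sigmoid(1, Math.E, -value)` reduces to the logistic function `1 / (1 + e^-value)`, which stays between 0 and 1 even when the dot product is negative. The Python sketch below verifies that reduction numerically. The helper names and the third document vector are hypothetical, chosen only to produce a negative dot product.

```python
import math

def es_sigmoid(value, k, a):
    """Mirror of the assumed formula value^a / (k^a + value^a)."""
    return value ** a / (k ** a + value ** a)

def dot_product(query_vector, doc_vector):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(q * d for q, d in zip(query_vector, doc_vector))

def logistic(x):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

query = [4, 3.4, -0.2]
# The first two vectors come from the indexed documents; the third is made up
# to show a negative dot product.
for doc_vector in ([0.5, 10, 6], [-0.5, 10, 10], [-1, -1, 0]):
    value = dot_product(query, doc_vector)
    score = es_sigmoid(1, math.e, -value)
    print(value, score, logistic(value))
```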
122
+ The `l1norm` function calculates L^1^ distance
123
+ (Manhattan distance) between a given query vector and
124
+ document vectors.
124
125
125
126
[source,console]
126
127
--------------------------------------------------
@@ -138,23 +139,28 @@ GET my_index/_search
138
139
}
139
140
},
140
141
"script": {
141
- "source": """
142
- double value = dotProduct(params.query_vector, doc['my_dense_vector']);
143
- return sigmoid(1, Math.E, -value); <1>
144
- """,
142
+ "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
145
143
"params": {
146
- "query_vector": [4, 3.4, -0.2]
144
+ "queryVector": [4, 3.4, -0.2]
147
145
}
148
146
}
149
147
}
150
148
}
151
149
}
152
150
--------------------------------------------------
153
151
154
- <1> Using the standard sigmoid function prevents scores from being negative.
152
+ <1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
153
+ `l2norm` shown below represent distances or differences. This means that
154
+ the more similar the vectors are, the lower the scores will be that are
155
+ produced by the `l1norm` and `l2norm` functions.
156
+ Thus, as we need more similar vectors to score higher,
157
+ we reversed the output from `l1norm` and `l2norm`. Also, to avoid
158
+ division by 0 when a document vector matches the query exactly,
159
+ we added `1` in the denominator.
155
160
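The reversal described in the callout can be illustrated with a short Python sketch. It is not the Elasticsearch implementation. The function names reuse `l1norm` and `l2norm` for readability, while `distance_to_score` is a hypothetical helper standing in for the `1 / (1 + ...)` expression in the script source.

```python
import math

def l1norm(query_vector, doc_vector):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(q - d) for q, d in zip(query_vector, doc_vector))

def l2norm(query_vector, doc_vector):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector)))

def distance_to_score(distance):
    """Reverse a distance so closer vectors score higher; the +1 avoids division by zero."""
    return 1 / (1 + distance)

query = [4, 3.4, -0.2]
for doc_vector in ([0.5, 10, 6], [-0.5, 10, 10], query):
    print(
        doc_vector,
        distance_to_score(l1norm(query, doc_vector)),
        distance_to_score(l2norm(query, doc_vector)),
    )
```

A document vector identical to the query has distance 0 and therefore gets the maximum score of 1; every other document scores strictly lower.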
156
- Similarly, for sparse_vector fields, `dotProductSparse` calculates dot product
157
- between a given query vector and document vectors.
161
+ The `l2norm` function calculates L^2^ distance
162
+ (Euclidean distance) between a given query vector and
163
+ document vectors.
158
164
159
165
[source,console]
160
166
--------------------------------------------------
@@ -172,26 +178,77 @@ GET my_index/_search
172
178
}
173
179
},
174
180
"script": {
175
- "source": """
176
- double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
177
- return sigmoid(1, Math.E, -value);
178
- """,
179
- "params": {
180
- "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
181
+ "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
182
+ "params": {
183
+ "queryVector": [4, 3.4, -0.2]
181
184
}
182
185
}
183
186
}
184
187
}
185
188
}
186
189
--------------------------------------------------
187
190
188
- For dense_vector fields, `l1norm` calculates L^1^ distance
189
- (Manhattan distance) between a given query vector and
190
- document vectors.
191
+ NOTE: If a document doesn't have a value for a vector field on which
192
+ a vector function is executed, an error will be thrown.
193
+
194
+ You can check whether a document lacks a value for the field `my_vector` with
195
+ `doc['my_vector'].size() == 0`. Your overall script can look like this:
196
+
197
+ [source,js]
198
+ --------------------------------------------------
199
+ "source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
200
+ --------------------------------------------------
201
+ // NOTCONSOLE
202
+
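The same guard can be sketched in Python to show the intended behavior: documents without the vector field receive a score of 0 instead of causing an error. This is an illustration only. Documents are modeled as plain dictionaries, and `guarded_cosine_score` is a hypothetical helper name.

```python
import math

def guarded_cosine_score(doc, field, query_vector):
    """Return 0 for documents without the vector field, else the cosine similarity."""
    doc_vector = doc.get(field, [])
    if len(doc_vector) == 0:  # plays the role of doc['my_vector'].size() == 0
        return 0
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    query_norm = math.sqrt(sum(q * q for q in query_vector))
    doc_norm = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (query_norm * doc_norm)

query = [4, 3.4, -0.2]
with_vector = {"my_vector": [0.5, 10, 6], "status": "published"}
without_vector = {"status": "published"}  # no my_vector value at all
print(guarded_cosine_score(with_vector, "my_vector", query))
print(guarded_cosine_score(without_vector, "my_vector", query))
```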
203
+ ====== `sparse_vector` functions
204
+
205
+ deprecated[7.6, The `sparse_vector` type is deprecated and will be removed in 8.0.]
206
+
207
+ Let's create an index with a `sparse_vector` mapping and index a couple
208
+ of documents into it.
191
209
192
210
[source,console]
193
211
--------------------------------------------------
194
- GET my_index/_search
212
+ PUT my_sparse_index
213
+ {
214
+ "mappings": {
215
+ "properties": {
216
+ "my_sparse_vector": {
217
+ "type": "sparse_vector"
218
+ },
219
+ "status" : {
220
+ "type" : "keyword"
221
+ }
222
+ }
223
+ }
224
+ }
225
+ --------------------------------------------------
226
+ // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
227
+
228
+ [source,console]
229
+ --------------------------------------------------
230
+ PUT my_sparse_index/_doc/1
231
+ {
232
+ "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
233
+ "status" : "published"
234
+ }
235
+
236
+ PUT my_sparse_index/_doc/2
237
+ {
238
+ "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
239
+ "status" : "published"
240
+ }
241
+
242
+ POST my_sparse_index/_refresh
243
+ --------------------------------------------------
244
+ // TEST[continued]
245
+
246
+ The `cosineSimilaritySparse` function calculates cosine similarity
247
+ between a given query vector and document vectors.
248
+
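For sparse vectors, only dimensions present in both the query and the document contribute to the dot product. The Python sketch below illustrates this with dictionaries keyed by dimension, using the vectors from this section. The helper names are hypothetical and the code is not the Elasticsearch implementation.

```python
import math

def dot_product_sparse(query_vector, doc_vector):
    """Sum products over the dimensions that both sparse vectors contain."""
    return sum(
        weight * doc_vector[dim]
        for dim, weight in query_vector.items()
        if dim in doc_vector
    )

def norm_sparse(vector):
    """Euclidean length of a sparse vector; absent dimensions count as 0."""
    return math.sqrt(sum(weight * weight for weight in vector.values()))

def cosine_similarity_sparse(query_vector, doc_vector):
    """Cosine similarity between two sparse vectors."""
    return dot_product_sparse(query_vector, doc_vector) / (
        norm_sparse(query_vector) * norm_sparse(doc_vector)
    )

query = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
doc_1 = {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1}
doc_2 = {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6}
for name, doc_vector in (("doc 1", doc_1), ("doc 2", doc_2)):
    print(name, dot_product_sparse(query, doc_vector), cosine_similarity_sparse(query, doc_vector))
```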
249
+ [source,console]
250
+ --------------------------------------------------
251
+ GET my_sparse_index/_search
195
252
{
196
253
"query": {
197
254
"script_score": {
@@ -205,31 +262,24 @@ GET my_index/_search
205
262
}
206
263
},
207
264
"script": {
208
- "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
265
+ "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
209
266
"params": {
210
- "queryVector": [4, 3.4, -0.2]
267
+ "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
211
268
}
212
269
}
213
270
}
214
271
}
215
272
}
216
273
--------------------------------------------------
274
+ // TEST[continued]
275
+ // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
217
276
218
- <1> Unlike `cosineSimilarity` that represent similarity, `l1norm` and
219
- `l2norm` shown below represent distances or differences. This means, that
220
- the more similar the vectors are, the lower the scores will be that are
221
- produced by the `l1norm` and `l2norm` functions.
222
- Thus, as we need more similar vectors to score higher,
223
- we reversed the output from `l1norm` and `l2norm`. Also, to avoid
224
- division by 0 when a document vector matches the query exactly,
225
- we added `1` in the denominator.
226
-
227
- For sparse_vector fields, `l1normSparse` calculates L^1^ distance
277
+ The `dotProductSparse` function calculates dot product
228
278
between a given query vector and document vectors.
229
279
230
280
[source,console]
231
281
--------------------------------------------------
232
- GET my_index/_search
282
+ GET my_sparse_index/_search
233
283
{
234
284
"query": {
235
285
"script_score": {
@@ -243,23 +293,27 @@ GET my_index/_search
243
293
}
244
294
},
245
295
"script": {
246
- "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
247
- "params": {
248
- "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
296
+ "source": """
297
+ double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
298
+ return sigmoid(1, Math.E, -value);
299
+ """,
300
+ "params": {
301
+ "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
249
302
}
250
303
}
251
304
}
252
305
}
253
306
}
254
307
--------------------------------------------------
308
+ // TEST[continued]
309
+ // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
255
310
256
- For dense_vector fields, `l2norm` calculates L^2^ distance
257
- (Euclidean distance) between a given query vector and
258
- document vectors.
311
+ The `l1normSparse` function calculates L^1^ distance
312
+ between a given query vector and document vectors.
259
313
260
314
[source,console]
261
315
--------------------------------------------------
262
- GET my_index/_search
316
+ GET my_sparse_index/_search
263
317
{
264
318
"query": {
265
319
"script_score": {
@@ -273,22 +327,24 @@ GET my_index/_search
273
327
}
274
328
},
275
329
"script": {
276
- "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
330
+ "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
277
331
"params": {
278
- "queryVector": [4, 3.4, -0.2]
332
+ "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
279
333
}
280
334
}
281
335
}
282
336
}
283
337
}
284
338
--------------------------------------------------
339
+ // TEST[continued]
340
+ // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
285
341
286
- Similarly, for sparse_vector fields, `l2normSparse` calculates L^2^ distance
342
+ The `l2normSparse` function calculates L^2^ distance
287
343
between a given query vector and document vectors.
288
344
289
345
[source,console]
290
346
--------------------------------------------------
291
- GET my_index/_search
347
+ GET my_sparse_index/_search
292
348
{
293
349
"query": {
294
350
"script_score": {
@@ -311,15 +367,5 @@ GET my_index/_search
311
367
}
312
368
}
313
369
--------------------------------------------------
314
-
315
- NOTE: If a document doesn't have a value for a vector field on which
316
- a vector function is executed, an error will be thrown.
317
-
318
- You can check if a document has a value for the field `my_vector` by
319
- `doc['my_vector'].size() == 0`. Your overall script can look like this:
320
-
321
- [source,js]
322
- --------------------------------------------------
323
- "source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
324
- --------------------------------------------------
325
- // NOTCONSOLE
370
+ // TEST[continued]
371
+ // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]