Commit eb48e80

Add l1norm and l2norm distances for vectors

Add L1norm - Manhattan distance
Add L2norm - Euclidean distance

relates to elastic#37947
1 parent dbe8d64 commit eb48e80

7 files changed (+723, -37 lines changed)


docs/reference/query-dsl/script-score-query.asciidoc

Lines changed: 152 additions & 7 deletions
@@ -11,7 +11,6 @@ a function to be used to compute a new score for each document returned
1111
by the query. For more information on scripting see
1212
<<modules-scripting, scripting documentation>>.
1313

14-
1514
Here is an example of using `script_score` to assign each matched document
1615
a score equal to the number of likes divided by 10:
1716

@@ -31,8 +30,7 @@ GET /_search
3130
}
3231
}
3332
--------------------------------------------------
34-
// CONSOLE
35-
// TEST[setup:twitter]
33+
// NOTCONSOLE
3634

3735
NOTE: The values returned from `script_score` cannot be negative. In general,
3836
Lucene requires the scores produced by queries to be non-negative in order to
@@ -97,6 +95,40 @@ cosine similarity between a given query vector and document vectors.
9795

9896
[source,js]
9997
--------------------------------------------------
98+
PUT my_index
99+
{
100+
"mappings": {
101+
"properties": {
102+
"my_dense_vector": {
103+
"type": "dense_vector",
104+
"dims": 3
105+
},
106+
"my_sparse_vector" : {
107+
"type" : "sparse_vector"
108+
}
109+
}
110+
}
111+
}
112+
113+
PUT my_index/_doc/1
114+
{
115+
"my_dense_vector": [0.5, 10, 6],
116+
"my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1}
117+
}
118+
119+
PUT my_index/_doc/2
120+
{
121+
"my_dense_vector": [-0.5, 10, 10],
122+
"my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6}
123+
}
124+
125+
--------------------------------------------------
126+
// CONSOLE
127+
// TESTSETUP
128+
129+
[source,js]
130+
--------------------------------------------------
131+
GET my_index/_search
100132
{
101133
"query": {
102134
"script_score": {
@@ -113,7 +145,7 @@ cosine similarity between a given query vector and document vectors.
113145
}
114146
}
115147
--------------------------------------------------
116-
// NOTCONSOLE
148+
// CONSOLE
117149
<1> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
118150
<2> To take advantage of the script optimizations, provide a query vector as a script parameter.
119151

@@ -122,6 +154,7 @@ between a given query vector and document vectors.
122154

123155
[source,js]
124156
--------------------------------------------------
157+
GET my_index/_search
125158
{
126159
"query": {
127160
"script_score": {
@@ -138,13 +171,14 @@ between a given query vector and document vectors.
138171
}
139172
}
140173
--------------------------------------------------
141-
// NOTCONSOLE
174+
// CONSOLE
142175

143176
For dense_vector fields, `dotProduct` calculates the measure of
144177
dot product between a given query vector and document vectors.
145178

146179
[source,js]
147180
--------------------------------------------------
181+
GET my_index/_search
148182
{
149183
"query": {
150184
"script_score": {
@@ -153,7 +187,7 @@ dot product between a given query vector and document vectors.
153187
},
154188
"script": {
155189
"source": """
156-
double value = dotProduct(params.query_vector, doc['my_vector']);
190+
double value = dotProduct(params.query_vector, doc['my_dense_vector']);
157191
return sigmoid(1, Math.E, -value); <1>
158192
""",
159193
"params": {
@@ -164,7 +198,7 @@ dot product between a given query vector and document vectors.
164198
}
165199
}
166200
--------------------------------------------------
167-
// NOTCONSOLE
201+
// CONSOLE
168202

169203
<1> Using the standard sigmoid function prevents scores from being negative.
170204

@@ -173,6 +207,7 @@ between a given query vector and document vectors.
173207

174208
[source,js]
175209
--------------------------------------------------
210+
GET my_index/_search
176211
{
177212
"query": {
178213
"script_score": {
@@ -192,8 +227,118 @@ between a given query vector and document vectors.
192227
}
193228
}
194229
--------------------------------------------------
230+
// CONSOLE
231+
232+
For dense_vector fields, `l1norm` calculates the L^1^ distance
233+
(Manhattan distance) between a given query vector and
234+
document vectors.
235+
236+
[source,js]
237+
--------------------------------------------------
238+
GET my_index/_search
239+
{
240+
"query": {
241+
"script_score": {
242+
"query": {
243+
"match_all": {}
244+
},
245+
"script": {
246+
"source": "l1norm(params.queryVector, doc['my_dense_vector'])",
247+
"params": {
248+
"queryVector": [4, 3.4, -0.2]
249+
}
250+
}
251+
}
252+
}
253+
}
254+
--------------------------------------------------
255+
// CONSOLE
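
To make the `l1norm` description concrete, here is a minimal Python sketch of the Manhattan distance applied to the sample documents indexed above. The helper name `l1_distance` is hypothetical and is not part of Elasticsearch. The sketch uses Python double-precision arithmetic, so the last digits may differ from the scores Elasticsearch actually returns.

```python
def l1_distance(query, doc):
    """Manhattan (L1) distance: sum of absolute component-wise differences."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(abs(q - d) for q, d in zip(query, doc))


# Sample data copied from the documentation example above.
query_vector = [4, 3.4, -0.2]
documents = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vector in documents.items():
    print(doc_id, round(l1_distance(query_vector, vector), 4))
```

Document 2 lies farther from the query vector than document 1, so with the raw `l1norm` output as the score it would rank first.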
256+
257+
NOTE: Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
258+
`l2norm` (shown below) represent distances or differences. This means that
259+
the more similar the vectors are, the lower the scores produced by the
260+
`l1norm` and `l2norm` functions. Thus, if you need more similar vectors to
261+
score higher, you should invert the output of `l1norm` and `l2norm`:
262+
263+
[source,js]
264+
--------------------------------------------------
265+
"source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))"
266+
--------------------------------------------------
195267
// NOTCONSOLE
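
The effect of inverting a distance can be sketched in Python. The function name `inverted_score` is hypothetical, and adding 1 to the denominator is an assumption of this sketch: it keeps the score finite when a document vector equals the query vector (distance 0), where a bare reciprocal would be undefined. The two distances are the L^1^ distances of the sample documents from the query vector `[4, 3.4, -0.2]`, worked out by hand.

```python
def inverted_score(distance):
    """Map a non-negative distance to a score in (0, 1]: closer means higher."""
    if distance < 0:
        raise ValueError("a distance cannot be negative")
    # The +1 is an assumption of this sketch; it avoids dividing by zero.
    return 1.0 / (1.0 + distance)


# Hand-computed L1 distances for sample documents 1 and 2.
distances = {"1": 16.3, "2": 21.3}

ranked = sorted(distances, key=lambda doc_id: inverted_score(distances[doc_id]),
                reverse=True)
print(ranked)
```

After inversion the closer document (1) ranks ahead of the farther one (2), which is the opposite of the ordering produced by the raw distance.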
196268

269+
For sparse_vector fields, `l1normSparse` calculates the L^1^ distance
270+
between a given query vector and document vectors.
271+
272+
[source,js]
273+
--------------------------------------------------
274+
GET my_index/_search
275+
{
276+
"query": {
277+
"script_score": {
278+
"query": {
279+
"match_all": {}
280+
},
281+
"script": {
282+
"source": "l1normSparse(params.queryVector, doc['my_sparse_vector'])",
283+
"params": {
284+
"queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
285+
}
286+
}
287+
}
288+
}
289+
}
290+
--------------------------------------------------
291+
// CONSOLE
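
For sparse vectors the same distance can be sketched over the union of dimensions. This Python sketch assumes that a dimension missing from either vector counts as 0; the source does not spell this out, so treat it as an assumption. The name `l1_distance_sparse` is hypothetical.

```python
def l1_distance_sparse(query, doc):
    """L1 distance between two sparse vectors given as {dimension: value} dicts."""
    # Assumption: a dimension absent from one vector has the value 0.
    dimensions = sorted(set(query) | set(doc))
    return sum(abs(query.get(dim, 0.0) - doc.get(dim, 0.0)) for dim in dimensions)


# Sample data copied from the documentation example above.
query_vector = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
documents = {
    "1": {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1},
    "2": {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6},
}

for doc_id, vector in documents.items():
    print(doc_id, round(l1_distance_sparse(query_vector, vector), 4))
```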
292+
293+
For dense_vector fields, `l2norm` calculates the L^2^ distance
294+
(Euclidean distance) between a given query vector and
295+
document vectors.
296+
297+
[source,js]
298+
--------------------------------------------------
299+
GET my_index/_search
300+
{
301+
"query": {
302+
"script_score": {
303+
"query": {
304+
"match_all": {}
305+
},
306+
"script": {
307+
"source": "l2norm(params.queryVector, doc['my_dense_vector'])",
308+
"params": {
309+
"queryVector": [4, 3.4, -0.2]
310+
}
311+
}
312+
}
313+
}
314+
}
315+
--------------------------------------------------
316+
// CONSOLE
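
As a rough illustration of what `l2norm` measures, the following Python sketch computes the Euclidean distance for the sample documents. The helper name `l2_distance` is hypothetical, and the arithmetic is double precision, so results may differ slightly from Elasticsearch scores.

```python
import math


def l2_distance(query, doc):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


# Sample data copied from the documentation example above.
query_vector = [4, 3.4, -0.2]
documents = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vector in documents.items():
    print(doc_id, round(l2_distance(query_vector, vector), 2))
```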
317+
318+
Similarly, for sparse_vector fields, `l2normSparse` calculates the L^2^ distance
319+
between a given query vector and document vectors.
320+
321+
[source,js]
322+
--------------------------------------------------
323+
GET my_index/_search
324+
{
325+
"query": {
326+
"script_score": {
327+
"query": {
328+
"match_all": {}
329+
},
330+
"script": {
331+
"source": "l2normSparse(params.queryVector, doc['my_sparse_vector'])",
332+
"params": {
333+
"queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
334+
}
335+
}
336+
}
337+
}
338+
}
339+
--------------------------------------------------
340+
// CONSOLE
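
A sparse Euclidean distance can be sketched the same way. As an assumption of this sketch, a dimension missing from either vector is treated as 0, and the name `l2_distance_sparse` is hypothetical.

```python
import math


def l2_distance_sparse(query, doc):
    """L2 distance between two sparse vectors given as {dimension: value} dicts."""
    # Assumption: a dimension absent from one vector has the value 0.
    dimensions = sorted(set(query) | set(doc))
    squared = sum((query.get(dim, 0.0) - doc.get(dim, 0.0)) ** 2 for dim in dimensions)
    return math.sqrt(squared)


# Sample data copied from the documentation example above.
query_vector = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
documents = {
    "1": {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1},
    "2": {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6},
}

for doc_id, vector in documents.items():
    print(doc_id, round(l2_distance_sparse(query_vector, vector), 2))
```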
341+
197342
NOTE: If a document doesn't have a value for a vector field on which
198343
a vector function is executed, an error will be thrown.
199344

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
1+
setup:
2+
- skip:
3+
features: headers
4+
version: " - 7.3.99"
5+
reason: "l1norm and l2norm functions were added in 7.4"
6+
7+
- do:
8+
indices.create:
9+
include_type_name: false
10+
index: test-index
11+
body:
12+
settings:
13+
number_of_replicas: 0
14+
mappings:
15+
properties:
16+
my_dense_vector:
17+
type: dense_vector
18+
dims: 5
19+
- do:
20+
index:
21+
index: test-index
22+
id: 1
23+
body:
24+
my_dense_vector: [230.0, 300.33, -34.8988, 15.555, -200.0]
25+
26+
- do:
27+
index:
28+
index: test-index
29+
id: 2
30+
body:
31+
my_dense_vector: [-0.5, 100.0, -13, 14.8, -156.0]
32+
33+
- do:
34+
index:
35+
index: test-index
36+
id: 3
37+
body:
38+
my_dense_vector: [0.5, 111.3, -13.0, 14.8, -156.0]
39+
40+
- do:
41+
indices.refresh: {}
42+
43+
44+
---
45+
"L1 norm":
46+
- do:
47+
headers:
48+
Content-Type: application/json
49+
search:
50+
rest_total_hits_as_int: true
51+
body:
52+
query:
53+
script_score:
54+
query: {match_all: {} }
55+
script:
56+
source: "l1norm(params.query_vector, doc['my_dense_vector'])"
57+
params:
58+
query_vector: [0.5, 111.3, -13.0, 14.8, -156.0]
59+
60+
- match: {hits.total: 3}
61+
62+
- match: {hits.hits.0._id: "1"}
63+
- gte: {hits.hits.0._score: 485.18}
64+
- lte: {hits.hits.0._score: 485.19}
65+
66+
- match: {hits.hits.1._id: "2"}
67+
- gte: {hits.hits.1._score: 12.25}
68+
- lte: {hits.hits.1._score: 12.35}
69+
70+
- match: {hits.hits.2._id: "3"}
71+
- gte: {hits.hits.2._score: 0.00}
72+
- lte: {hits.hits.2._score: 0.01}
73+
74+
---
75+
"L2 norm":
76+
- do:
77+
headers:
78+
Content-Type: application/json
79+
search:
80+
rest_total_hits_as_int: true
81+
body:
82+
query:
83+
script_score:
84+
query: {match_all: {} }
85+
script:
86+
source: "l2norm(params.query_vector, doc['my_dense_vector'])"
87+
params:
88+
query_vector: [0.5, 111.3, -13.0, 14.8, -156.0]
89+
90+
- match: {hits.total: 3}
91+
92+
- match: {hits.hits.0._id: "1"}
93+
- gte: {hits.hits.0._score: 301.36}
94+
- lte: {hits.hits.0._score: 301.37}
95+
96+
- match: {hits.hits.1._id: "2"}
97+
- gte: {hits.hits.1._score: 11.34}
98+
- lte: {hits.hits.1._score: 11.35}
99+
100+
- match: {hits.hits.2._id: "3"}
101+
- gte: {hits.hits.2._score: 0.00}
102+
- lte: {hits.hits.2._score: 0.01}
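
The score ranges asserted in the two tests above can be cross-checked outside Elasticsearch. The Python sketch below recomputes both distances for the three test documents, sorts them by descending score as a search response would, and checks each against the `gte`/`lte` bounds from the YAML. The helper names are hypothetical, and the arithmetic is double precision, which is one plausible reason the tests assert ranges rather than exact values.

```python
import math

# Query vector and documents copied from the YAML test above.
QUERY = [0.5, 111.3, -13.0, 14.8, -156.0]
DOCS = {
    "1": [230.0, 300.33, -34.8988, 15.555, -200.0],
    "2": [-0.5, 100.0, -13, 14.8, -156.0],
    "3": [0.5, 111.3, -13.0, 14.8, -156.0],
}


def l1(query, doc):
    return sum(abs(q - d) for q, d in zip(query, doc))


def l2(query, doc):
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


def ranked(distance):
    """Return (doc_id, score) pairs sorted by descending score."""
    scores = {doc_id: distance(QUERY, vector) for doc_id, vector in DOCS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# (expected id, lower bound, upper bound) in hit order, as asserted in the YAML.
EXPECTED = {
    "l1": [("1", 485.18, 485.19), ("2", 12.25, 12.35), ("3", 0.00, 0.01)],
    "l2": [("1", 301.36, 301.37), ("2", 11.34, 11.35), ("3", 0.00, 0.01)],
}

for name, distance in (("l1", l1), ("l2", l2)):
    for (doc_id, score), (want_id, low, high) in zip(ranked(distance), EXPECTED[name]):
        assert doc_id == want_id and low <= score <= high, (name, doc_id, score)

print("all expected ranges satisfied")
```

Document 3 is identical to the query vector, so both distances are 0 and it appears last in each ranking.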
