Expose proximity boosting (#39385) #40251

Merged
177 changes: 177 additions & 0 deletions docs/reference/query-dsl/distance-feature-query.asciidoc
@@ -0,0 +1,177 @@
[[query-dsl-distance-feature-query]]
=== Distance Feature Query

The `distance_feature` query is a specialized query that only works
on <<date, `date`>>, <<date_nanos, `date_nanos`>> or <<geo-point,`geo_point`>>
fields. Its goal is to boost documents' scores based on proximity
to some given origin. For example, use this query if you want to
give more weight to documents with dates closer to a certain date,
or to documents with locations closer to a certain location.

This query is called the `distance_feature` query because it dynamically
calculates distances between the given origin and documents' field values,
and uses these distances as features to boost the documents' scores.

The `distance_feature` query is typically used on its own to find the nearest
neighbors to a given point, or put in a `should` clause of a
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.

Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is not set to `true`.

==== Syntax of distance_feature query

`distance_feature` query has the following syntax:
[source,js]
--------------------------------------------------
"distance_feature": {
    "field": <field>,
    "origin": <origin>,
    "pivot": <pivot>,
    "boost" : <boost>
}
--------------------------------------------------
// NOTCONSOLE

[horizontal]
`field`::
Required parameter. Defines the name of the field on which to calculate
distances. Must be a field of the type `date`, `date_nanos` or `geo_point`,
and must be indexed (`"index": true`, which is the default) and have
<<doc-values, doc values>> (`"doc_values": true`, which is the default).

`origin`::
Required parameter. Defines a point of origin used for calculating
distances. Must be a date for date and date_nanos fields,
and a geo-point for geo_point fields. Date math (for example `now-1h`) is
supported for a date origin.

`pivot`::
Required parameter. Defines the distance from the origin at which the computed
score equals half of the `boost` parameter. Must be
a `number + date unit` ("1h", "10d",...) for date and date_nanos fields,
and a `number + geo unit` ("1km", "12m",...) for geo_point fields.

`boost`::
Optional parameter with a default value of `1`. Defines the factor by which
to multiply the score. Must be a non-negative float number.


The `distance_feature` query computes a document's score as follows:

`score = boost * pivot / (pivot + distance)`

where `distance` is the absolute difference between the origin and
a document's field value.
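
The formula above can be sketched in a few lines of Java. This is only an
illustration of the arithmetic, not the query's actual implementation: the
class and method names are hypothetical, and `pivot` and `distance` are
assumed to be expressed in the same unit.

```java
public class DistanceFeatureScoreSketch {

    // Hypothetical helper: score = boost * pivot / (pivot + distance).
    // pivot and distance must use the same unit (hours, meters, ...).
    static double score(double boost, double pivot, double distance) {
        if (boost < 0 || pivot <= 0 || distance < 0) {
            throw new IllegalArgumentException("pivot must be > 0; boost and distance must be >= 0");
        }
        return boost * pivot / (pivot + distance);
    }

    public static void main(String[] args) {
        double pivotHours = 7 * 24; // a "7d" pivot expressed in hours

        // At the origin the distance is 0, so the score equals the boost.
        System.out.println(score(1.0, pivotHours, 0));

        // One pivot away from the origin the score is half of the boost.
        System.out.println(score(1.0, pivotHours, pivotHours));

        // Further away the score keeps decreasing but never reaches 0.
        System.out.println(score(1.0, pivotHours, 10 * pivotHours));
    }
}
```

The score decays smoothly with distance, so a larger `pivot` makes the boost
fall off more slowly.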

==== Example using distance_feature query

Let's look at an example. We index several documents containing
information about sales items, such as name, production date,
and location.

[source,js]
--------------------------------------------------
PUT items
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "production_date": {
        "type": "date"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

PUT items/_doc/1
{
  "name" : "chocolate",
  "production_date": "2018-02-01",
  "location": [-71.34, 41.12]
}

PUT items/_doc/2
{
  "name" : "chocolate",
  "production_date": "2018-01-01",
  "location": [-71.3, 41.15]
}

PUT items/_doc/3
{
  "name" : "chocolate",
  "production_date": "2017-12-01",
  "location": [-71.3, 41.12]
}

POST items/_refresh
--------------------------------------------------
// CONSOLE

We look for all chocolate items, but we also want chocolates
that are produced recently (closer to the date `now`)
to be ranked higher.

[source,js]
--------------------------------------------------
GET items/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "chocolate"
        }
      },
      "should": {
        "distance_feature": {
          "field": "production_date",
          "pivot": "7d",
          "origin": "now"
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
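
To see how the date boost would rank the three sample documents, here is a
rough Java sketch of the same arithmetic. It assumes the distance is the
absolute time difference in milliseconds and uses a fixed, hypothetical
instant in place of `now` so the numbers are reproducible. The class and
method names are illustrative only.

```java
import java.time.Duration;
import java.time.Instant;

public class DateDistanceSketch {

    // Hypothetical helper: the distance is the absolute time difference in milliseconds.
    static double score(double boost, Duration pivot, Instant origin, Instant value) {
        double distance = Math.abs(Duration.between(origin, value).toMillis());
        double pivotMillis = pivot.toMillis();
        return boost * pivotMillis / (pivotMillis + distance);
    }

    public static void main(String[] args) {
        // Assumption: a fixed instant stands in for "now".
        Instant origin = Instant.parse("2018-02-08T00:00:00Z");
        Duration pivot = Duration.ofDays(7); // the "7d" pivot from the query

        // production_date values of documents 1, 2 and 3, taken as midnight UTC.
        String[] dates = {
            "2018-02-01T00:00:00Z",
            "2018-01-01T00:00:00Z",
            "2017-12-01T00:00:00Z"
        };
        for (int i = 0; i < dates.length; i++) {
            double s = score(1.0, pivot, origin, Instant.parse(dates[i]));
            System.out.printf("doc %d: %.4f%n", i + 1, s);
        }
    }
}
```

With this origin, document 1 is exactly one pivot (7 days) away and gets half
of the boost, while the older documents 2 and 3 get progressively smaller
contributions.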

We can also look for all chocolate items, but this time we want chocolates
that are produced locally (closer to our geo origin)
to come first in the result list.

[source,js]
--------------------------------------------------
GET items/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "chocolate"
        }
      },
      "should": {
        "distance_feature": {
          "field": "location",
          "pivot": "1000m",
          "origin": [-71.3, 41.15]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
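
The geo boost for the sample documents can be approximated with the Java
sketch below. It assumes a haversine great-circle distance on a sphere with a
mean Earth radius of 6371008.8 m. The distance function actually used by the
query may differ slightly, so treat the numbers as approximate. The class and
method names are hypothetical. Note that geo-point arrays are written as
`[lon, lat]`, while the helper takes `(lat, lon)`.

```java
public class GeoDistanceSketch {

    // Assumption: mean Earth radius in meters for a spherical model.
    static final double EARTH_RADIUS_METERS = 6_371_008.8;

    // Great-circle distance between two (lat, lon) points in degrees, via the haversine formula.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    // score = boost * pivot / (pivot + distance), all distances in meters.
    static double score(double boost, double pivotMeters, double distanceMeters) {
        return boost * pivotMeters / (pivotMeters + distanceMeters);
    }

    public static void main(String[] args) {
        // Query origin [-71.3, 41.15] is [lon, lat].
        double originLat = 41.15;
        double originLon = -71.3;

        // Locations of documents 1, 2 and 3 as {lat, lon}.
        double[][] docs = {{41.12, -71.34}, {41.15, -71.3}, {41.12, -71.3}};
        for (int i = 0; i < docs.length; i++) {
            double d = haversineMeters(originLat, originLon, docs[i][0], docs[i][1]);
            System.out.printf("doc %d: %.0f m, score %.4f%n", i + 1, d, score(1.0, 1000, d));
        }
    }
}
```

Document 2 sits exactly at the origin, so it receives the full boost.
Documents 3 and 1 are a few kilometers away and receive smaller contributions,
in that order.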
8 changes: 8 additions & 0 deletions docs/reference/query-dsl/special-queries.asciidoc
@@ -28,6 +28,12 @@ the specified document.
A query that computes scores based on the values of numeric features and is
able to efficiently skip non-competitive hits.

<<query-dsl-distance-feature-query,`distance_feature` query>>::

A query that computes scores based on the dynamically computed distances
between the origin and documents' date, date_nanos and geo_point fields.
It is able to efficiently skip non-competitive hits.

<<query-dsl-wrapper-query,`wrapper` query>>::

A query that accepts other queries as json or yaml string.
@@ -42,4 +48,6 @@ include::percolate-query.asciidoc[]

include::rank-feature-query.asciidoc[]

include::distance-feature-query.asciidoc[]

include::wrapper-query.asciidoc[]
@@ -440,6 +440,7 @@ public enum ValueType {
OBJECT_OR_LONG(START_OBJECT, VALUE_NUMBER),
OBJECT_ARRAY_BOOLEAN_OR_STRING(START_OBJECT, START_ARRAY, VALUE_BOOLEAN, VALUE_STRING),
OBJECT_ARRAY_OR_STRING(START_OBJECT, START_ARRAY, VALUE_STRING),
OBJECT_ARRAY_STRING_OR_NUMBER(START_OBJECT, START_ARRAY, VALUE_STRING, VALUE_NUMBER),
VALUE(VALUE_BOOLEAN, VALUE_NULL, VALUE_EMBEDDED_OBJECT, VALUE_NUMBER, VALUE_STRING),
VALUE_OBJECT_ARRAY(VALUE_BOOLEAN, VALUE_NULL, VALUE_EMBEDDED_OBJECT, VALUE_NUMBER, VALUE_STRING, START_OBJECT, START_ARRAY),
VALUE_ARRAY(VALUE_BOOLEAN, VALUE_NULL, VALUE_NUMBER, VALUE_STRING, START_ARRAY);
@@ -0,0 +1,87 @@
setup:
  - skip:
      version: " - 7.0.99"
      reason: "Implemented in 7.1"

  - do:
      indices.create:
        index: index1
        body:
          settings:
            number_of_replicas: 0
          mappings:
            properties:
              my_date:
                type: date
              my_date_nanos:
                type: date_nanos
              my_geo:
                type: geo_point

  - do:
      bulk:
        refresh: true
        body:
          - '{ "index" : { "_index" : "index1", "_id" : "1" } }'
          - '{ "my_date": "2018-02-01T10:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.223456789Z", "my_geo": [-71.34, 41.13] }'
          - '{ "index" : { "_index" : "index1", "_id" : "2" } }'
          - '{ "my_date": "2018-02-01T11:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.123456789Z", "my_geo": [-71.34, 41.14] }'
          - '{ "index" : { "_index" : "index1", "_id" : "3" } }'
          - '{ "my_date": "2018-02-01T09:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.323456789Z", "my_geo": [-71.34, 41.12] }'

---
"test distance_feature query on date type":

  - do:
      search:
        rest_total_hits_as_int: true
        index: index1
        body:
          query:
            distance_feature:
              field: my_date
              pivot: 1h
              origin: 2018-02-01T08:00:30Z

  - length: { hits.hits: 3 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.2._id: "2" }

---
"test distance_feature query on date_nanos type":

  - do:
      search:
        rest_total_hits_as_int: true
        index: index1
        body:
          query:
            distance_feature:
              field: my_date_nanos
              pivot: 100000000nanos
              origin: 2018-02-01T00:00:00.323456789Z

  - length: { hits.hits: 3 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.2._id: "2" }

---
"test distance_feature query on geo_point type":

  - do:
      search:
        rest_total_hits_as_int: true
        index: index1
        body:
          query:
            distance_feature:
              field: my_geo
              pivot: 1km
              origin: [-71.35, 41.12]

  - length: { hits.hits: 3 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.2._id: "2" }
21 changes: 21 additions & 0 deletions server/src/main/java/org/elasticsearch/common/geo/GeoUtils.java
@@ -545,6 +545,27 @@ private static GeoPoint parseGeoHash(GeoPoint point, String geohash, EffectivePo
}
}

    /**
     * Parse a {@link GeoPoint} from a string. The string must have one of the following forms:
     *
     * <ul>
     *     <li>Latitude, Longitude form: <pre>&quot;<i>&lt;latitude&gt;</i>,<i>&lt;longitude&gt;</i>&quot;</pre></li>
     *     <li>Geohash form: <pre>&quot;<i>&lt;geohash&gt;</i>&quot;</pre></li>
     * </ul>
     *
     * @param val a String to parse the value from
     * @return new parsed {@link GeoPoint}
     */
    public static GeoPoint parseFromString(String val) {
        GeoPoint point = new GeoPoint();
        boolean ignoreZValue = false;
        if (val.contains(",")) {
            return point.resetFromString(val, ignoreZValue);
        } else {
            return parseGeoHash(point, val, EffectivePoint.BOTTOM_LEFT);
        }
    }

/**
* Parse a precision that can be expressed as an integer or a distance measure like "1km", "10m".
*
@@ -308,6 +308,10 @@ public DateFormatter dateTimeFormatter() {
return dateTimeFormatter;
}

public Resolution resolution() {
return resolution;
}

void setDateTimeFormatter(DateFormatter formatter) {
checkIfFrozen();
this.dateTimeFormatter = formatter;