Commit f1e944a

docs: describe parent/child performances
1 parent 8bf3324 commit f1e944a

3 files changed

+27
-60
lines changed

docs/reference/mapping/types/parent-join.asciidoc

+12-60
@@ -114,6 +114,17 @@ PUT my_index/doc/4?routing=1&refresh
114114
<2> `answer` is the name of the join for this document
115115
<3> The parent id of this child document
116116

117+
==== Parent-join and performance
118+
119+
The join field shouldn't be used like joins in a relational database. In Elasticsearch, the key to good performance
120+
is to denormalize your data into documents. Each join field and each `has_child` or `has_parent` query adds a
121+
significant cost to your query performance.
122+
123+
The only case where the join field makes sense is if your data contains a one-to-many relationship where
124+
one entity significantly outnumbers the other. An example of such a case is products
125+
and offers for these products. If offers significantly outnumber products, then
126+
it makes sense to model the product as the parent document and the offer as the child document.
127+
117128
==== Parent-join restrictions
118129

119130
* Only one `join` field mapping is allowed per index.
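
To make the product/offer case from the added performance section concrete, here is a minimal Python sketch that builds the request bodies for such a mapping. It assumes the same index, type and field names as the surrounding examples (`my_index`, `doc`, `my_join_field`). The `product` and `offer` relation names, the `price` field and the helper functions are hypothetical and are not part of this commit.

```python
import json

# Assumption: index, type and field names mirror the examples in
# parent-join.asciidoc; "product", "offer" and "price" are illustrative only.
JOIN_FIELD = "my_join_field"


def product_offer_mapping():
    """Mapping body with one join field: product is the parent of offer."""
    return {
        "mappings": {
            "doc": {
                "properties": {
                    JOIN_FIELD: {
                        "type": "join",
                        "relations": {"product": "offer"},
                    }
                }
            }
        }
    }


def offer_document(parent_id, price):
    """Body of a child (offer) document that points at its parent product."""
    return {
        "price": price,
        JOIN_FIELD: {"name": "offer", "parent": parent_id},
    }


def offer_request_line(doc_id, parent_id):
    """Console request line; routing places the child on the parent's shard."""
    return f"PUT my_index/doc/{doc_id}?routing={parent_id}&refresh"


if __name__ == "__main__":
    print("PUT my_index")
    print(json.dumps(product_offer_mapping(), indent=2))
    print(offer_request_line("4", "1"))
    print(json.dumps(offer_document("1", 19.99), indent=2))
```

The sketch only prints the requests so they can be pasted into a console. Whether the join is worth its cost still depends on offers greatly outnumbering products, as the new section states.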
@@ -338,7 +349,7 @@ GET _nodes/stats/indices/fielddata?human&fields=my_join_field#question
338349
// CONSOLE
339350
// TEST[continued]
340351

341-
==== Multiple levels of parent join
352+
==== Multiple children per parent
342353

343354
It is also possible to define multiple children for a single parent:
344355

@@ -363,62 +374,3 @@ PUT my_index
363374
// CONSOLE
364375

365376
<1> `question` is parent of `answer` and `comment`.
366-
367-
And multiple levels of parent/child:
368-
369-
[source,js]
370-
--------------------------------------------------
371-
PUT my_index
372-
{
373-
  "mappings": {
374-
    "doc": {
375-
      "properties": {
376-
        "my_join_field": {
377-
          "type": "join",
378-
          "relations": {
379-
            "question": ["answer", "comment"],  <1>
380-
            "answer": "vote" <2>
381-
          }
382-
        }
383-
      }
384-
    }
385-
  }
386-
}
387-
--------------------------------------------------
388-
// CONSOLE
389-
390-
<1> `question` is parent of `answer` and `comment`
391-
<2> `answer` is parent of `vote`
392-
393-
The mapping above represents the following tree:
394-
395-
   question
396-
    /    \
397-
   /      \
398-
comment  answer
399-
           |
400-
           |
401-
          vote
402-
403-
Indexing a grandchild document requires a `routing` value equal
404-
to the grandparent (the topmost parent of the lineage):
405-
406-
407-
[source,js]
408-
--------------------------------------------------
409-
PUT my_index/doc/3?routing=1&refresh <1>
410-
{
411-
  "text": "This is a vote",
412-
  "my_join_field": {
413-
    "name": "vote",
414-
    "parent": "2" <2>
415-
  }
416-
}
417-
--------------------------------------------------
418-
// CONSOLE
419-
// TEST[continued]
420-
421-
<1> This child document must be on the same shard as its grandparent and parent
422-
<2> The parent id of this document (must point to an `answer` document)
423-
424-

docs/reference/query-dsl/has-child-query.asciidoc

+8
@@ -23,6 +23,14 @@ GET /_search
2323
--------------------------------------------------
2424
// CONSOLE
2525

26+
Note that `has_child` is a slow query compared to other queries in the
27+
Query DSL because it performs a join. Performance degrades
28+
as the number of matching child documents pointing to unique parent documents
29+
increases. If you care about query performance, you should not use this query.
30+
If you do need it, use it as sparingly as possible. Each
31+
`has_child` query added to a search request can increase query time
32+
significantly.
33+
2634
[float]
2735
==== Scoring capabilities
2836

docs/reference/query-dsl/has-parent-query.asciidoc

+7
@@ -25,6 +25,13 @@ GET /_search
2525
--------------------------------------------------
2626
// CONSOLE
2727

28+
Note that `has_parent` is a slow query compared to other queries in the
29+
Query DSL because it performs a join. Performance degrades
30+
as the number of matching parent documents increases. If you care about query
31+
performance, you should not use this query. If you do need it,
32+
use it as sparingly as possible. Each `has_parent` query
33+
added to a search request can increase query time significantly.
34+
2835
[float]
2936
==== Scoring capabilities
3037
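
Since both notes warn that every extra `has_child` or `has_parent` clause can add significant query time, it may help to check how many such clauses a search body contains before sending it. The Python sketch below is a rough heuristic and not part of Elasticsearch: `count_join_queries` is a hypothetical helper that counts dictionary keys by name, so a document field that happens to be called `has_child` would be miscounted. The `offer` type and the `price` and `in_stock` fields are illustrative.

```python
JOIN_QUERIES = ("has_child", "has_parent")


def count_join_queries(node):
    """Recursively count has_child / has_parent keys in a search body.

    Heuristic only: it matches on key names and does not parse the Query DSL.
    """
    if isinstance(node, dict):
        return sum(
            (1 if key in JOIN_QUERIES else 0) + count_join_queries(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return sum(count_join_queries(item) for item in node)
    return 0  # scalars contain no clauses


# Hypothetical search body with two has_child clauses on the same child type.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"has_child": {"type": "offer",
                               "query": {"range": {"price": {"lte": 20}}}}},
                {"has_child": {"type": "offer",
                               "query": {"term": {"in_stock": True}}}},
            ]
        }
    }
}

if __name__ == "__main__":
    print(count_join_queries(search_body))  # prints 2
```

A count above one suggests checking whether the clauses can be merged into a single join query or avoided by denormalizing the data.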
