
Commit e259689

[Docs] clarification about cardinality accuracy (#34616)
Adds a bit more clarification about how accuracy is dependent on the dataset in question. Closes #18231
1 parent 6a2dff2 commit e259689


1 file changed

+12
-4
lines changed

docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc

Lines changed: 12 additions & 4 deletions
@@ -150,10 +150,18 @@ public static void main(String[] args) {
 
 image:images/cardinality_error.png[]
 
-For all 3 thresholds, counts have been accurate up to the configured threshold
-(although not guaranteed, this is likely to be the case). Please also note that
-even with a threshold as low as 100, the error remains very low, even when
-counting millions of items.
+For all 3 thresholds, counts have been accurate up to the configured threshold.
+Although not guaranteed, this is likely to be the case. Accuracy in practice depends
+on the dataset in question. In general, most datasets show consistently good
+accuracy. Also note that even with a threshold as low as 100, the error
+remains very low (1-6% as seen in the above graph) even when counting millions of items.
+
+The HyperLogLog++ algorithm depends on the leading zeros of hashed
+values, the exact distributions of hashes in a dataset can affect the
+accuracy of the cardinality.
+
+Please also note that even with a threshold as low as 100, the error remains
+very low, even when counting millions of items.
 
 ==== Pre-computed hashes
 
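
The added paragraph says the estimate depends on the leading zeros of hashed values. A minimal sketch of that idea follows, using plain HyperLogLog rather than the HyperLogLog++ variant the docs refer to (no sparse representation or empirical bias correction). The names `hash64` and `hll_estimate`, the choice of SHA-1 as the hash, and the default of 2**10 registers are all assumptions made for illustration; none of them come from the commit or from Elasticsearch.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Hash a string to a 64-bit integer (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_estimate(values, p: int = 10) -> float:
    """Estimate the number of distinct values using 2**p registers (p >= 7)."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        h = hash64(value)
        index = h >> (64 - p)                # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # rank = leading zeros in the remaining bits, plus one
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)

    # Harmonic-mean estimator with the standard constant for m >= 128.
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting while
    # many registers are still empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    n = 100_000
    estimate = hll_estimate(f"item-{i}" for i in range(n))
    print(f"true={n} estimate={estimate:.0f} "
          f"error={abs(estimate - n) / n:.2%}")
```

Because each register only keeps the largest rank it has seen, duplicates never change the result, while a dataset whose hashes happen to cluster in a few registers or produce unusually long zero runs will shift the estimate. That is one way to read the statement that accuracy depends on the dataset in question.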
