Commit 70c36a2

Add Student's t-test aggregation support
Adds a t_test metric aggregation that can perform paired and unpaired two-sample t-tests. Support for filters in unpaired tests is not part of this PR and will be added in a follow-up PR. Relates to elastic#53692
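
For orientation, the three kinds of two-sample t-test named in this commit (paired, equal-variance, unequal-variance) differ only in how the test statistic and the degrees of freedom are formed. The sketch below uses the textbook formulas; it is not taken from the commit's Java code, and the function names and sample values are made up for illustration.

```python
import math
from statistics import mean, variance


def paired_t(a, b):
    """Paired test: a one-sample t-test on the per-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / math.sqrt(variance(diffs) / n), n - 1


def student_t(a, b):
    """Unpaired, equal variance (homoscedastic): pooled sample variance."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Unpaired, unequal variance (heteroscedastic): Welch-Satterthwaite df."""
    n1, n2 = len(a), len(b)
    q1, q2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples, not data from the commit.
sample_a = [10.0, 12.0, 9.0, 11.0, 13.0]
sample_b = [8.0, 15.0, 7.0, 14.0, 6.0]

for name, fn in (("paired", paired_t), ("student", student_t), ("welch", welch_t)):
    t, df = fn(sample_a, sample_b)
    print(f"{name}: t={t:.4f}, df={df:.2f}")
```

With equal sample sizes the Student and Welch statistics coincide and only the degrees of freedom differ, which is why the choice of test type mostly matters for unbalanced samples or clearly different variances.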
1 parent 04c39ae commit 70c36a2

26 files changed: +2447 -5 lines changed

docs/build.gradle

Lines changed: 35 additions & 0 deletions
@@ -539,6 +539,41 @@ for (int i = 0; i < 100; i++) {
           {"load_time": "$value"}"""
 }

+// Used by t_test aggregations
+buildRestTests.setups['node_upgrade'] = '''
+  - do:
+      indices.create:
+        index: node_upgrade
+        body:
+          settings:
+            number_of_shards: 1
+            number_of_replicas: 1
+          mappings:
+            properties:
+              name:
+                type: keyword
+              startup_time_before:
+                type: long
+              startup_time_after:
+                type: long
+  - do:
+      bulk:
+        index: node_upgrade
+        refresh: true
+        body: |
+          {"index":{}}
+          {"name": "A", "startup_time_before": 102, "startup_time_after": 89}
+          {"index":{}}
+          {"name": "B", "startup_time_before": 99, "startup_time_after": 93}
+          {"index":{}}
+          {"name": "C", "startup_time_before": 111, "startup_time_after": 72}
+          {"index":{}}
+          {"name": "D", "startup_time_before": 97, "startup_time_after": 98}
+          {"index":{}}
+          {"name": "E", "startup_time_before": 101, "startup_time_after": 102}
+          {"index":{}}
+          {"name": "F", "startup_time_before": 99, "startup_time_after": 98}'''
+
 // Used by iprange agg
 buildRestTests.setups['iprange'] = '''
   - do:

docs/reference/aggregations/metrics.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ include::metrics/median-absolute-deviation-aggregation.asciidoc[]

 include::metrics/boxplot-aggregation.asciidoc[]

-
+include::metrics/t-test-aggregation.asciidoc[]




Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+[role="xpack"]
+[testenv="basic"]
+[[search-aggregations-metrics-ttest-aggregation]]
+=== TTest Aggregation
+
+A `t_test` metrics aggregation performs a statistical hypothesis test, in which the test statistic follows a Student's t-distribution
+under the null hypothesis, on numeric values extracted from the aggregated documents or generated by provided scripts.
+
+==== Syntax
+
+A `t_test` aggregation looks like this in isolation:
+
+[source,js]
+--------------------------------------------------
+{
+    "t_test": {
+        "a": {"field": "value_before"},
+        "b": {"field": "value_after"},
+        "type": "paired"
+    }
+}
+--------------------------------------------------
+// NOTCONSOLE
+
+Assuming that we have a record of node startup times before
+and after an upgrade, let's use a t-test to see whether the upgrade affected
+the node startup time in a meaningful way.
+
+[source,console]
+--------------------------------------------------
+GET node_upgrade/_search
+{
+    "size": 0,
+    "aggs" : {
+        "startup_time_ttest" : {
+            "t_test" : {
+                "a" : {"field": "startup_time_before"}, <1>
+                "b" : {"field": "startup_time_after"}, <2>
+                "type": "paired" <3>
+            }
+        }
+    }
+}
+--------------------------------------------------
+// TEST[setup:node_upgrade]
+<1> The field `startup_time_before` must be a numeric field
+<2> The field `startup_time_after` must be a numeric field
+<3> Since we have data from the same nodes, we are using a paired t-test.
+
+The response will look like this:
+
+[source,console-result]
+--------------------------------------------------
+{
+    ...
+
+    "aggregations": {
+        "startup_time_ttest": {
+            "value": 0.1914368843365979
+        }
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
+
+==== T-Test Types
+
+The `t_test` aggregation supports unpaired and paired two-sample t-tests. The type of the test can be specified using the `type` parameter:
+
+`"type": "paired"`:: performs a paired t-test
+`"type": "homoscedastic"`:: performs a two-sample equal-variance test
+`"type": "heteroscedastic"`:: performs a two-sample unequal-variance test (this is the default)
+
+==== Script
+
+The `t_test` metric supports scripting. For example, if we need to adjust the startup times for the before values, we could use
+a script to recalculate them on the fly:
+
+[source,console]
+--------------------------------------------------
+GET node_upgrade/_search
+{
+    "size": 0,
+    "aggs" : {
+        "startup_time_ttest" : {
+            "t_test" : {
+                "a": {
+                    "script" : {
+                        "lang": "painless",
+                        "source": "doc['startup_time_before'].value - params.adjustment", <1>
+                        "params" : {
+                            "adjustment" : 10 <2>
+                        }
+                    }
+                },
+                "b": {
+                    "field": "startup_time_after" <3>
+                },
+                "type": "paired"
+            }
+        }
+    }
+}
+--------------------------------------------------
+// TEST[setup:node_upgrade]
+
+<1> The `field` parameter is replaced with a `script` parameter, which uses the
+script to generate the values on which the t-test is calculated
+<2> Scripting supports parameterized input just like any other script
+<3> We can mix scripts and fields
+
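
As a cross-check of the documented response, the paired test on the `node_upgrade` fixture can be reproduced outside Elasticsearch. The sketch below is plain standard-library Python: it assumes that the reported `value` is the two-sided p-value, and it integrates the Student's t density numerically with Simpson's rule rather than using a statistics library. The helper names are hypothetical and this is not how the aggregation itself is implemented.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(|T| <= |t|), using symmetry
    return 1.0 - central_mass


def paired_t_test(a, b):
    """p-value of a paired t-test: one-sample test on the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return two_sided_p(t, n - 1)


# Values from the node_upgrade test setup (nodes A to F).
before = [102, 99, 111, 97, 101, 99]
after = [89, 93, 72, 98, 102, 98]

p_value = paired_t_test(before, after)
print(p_value)
```

The mean difference is 9.5 with 5 degrees of freedom, and the result should land close to the `0.1914368843365979` shown in the documented response. At that p-value the six nodes give no strong evidence that the upgrade changed startup time.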

x-pack/plugin/analytics/build.gradle

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ dependencies {

   compileOnly project(path: xpackModule('core'), configuration: 'default')
   testCompile project(path: xpackModule('core'), configuration: 'testArtifacts')
+
+  compile 'org.apache.commons:commons-math3:3.2'
 }

 integTest.enabled = false
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+ec2544ab27e110d2d431bdad7d538ed509b21e62
