Commit 6d28596

Add support for filters to T-Test aggregation (#54980)
Adds support for filters to T-Test aggregation. The filters can be used to select populations based on some criteria and use values from the same or different fields. Closes #53692
1 parent f7809dd commit 6d28596

15 files changed: +509 -81 lines changed

docs/build.gradle

Lines changed: 7 additions & 7 deletions
@@ -552,7 +552,7 @@ buildRestTests.setups['node_upgrade'] = '''
               number_of_replicas: 1
             mappings:
               properties:
-                name:
+                group:
                   type: keyword
                 startup_time_before:
                   type: long
@@ -564,17 +564,17 @@ buildRestTests.setups['node_upgrade'] = '''
           refresh: true
           body: |
             {"index":{}}
-            {"name": "A", "startup_time_before": 102, "startup_time_after": 89}
+            {"group": "A", "startup_time_before": 102, "startup_time_after": 89}
             {"index":{}}
-            {"name": "B", "startup_time_before": 99, "startup_time_after": 93}
+            {"group": "A", "startup_time_before": 99, "startup_time_after": 93}
             {"index":{}}
-            {"name": "C", "startup_time_before": 111, "startup_time_after": 72}
+            {"group": "A", "startup_time_before": 111, "startup_time_after": 72}
             {"index":{}}
-            {"name": "D", "startup_time_before": 97, "startup_time_after": 98}
+            {"group": "B", "startup_time_before": 97, "startup_time_after": 98}
             {"index":{}}
-            {"name": "E", "startup_time_before": 101, "startup_time_after": 102}
+            {"group": "B", "startup_time_before": 101, "startup_time_after": 102}
             {"index":{}}
-            {"name": "F", "startup_time_before": 99, "startup_time_after": 98}'''
+            {"group": "B", "startup_time_before": 99, "startup_time_after": 98}'''

 // Used by iprange agg
 buildRestTests.setups['iprange'] = '''

docs/reference/aggregations/metrics/t-test-aggregation.asciidoc

Lines changed: 69 additions & 6 deletions
@@ -1,7 +1,7 @@
 [role="xpack"]
 [testenv="basic"]
 [[search-aggregations-metrics-ttest-aggregation]]
-=== TTest Aggregation
+=== T-Test Aggregation

 A `t_test` metrics aggregation that performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution
 under the null hypothesis on numeric values extracted from the aggregated documents or generated by provided scripts. In practice, this
@@ -43,8 +43,8 @@ GET node_upgrade/_search
 }
 --------------------------------------------------
 // TEST[setup:node_upgrade]
-<1> The field `startup_time_before` must be a numeric field
-<2> The field `startup_time_after` must be a numeric field
+<1> The field `startup_time_before` must be a numeric field.
+<2> The field `startup_time_after` must be a numeric field.
 <3> Since we have data from the same nodes, we are using paired t-test.

 The response will return the p-value or probability value for the test. It is the probability of obtaining results at least as extreme as
@@ -74,6 +74,69 @@ The `t_test` aggregation supports unpaired and paired two-sample t-tests. The ty
 `"type": "homoscedastic"`:: performs two-sample equal variance test
 `"type": "heteroscedastic"`:: performs two-sample unequal variance test (this is default)

+==== Filters
+
+It is also possible to run unpaired t-test on different sets of records using filters. For example, if we want to test the difference
+of startup times before upgrade between two different groups of nodes, we use the same field `startup_time_before` by separate groups of
+nodes using terms filters on the group name field:
+
+[source,console]
+--------------------------------------------------
+GET node_upgrade/_search
+{
+  "size" : 0,
+  "aggs" : {
+    "startup_time_ttest" : {
+      "t_test" : {
+        "a" : {
+          "field" : "startup_time_before", <1>
+          "filter" : {
+            "term" : {
+              "group" : "A" <2>
+            }
+          }
+        },
+        "b" : {
+          "field" : "startup_time_before", <3>
+          "filter" : {
+            "term" : {
+              "group" : "B" <4>
+            }
+          }
+        },
+        "type" : "heteroscedastic" <5>
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[setup:node_upgrade]
+<1> The field `startup_time_before` must be a numeric field.
+<2> Any query that separates two groups can be used here.
+<3> We are using the same field
+<4> but we are using different filters.
+<5> Since we have data from different nodes, we cannot use paired t-test.
+
+
+[source,console-result]
+--------------------------------------------------
+{
+  ...
+
+  "aggregations": {
+    "startup_time_ttest": {
+      "value": 0.2981858007281437 <1>
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
+<1> The p-value.
+
+In this example, we are using the same fields for both populations. However this is not a requirement and different fields and even
+combination of fields and scripts can be used. Populations don't have to be in the same index either. If data sets are located in different
+indices, the term filter on the <<mapping-index-field,`_index`>> field can be used to select populations.
+
 ==== Script

 The `t_test` metric supports scripting. For example, if we need to adjust out load times for the before values, we could use
@@ -108,7 +171,7 @@ GET node_upgrade/_search
 // TEST[setup:node_upgrade]

 <1> The `field` parameter is replaced with a `script` parameter, which uses the
-script to generate values which percentiles are calculated on
-<2> Scripting supports parameterized input just like any other script
-<3> We can mix scripts and fields
+script to generate values which percentiles are calculated on.
+<2> Scripting supports parameterized input just like any other script.
+<3> We can mix scripts and fields.

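
To make the filtered example in the documentation change concrete, the following is a minimal sketch of the arithmetic behind an unequal-variance two-sample test, applied to the `startup_time_before` values that the updated `node_upgrade` fixture assigns to groups `A` and `B`. It assumes that the `heteroscedastic` type corresponds to Welch's t-test; the class and method names are hypothetical and are not part of Elasticsearch. The sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom and does not compute the p-value, which additionally requires the Student's t cumulative distribution function.

```java
// Hypothetical helper, not Elasticsearch code: illustrates an unequal-variance
// (Welch) two-sample t statistic on the fixture values from docs/build.gradle.
final class WelchTTestSketch {

    /** Arithmetic mean of the sample. */
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return squares / (xs.length - 1);
    }

    /** Welch's t statistic for two independent samples. */
    static double tStatistic(double[] a, double[] b) {
        double standardErrorSquared = variance(a) / a.length + variance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(standardErrorSquared);
    }

    /** Welch-Satterthwaite approximation of the degrees of freedom. */
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
            / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        // startup_time_before for group "A" and group "B" in the updated fixture
        double[] groupA = {102, 99, 111};
        double[] groupB = {97, 101, 99};
        System.out.printf("t = %.4f, df = %.4f%n",
            tStatistic(groupA, groupB), degreesOfFreedom(groupA, groupB));
    }
}
```

The two populations here share a field and differ only in which documents are selected, which is the role the `filter` entries play in the request shown in the documentation change.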

server/src/main/java/org/elasticsearch/search/aggregations/metrics/WeightedAvgAggregationBuilder.java

Lines changed: 4 additions & 2 deletions
@@ -25,6 +25,7 @@
 import org.elasticsearch.common.xcontent.ObjectParser;
 import org.elasticsearch.common.xcontent.ToXContent;
 import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.index.query.QueryShardContext;
 import org.elasticsearch.search.DocValueFormat;
 import org.elasticsearch.search.aggregations.AggregationBuilder;
@@ -52,8 +53,8 @@ public class WeightedAvgAggregationBuilder extends MultiValuesSourceAggregationB
         ObjectParser.fromBuilder(NAME, WeightedAvgAggregationBuilder::new);
     static {
         MultiValuesSourceParseHelper.declareCommon(PARSER, true, ValueType.NUMERIC);
-        MultiValuesSourceParseHelper.declareField(VALUE_FIELD.getPreferredName(), PARSER, true, false);
-        MultiValuesSourceParseHelper.declareField(WEIGHT_FIELD.getPreferredName(), PARSER, true, false);
+        MultiValuesSourceParseHelper.declareField(VALUE_FIELD.getPreferredName(), PARSER, true, false, false);
+        MultiValuesSourceParseHelper.declareField(WEIGHT_FIELD.getPreferredName(), PARSER, true, false, false);
     }

     public WeightedAvgAggregationBuilder(String name) {
@@ -106,6 +107,7 @@ public BucketCardinality bucketCardinality() {
     @Override
     protected MultiValuesSourceAggregatorFactory innerBuild(QueryShardContext queryShardContext,
                                                             Map<String, ValuesSourceConfig> configs,
+                                                            Map<String, QueryBuilder> filters,
                                                             DocValueFormat format,
                                                             AggregatorFactory parent,
                                                             Builder subFactoriesBuilder) throws IOException {

server/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSourceAggregationBuilder.java

Lines changed: 6 additions & 1 deletion
@@ -22,6 +22,7 @@
 import org.elasticsearch.common.io.stream.StreamInput;
 import org.elasticsearch.common.io.stream.StreamOutput;
 import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.index.query.QueryShardContext;
 import org.elasticsearch.search.DocValueFormat;
 import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
@@ -169,13 +170,16 @@ public AB format(String format) {
     protected final MultiValuesSourceAggregatorFactory doBuild(QueryShardContext queryShardContext, AggregatorFactory parent,
                                                                Builder subFactoriesBuilder) throws IOException {
         Map<String, ValuesSourceConfig> configs = new HashMap<>(fields.size());
+        Map<String, QueryBuilder> filters = new HashMap<>(fields.size());
         fields.forEach((key, value) -> {
             ValuesSourceConfig config = ValuesSourceConfig.resolveUnregistered(queryShardContext, userValueTypeHint,
                 value.getFieldName(), value.getScript(), value.getMissing(), value.getTimeZone(), format, defaultValueSourceType());
             configs.put(key, config);
+            filters.put(key, value.getFilter());
         });
         DocValueFormat docValueFormat = resolveFormat(format, userValueTypeHint, defaultValueSourceType());
-        return innerBuild(queryShardContext, configs, docValueFormat, parent, subFactoriesBuilder);
+
+        return innerBuild(queryShardContext, configs, filters, docValueFormat, parent, subFactoriesBuilder);
     }


@@ -194,6 +198,7 @@ private static DocValueFormat resolveFormat(@Nullable String format, @Nullable V

     protected abstract MultiValuesSourceAggregatorFactory innerBuild(QueryShardContext queryShardContext,
                                                                      Map<String, ValuesSourceConfig> configs,
+                                                                     Map<String, QueryBuilder> filters,
                                                                      DocValueFormat format, AggregatorFactory parent,
                                                                      Builder subFactoriesBuilder) throws IOException;


server/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSourceFieldConfig.java

Lines changed: 46 additions & 12 deletions
@@ -29,26 +29,30 @@
 import org.elasticsearch.common.xcontent.ToXContentObject;
 import org.elasticsearch.common.xcontent.XContentBuilder;
 import org.elasticsearch.common.xcontent.XContentParser;
+import org.elasticsearch.index.query.AbstractQueryBuilder;
+import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.script.Script;

 import java.io.IOException;
 import java.time.ZoneId;
 import java.time.ZoneOffset;
 import java.util.Objects;
-import java.util.function.BiFunction;

 public class MultiValuesSourceFieldConfig implements Writeable, ToXContentObject {
-    private String fieldName;
-    private Object missing;
-    private Script script;
-    private ZoneId timeZone;
+    private final String fieldName;
+    private final Object missing;
+    private final Script script;
+    private final ZoneId timeZone;
+    private final QueryBuilder filter;

     private static final String NAME = "field_config";

-    public static final BiFunction<Boolean, Boolean, ObjectParser<MultiValuesSourceFieldConfig.Builder, Void>> PARSER
-        = (scriptable, timezoneAware) -> {
+    public static final ParseField FILTER = new ParseField("filter");

-        ObjectParser<MultiValuesSourceFieldConfig.Builder, Void> parser
+    public static <C> ObjectParser<MultiValuesSourceFieldConfig.Builder, C> parserBuilder(boolean scriptable, boolean timezoneAware,
+                                                                                           boolean filtered) {
+
+        ObjectParser<MultiValuesSourceFieldConfig.Builder, C> parser
             = new ObjectParser<>(MultiValuesSourceFieldConfig.NAME, MultiValuesSourceFieldConfig.Builder::new);

         parser.declareString(MultiValuesSourceFieldConfig.Builder::setFieldName, ParseField.CommonFields.FIELD);
@@ -70,14 +74,21 @@ public class MultiValuesSourceFieldConfig implements Writeable, ToXContentObject
                 }
             }, ParseField.CommonFields.TIME_ZONE, ObjectParser.ValueType.LONG);
         }
+
+        if (filtered) {
+            parser.declareField(MultiValuesSourceFieldConfig.Builder::setFilter,
+                (p, context) -> AbstractQueryBuilder.parseInnerQueryBuilder(p),
+                FILTER, ObjectParser.ValueType.OBJECT);
+        }
         return parser;
     };

-    private MultiValuesSourceFieldConfig(String fieldName, Object missing, Script script, ZoneId timeZone) {
+    protected MultiValuesSourceFieldConfig(String fieldName, Object missing, Script script, ZoneId timeZone, QueryBuilder filter) {
         this.fieldName = fieldName;
         this.missing = missing;
         this.script = script;
         this.timeZone = timeZone;
+        this.filter = filter;
     }

     public MultiValuesSourceFieldConfig(StreamInput in) throws IOException {
@@ -89,6 +100,11 @@ public MultiValuesSourceFieldConfig(StreamInput in) throws IOException {
         this.missing = in.readGenericValue();
         this.script = in.readOptionalWriteable(Script::new);
         this.timeZone = in.readOptionalZoneId();
+        if (in.getVersion().onOrAfter(Version.V_8_0_0)) { //Change to Version.V_7_8_0 after backporting
+            this.filter = in.readOptionalNamedWriteable(QueryBuilder.class);
+        } else {
+            this.filter = null;
+        }
     }

     public Object getMissing() {
@@ -107,6 +123,10 @@ public String getFieldName() {
         return fieldName;
     }

+    public QueryBuilder getFilter() {
+        return filter;
+    }
+
     @Override
     public void writeTo(StreamOutput out) throws IOException {
         if (out.getVersion().onOrAfter(Version.V_7_6_0)) {
@@ -117,6 +137,9 @@ public void writeTo(StreamOutput out) throws IOException {
         out.writeGenericValue(missing);
         out.writeOptionalWriteable(script);
         out.writeOptionalZoneId(timeZone);
+        if (out.getVersion().onOrAfter(Version.V_8_0_0)) { //Change to Version.V_7_8_0 after backporting
+            out.writeOptionalNamedWriteable(filter);
+        }
     }

     @Override
@@ -134,6 +157,10 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
         if (timeZone != null) {
             builder.field(ParseField.CommonFields.TIME_ZONE.getPreferredName(), timeZone.getId());
         }
+        if (filter != null) {
+            builder.field(FILTER.getPreferredName());
+            filter.toXContent(builder, params);
+        }
         builder.endObject();
         return builder;
     }
@@ -146,12 +173,13 @@ public boolean equals(Object o) {
         return Objects.equals(fieldName, that.fieldName)
             && Objects.equals(missing, that.missing)
             && Objects.equals(script, that.script)
-            && Objects.equals(timeZone, that.timeZone);
+            && Objects.equals(timeZone, that.timeZone)
+            && Objects.equals(filter, that.filter);
     }

     @Override
     public int hashCode() {
-        return Objects.hash(fieldName, missing, script, timeZone);
+        return Objects.hash(fieldName, missing, script, timeZone, filter);
     }

     @Override
@@ -164,6 +192,7 @@ public static class Builder {
         private Object missing = null;
         private Script script = null;
         private ZoneId timeZone = null;
+        private QueryBuilder filter = null;

         public String getFieldName() {
             return fieldName;
@@ -201,6 +230,11 @@ public Builder setTimeZone(ZoneId timeZone) {
             return this;
         }

+        public Builder setFilter(QueryBuilder filter) {
+            this.filter = filter;
+            return this;
+        }
+
         public MultiValuesSourceFieldConfig build() {
             if (Strings.isNullOrEmpty(fieldName) && script == null) {
                 throw new IllegalArgumentException("[" + ParseField.CommonFields.FIELD.getPreferredName()
@@ -214,7 +248,7 @@ public MultiValuesSourceFieldConfig build() {
                     "Please specify one or the other.");
             }

-            return new MultiValuesSourceFieldConfig(fieldName, missing, script, timeZone);
+            return new MultiValuesSourceFieldConfig(fieldName, missing, script, timeZone, filter);
         }
     }
 }
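
The stream constructor and `writeTo` changes in this file gate the new optional `filter` on the peer's version, so that older nodes neither receive nor expect the extra bytes. The sketch below illustrates that pattern in isolation. It is an assumption-laden stand-in: it uses plain `java.io` data streams and a string placeholder for the filter instead of Elasticsearch's `StreamInput`, `StreamOutput`, and `QueryBuilder`, and the class name, integer version scheme, and `FILTER_ADDED_IN` constant are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of version-gated wire serialization; not Elasticsearch code.
final class VersionGatedFieldSketch {

    // Hypothetical version id at which the optional filter joined the wire format.
    static final int FILTER_ADDED_IN = 8;

    /** Serializes the field name, plus the optional filter only if the peer understands it. */
    static byte[] write(int peerVersion, String fieldName, String filter) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(fieldName);
            if (peerVersion >= FILTER_ADDED_IN) {
                // Presence flag followed by the value, mirroring an "optional" write.
                out.writeBoolean(filter != null);
                if (filter != null) {
                    out.writeUTF(filter);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back {fieldName, filter}; the filter is null when the peer predates it. */
    static String[] read(int peerVersion, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String fieldName = in.readUTF();
            String filter = null;
            if (peerVersion >= FILTER_ADDED_IN && in.readBoolean()) {
                filter = in.readUTF();
            }
            return new String[] { fieldName, filter };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both sides must apply the same version check; writing the filter without the guard would leave unread bytes on an older reader, which is why the diff adds matching conditions to the constructor and to `writeTo`.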

server/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSourceParseHelper.java

Lines changed: 2 additions & 2 deletions
@@ -50,10 +50,10 @@ public static <T> void declareCommon(

     public static <VS extends ValuesSource, T> void declareField(String fieldName,
         AbstractObjectParser<? extends MultiValuesSourceAggregationBuilder<?>, T> objectParser,
-        boolean scriptable, boolean timezoneAware) {
+        boolean scriptable, boolean timezoneAware, boolean filterable) {

         objectParser.declareField((o, fieldConfig) -> o.field(fieldName, fieldConfig.build()),
-            (p, c) -> MultiValuesSourceFieldConfig.PARSER.apply(scriptable, timezoneAware).parse(p, null),
+            (p, c) -> MultiValuesSourceFieldConfig.parserBuilder(scriptable, timezoneAware, filterable).parse(p, null),
             new ParseField(fieldName), ObjectParser.ValueType.OBJECT);
     }
 }
