Commit 51c6f69

[7.x] Add support for filters to T-Test aggregation (#54980) (#55066)
Adds support for filters to T-Test aggregation. The filters can be used to select populations based on some criteria and use values from the same or different fields. Closes #53692
1 parent a2fafa6 commit 51c6f69

14 files changed, 510 additions and 83 deletions


docs/build.gradle

Lines changed: 7 additions & 7 deletions
@@ -548,7 +548,7 @@ buildRestTests.setups['node_upgrade'] = '''
             number_of_replicas: 1
           mappings:
             properties:
-              name:
+              group:
                 type: keyword
               startup_time_before:
                 type: long
@@ -560,17 +560,17 @@ buildRestTests.setups['node_upgrade'] = '''
         refresh: true
         body: |
           {"index":{}}
-          {"name": "A", "startup_time_before": 102, "startup_time_after": 89}
+          {"group": "A", "startup_time_before": 102, "startup_time_after": 89}
           {"index":{}}
-          {"name": "B", "startup_time_before": 99, "startup_time_after": 93}
+          {"group": "A", "startup_time_before": 99, "startup_time_after": 93}
           {"index":{}}
-          {"name": "C", "startup_time_before": 111, "startup_time_after": 72}
+          {"group": "A", "startup_time_before": 111, "startup_time_after": 72}
           {"index":{}}
-          {"name": "D", "startup_time_before": 97, "startup_time_after": 98}
+          {"group": "B", "startup_time_before": 97, "startup_time_after": 98}
           {"index":{}}
-          {"name": "E", "startup_time_before": 101, "startup_time_after": 102}
+          {"group": "B", "startup_time_before": 101, "startup_time_after": 102}
           {"index":{}}
-          {"name": "F", "startup_time_before": 99, "startup_time_after": 98}'''
+          {"group": "B", "startup_time_before": 99, "startup_time_after": 98}'''
 
 // Used by iprange agg
 buildRestTests.setups['iprange'] = '''

docs/reference/aggregations/metrics/t-test-aggregation.asciidoc

Lines changed: 69 additions & 6 deletions
@@ -1,7 +1,7 @@
 [role="xpack"]
 [testenv="basic"]
 [[search-aggregations-metrics-ttest-aggregation]]
-=== TTest Aggregation
+=== T-Test Aggregation
 
 A `t_test` metrics aggregation that performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution
 under the null hypothesis on numeric values extracted from the aggregated documents or generated by provided scripts. In practice, this
@@ -43,8 +43,8 @@ GET node_upgrade/_search
 }
 --------------------------------------------------
 // TEST[setup:node_upgrade]
-<1> The field `startup_time_before` must be a numeric field
-<2> The field `startup_time_after` must be a numeric field
+<1> The field `startup_time_before` must be a numeric field.
+<2> The field `startup_time_after` must be a numeric field.
 <3> Since we have data from the same nodes, we are using paired t-test.
 
 The response will return the p-value or probability value for the test. It is the probability of obtaining results at least as extreme as
@@ -74,6 +74,69 @@ The `t_test` aggregation supports unpaired and paired two-sample t-tests. The ty
 `"type": "homoscedastic"`:: performs two-sample equal variance test
 `"type": "heteroscedastic"`:: performs two-sample unequal variance test (this is default)
 
+==== Filters
+
+It is also possible to run unpaired t-test on different sets of records using filters. For example, if we want to test the difference
+of startup times before upgrade between two different groups of nodes, we use the same field `startup_time_before` by separate groups of
+nodes using terms filters on the group name field:
+
+[source,console]
+--------------------------------------------------
+GET node_upgrade/_search
+{
+  "size" : 0,
+  "aggs" : {
+    "startup_time_ttest" : {
+      "t_test" : {
+        "a" : {
+          "field" : "startup_time_before", <1>
+          "filter" : {
+            "term" : {
+              "group" : "A" <2>
+            }
+          }
+        },
+        "b" : {
+          "field" : "startup_time_before", <3>
+          "filter" : {
+            "term" : {
+              "group" : "B" <4>
+            }
+          }
+        },
+        "type" : "heteroscedastic" <5>
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[setup:node_upgrade]
+<1> The field `startup_time_before` must be a numeric field.
+<2> Any query that separates two groups can be used here.
+<3> We are using the same field
+<4> but we are using different filters.
+<5> Since we have data from different nodes, we cannot use paired t-test.
+
+
+[source,console-result]
+--------------------------------------------------
+{
+  ...
+
+  "aggregations": {
+    "startup_time_ttest": {
+      "value": 0.2981858007281437 <1>
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
+<1> The p-value.
+
+In this example, we are using the same fields for both populations. However this is not a requirement and different fields and even
+combination of fields and scripts can be used. Populations don't have to be in the same index either. If data sets are located in different
+indices, the term filter on the <<mapping-index-field,`_index`>> field can be used to select populations.
+
 ==== Script
 
 The `t_test` metric supports scripting. For example, if we need to adjust out load times for the before values, we could use
@@ -108,7 +171,7 @@ GET node_upgrade/_search
 // TEST[setup:node_upgrade]
 
 <1> The `field` parameter is replaced with a `script` parameter, which uses the
-script to generate values which percentiles are calculated on
-<2> Scripting supports parameterized input just like any other script
-<3> We can mix scripts and fields
+script to generate values which percentiles are calculated on.
+<2> Scripting supports parameterized input just like any other script.
+<3> We can mix scripts and fields.
 
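
As a sanity check on the documented filter example, the sketch below computes Welch's (unequal variance) t statistic and the Welch-Satterthwaite degrees of freedom for the `startup_time_before` values of groups A and B from the `node_upgrade` test setup. It is an independent illustration and not the aggregation's implementation: the class and method names are hypothetical, and the last step, converting the statistic into a p-value, needs a Student's t CDF that is left out here.

```java
final class WelchTTestSketch {

    // Arithmetic mean of the sample.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return squares / (xs.length - 1);
    }

    // Welch's t statistic: difference of means over the combined standard error.
    static double tStatistic(double[] a, double[] b) {
        double standardErrorSquared = variance(a) / a.length + variance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(standardErrorSquared);
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
            / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        // startup_time_before values from the node_upgrade setup, split by group.
        double[] groupA = {102, 99, 111};
        double[] groupB = {97, 101, 99};
        System.out.println("t  = " + tStatistic(groupA, groupB));
        System.out.println("df = " + degreesOfFreedom(groupA, groupB));
    }
}
```

With only three values per group the degrees of freedom are small, which is consistent with the fairly large p-value shown in the documented response.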

server/src/main/java/org/elasticsearch/search/aggregations/metrics/WeightedAvgAggregationBuilder.java

Lines changed: 8 additions & 6 deletions
@@ -25,6 +25,7 @@
 import org.elasticsearch.common.xcontent.ObjectParser;
 import org.elasticsearch.common.xcontent.ToXContent;
 import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.index.query.QueryShardContext;
 import org.elasticsearch.search.DocValueFormat;
 import org.elasticsearch.search.aggregations.AggregationBuilder;
@@ -51,8 +52,8 @@ public class WeightedAvgAggregationBuilder extends MultiValuesSourceAggregationB
         ObjectParser.fromBuilder(NAME, WeightedAvgAggregationBuilder::new);
     static {
         MultiValuesSourceParseHelper.declareCommon(PARSER, true, ValueType.NUMERIC);
-        MultiValuesSourceParseHelper.declareField(VALUE_FIELD.getPreferredName(), PARSER, true, false);
-        MultiValuesSourceParseHelper.declareField(WEIGHT_FIELD.getPreferredName(), PARSER, true, false);
+        MultiValuesSourceParseHelper.declareField(VALUE_FIELD.getPreferredName(), PARSER, true, false, false);
+        MultiValuesSourceParseHelper.declareField(WEIGHT_FIELD.getPreferredName(), PARSER, true, false, false);
     }
 
     public WeightedAvgAggregationBuilder(String name) {
@@ -99,10 +100,11 @@ public BucketCardinality bucketCardinality() {
 
     @Override
     protected MultiValuesSourceAggregatorFactory<Numeric> innerBuild(QueryShardContext queryShardContext,
-            Map<String, ValuesSourceConfig<Numeric>> configs,
-            DocValueFormat format,
-            AggregatorFactory parent,
-            Builder subFactoriesBuilder) throws IOException {
+            Map<String, ValuesSourceConfig<Numeric>> configs,
+            Map<String, QueryBuilder> filters,
+            DocValueFormat format,
+            AggregatorFactory parent,
+            Builder subFactoriesBuilder) throws IOException {
         return new WeightedAvgAggregatorFactory(name, configs, format, queryShardContext, parent, subFactoriesBuilder, metadata);
     }
 

server/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSourceAggregationBuilder.java

Lines changed: 5 additions & 1 deletion
@@ -22,6 +22,7 @@
 import org.elasticsearch.common.io.stream.StreamInput;
 import org.elasticsearch.common.io.stream.StreamOutput;
 import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.index.query.QueryShardContext;
 import org.elasticsearch.search.DocValueFormat;
 import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
@@ -168,13 +169,15 @@ protected final MultiValuesSourceAggregatorFactory<VS> doBuild(QueryShardContext
         ValueType finalValueType = this.valueType != null ? this.valueType : targetValueType;
 
         Map<String, ValuesSourceConfig<VS>> configs = new HashMap<>(fields.size());
+        Map<String, QueryBuilder> filters = new HashMap<>(fields.size());
         fields.forEach((key, value) -> {
             ValuesSourceConfig<VS> config = ValuesSourceConfig.resolve(queryShardContext, finalValueType,
                 value.getFieldName(), value.getScript(), value.getMissing(), value.getTimeZone(), format);
             configs.put(key, config);
+            filters.put(key, value.getFilter());
         });
         DocValueFormat docValueFormat = resolveFormat(format, finalValueType);
-        return innerBuild(queryShardContext, configs, docValueFormat, parent, subFactoriesBuilder);
+        return innerBuild(queryShardContext, configs, filters, docValueFormat, parent, subFactoriesBuilder);
     }
 
 
@@ -191,6 +194,7 @@ private static DocValueFormat resolveFormat(@Nullable String format, @Nullable V
 
     protected abstract MultiValuesSourceAggregatorFactory<VS> innerBuild(QueryShardContext queryShardContext,
                                                                          Map<String, ValuesSourceConfig<VS>> configs,
+                                                                         Map<String, QueryBuilder> filters,
                                                                          DocValueFormat format, AggregatorFactory parent,
                                                                          Builder subFactoriesBuilder) throws IOException;
 
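
One consequence of the `doBuild` change above is that `filters` receives an entry for every key in `fields`, even when a field config has no filter, because `java.util.HashMap` accepts `null` values. The sketch below illustrates that behaviour in isolation. `FieldConfig` and `collectFilters` are hypothetical stand-ins (a plain `String` replaces `QueryBuilder`) and are not Elasticsearch classes.

```java
import java.util.HashMap;
import java.util.Map;

final class ParallelMapsSketch {

    // Hypothetical stand-in for MultiValuesSourceFieldConfig: a field name plus an optional filter.
    static final class FieldConfig {
        final String fieldName;
        final String filter; // null when the population is not filtered

        FieldConfig(String fieldName, String filter) {
            this.fieldName = fieldName;
            this.filter = filter;
        }
    }

    // Mirrors the shape of the loop in doBuild: one filter entry per field key, null included.
    static Map<String, String> collectFilters(Map<String, FieldConfig> fields) {
        Map<String, String> filters = new HashMap<>(fields.size());
        fields.forEach((key, value) -> filters.put(key, value.filter));
        return filters;
    }

    public static void main(String[] args) {
        Map<String, FieldConfig> fields = new HashMap<>();
        fields.put("a", new FieldConfig("startup_time_before", "group:A"));
        fields.put("b", new FieldConfig("startup_time_after", null));

        Map<String, String> filters = collectFilters(fields);
        // Key "b" is present but maps to null, so consumers need a null check.
        System.out.println(filters.containsKey("b") + " " + (filters.get("b") == null));
    }
}
```

Under this reading, an `innerBuild` implementation that uses the filters would be expected to treat a `null` entry as "no filter" rather than assume every key maps to a query.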

server/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSourceFieldConfig.java

Lines changed: 46 additions & 12 deletions
@@ -30,26 +30,30 @@
 import org.elasticsearch.common.xcontent.ToXContentObject;
 import org.elasticsearch.common.xcontent.XContentBuilder;
 import org.elasticsearch.common.xcontent.XContentParser;
+import org.elasticsearch.index.query.AbstractQueryBuilder;
+import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.script.Script;
 
 import java.io.IOException;
 import java.time.ZoneId;
 import java.time.ZoneOffset;
 import java.util.Objects;
-import java.util.function.BiFunction;
 
 public class MultiValuesSourceFieldConfig implements Writeable, ToXContentObject {
-    private String fieldName;
-    private Object missing;
-    private Script script;
-    private ZoneId timeZone;
+    private final String fieldName;
+    private final Object missing;
+    private final Script script;
+    private final ZoneId timeZone;
+    private final QueryBuilder filter;
 
     private static final String NAME = "field_config";
 
-    public static final BiFunction<Boolean, Boolean, ObjectParser<MultiValuesSourceFieldConfig.Builder, Void>> PARSER
-        = (scriptable, timezoneAware) -> {
+    public static final ParseField FILTER = new ParseField("filter");
 
-        ObjectParser<MultiValuesSourceFieldConfig.Builder, Void> parser
+    public static <C> ObjectParser<MultiValuesSourceFieldConfig.Builder, C> parserBuilder(boolean scriptable, boolean timezoneAware,
+                                                                                          boolean filtered) {
+
+        ObjectParser<MultiValuesSourceFieldConfig.Builder, C> parser
             = new ObjectParser<>(MultiValuesSourceFieldConfig.NAME, MultiValuesSourceFieldConfig.Builder::new);
 
         parser.declareString(MultiValuesSourceFieldConfig.Builder::setFieldName, ParseField.CommonFields.FIELD);
@@ -71,14 +75,21 @@ public class MultiValuesSourceFieldConfig implements Writeable, ToXContentObject
                 }
             }, ParseField.CommonFields.TIME_ZONE, ObjectParser.ValueType.LONG);
         }
+
+        if (filtered) {
+            parser.declareField(MultiValuesSourceFieldConfig.Builder::setFilter,
+                (p, context) -> AbstractQueryBuilder.parseInnerQueryBuilder(p),
+                FILTER, ObjectParser.ValueType.OBJECT);
+        }
         return parser;
     };
 
-    private MultiValuesSourceFieldConfig(String fieldName, Object missing, Script script, ZoneId timeZone) {
+    protected MultiValuesSourceFieldConfig(String fieldName, Object missing, Script script, ZoneId timeZone, QueryBuilder filter) {
         this.fieldName = fieldName;
         this.missing = missing;
         this.script = script;
         this.timeZone = timeZone;
+        this.filter = filter;
     }
 
     public MultiValuesSourceFieldConfig(StreamInput in) throws IOException {
@@ -94,6 +105,11 @@ public MultiValuesSourceFieldConfig(StreamInput in) throws IOException {
         } else {
             this.timeZone = in.readOptionalZoneId();
         }
+        if (in.getVersion().onOrAfter(Version.V_7_8_0)) {
+            this.filter = in.readOptionalNamedWriteable(QueryBuilder.class);
+        } else {
+            this.filter = null;
+        }
     }
 
     public Object getMissing() {
@@ -112,6 +128,10 @@ public String getFieldName() {
         return fieldName;
     }
 
+    public QueryBuilder getFilter() {
+        return filter;
+    }
+
     @Override
     public void writeTo(StreamOutput out) throws IOException {
         if (out.getVersion().onOrAfter(Version.V_7_6_0)) {
@@ -126,6 +146,9 @@ public void writeTo(StreamOutput out) throws IOException {
         } else {
             out.writeOptionalZoneId(timeZone);
         }
+        if (out.getVersion().onOrAfter(Version.V_7_8_0)) {
+            out.writeOptionalNamedWriteable(filter);
+        }
     }
 
     @Override
@@ -143,6 +166,10 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
         if (timeZone != null) {
             builder.field(ParseField.CommonFields.TIME_ZONE.getPreferredName(), timeZone.getId());
         }
+        if (filter != null) {
+            builder.field(FILTER.getPreferredName());
+            filter.toXContent(builder, params);
+        }
         builder.endObject();
         return builder;
     }
@@ -155,12 +182,13 @@ public boolean equals(Object o) {
         return Objects.equals(fieldName, that.fieldName)
             && Objects.equals(missing, that.missing)
             && Objects.equals(script, that.script)
-            && Objects.equals(timeZone, that.timeZone);
+            && Objects.equals(timeZone, that.timeZone)
+            && Objects.equals(filter, that.filter);
     }
 
     @Override
     public int hashCode() {
-        return Objects.hash(fieldName, missing, script, timeZone);
+        return Objects.hash(fieldName, missing, script, timeZone, filter);
     }
 
     @Override
@@ -173,6 +201,7 @@ public static class Builder {
         private Object missing = null;
         private Script script = null;
         private ZoneId timeZone = null;
+        private QueryBuilder filter = null;
 
         public String getFieldName() {
             return fieldName;
@@ -210,6 +239,11 @@ public Builder setTimeZone(ZoneId timeZone) {
             return this;
         }
 
+        public Builder setFilter(QueryBuilder filter) {
+            this.filter = filter;
+            return this;
+        }
+
         public MultiValuesSourceFieldConfig build() {
             if (Strings.isNullOrEmpty(fieldName) && script == null) {
                 throw new IllegalArgumentException("[" + ParseField.CommonFields.FIELD.getPreferredName()
@@ -223,7 +257,7 @@ public MultiValuesSourceFieldConfig build() {
                     "Please specify one or the other.");
             }
 
-            return new MultiValuesSourceFieldConfig(fieldName, missing, script, timeZone);
+            return new MultiValuesSourceFieldConfig(fieldName, missing, script, timeZone, filter);
         }
     }
 }
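
The `StreamInput` constructor and `writeTo` above gate the new `filter` field on `Version.V_7_8_0`, so that nodes on older versions neither receive nor expect the extra bytes. The sketch below shows the same pattern with plain `java.io` streams. It is a simplified illustration under stated assumptions: the integer version constants, the class name, and the `String` filter are hypothetical and do not reflect Elasticsearch's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VersionGatedFieldSketch {
    // Hypothetical wire versions standing in for an older node and for Version.V_7_8_0.
    static final int V_7_7 = 70700;
    static final int V_7_8 = 70800;

    final String fieldName;
    final String filter; // nullable stand-in for QueryBuilder

    VersionGatedFieldSketch(String fieldName, String filter) {
        this.fieldName = fieldName;
        this.filter = filter;
    }

    // Writes the optional filter only when the receiver understands it.
    byte[] writeTo(int wireVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(fieldName);
            if (wireVersion >= V_7_8) {
                out.writeBoolean(filter != null); // presence flag for the optional value
                if (filter != null) {
                    out.writeUTF(filter);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the filter only when the sender could have written it; otherwise defaults to null.
    static VersionGatedFieldSketch readFrom(byte[] data, int wireVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String fieldName = in.readUTF();
            String filter = null;
            if (wireVersion >= V_7_8 && in.readBoolean()) {
                filter = in.readUTF();
            }
            return new VersionGatedFieldSketch(fieldName, filter);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both sides must apply the same version check: a reader that looked for the flag in a stream written for an older version would run past the end of the data, and a writer that sent it to an older reader would leave unread bytes behind.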

server/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSourceParseHelper.java

Lines changed: 2 additions & 2 deletions
@@ -50,10 +50,10 @@ public static <VS extends ValuesSource, T> void declareCommon(
 
     public static <VS extends ValuesSource, T> void declareField(String fieldName,
         AbstractObjectParser<? extends MultiValuesSourceAggregationBuilder<VS, ?>, T> objectParser,
-        boolean scriptable, boolean timezoneAware) {
+        boolean scriptable, boolean timezoneAware, boolean filterable) {
 
         objectParser.declareField((o, fieldConfig) -> o.field(fieldName, fieldConfig.build()),
-            (p, c) -> MultiValuesSourceFieldConfig.PARSER.apply(scriptable, timezoneAware).parse(p, null),
+            (p, c) -> MultiValuesSourceFieldConfig.parserBuilder(scriptable, timezoneAware, filterable).parse(p, null),
             new ParseField(fieldName), ObjectParser.ValueType.OBJECT);
     }
 }
