Commit 0b794d4

SQL: Implement FIRST/LAST aggregate functions (#37936)
FIRST and LAST can be used with one argument and work similarly to MIN and MAX, but they are implemented using a Top Hits aggregation and can therefore also operate on keyword fields. When a second argument is provided, they return the value of the first argument from the row with the lowest (FIRST) or highest (LAST) value of the second argument. Currently, because they rely on a Top Hits aggregation, FIRST and LAST cannot be used in the HAVING clause of a GROUP BY query to filter on the results of the aggregation.

Closes: #35639
1 parent 93ac858 commit 0b794d4

34 files changed: 1196 additions, 98 deletions

docs/reference/sql/functions/aggs.asciidoc

Lines changed: 198 additions & 0 deletions
@@ -113,6 +113,196 @@ Returns the total number of _distinct non-null_ values in input values.
 include-tagged::{sql-specs}/docs.csv-spec[aggCountDistinct]
 --------------------------------------------------
 
+[[sql-functions-aggs-first]]
+===== `FIRST/FIRST_VALUE`
+
+.Synopsis:
+[source, sql]
+----------------------------------------------
+FIRST(field_name<1>[, ordering_field_name]<2>)
+----------------------------------------------
+
+*Input*:
+
+<1> target field for the aggregation
+<2> optional field used for ordering
+
+*Output*: same type as the input
+
+.Description:
+
+Returns the first **non-NULL** value (if such exists) of the `field_name` input column sorted ascending by
+the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name`
+column is used for the sorting. E.g.:
+
+[cols="<,<"]
+|===
+s| a | b
+
+| 100 | 1
+| 200 | 1
+| 1 | 2
+| 2 | 2
+| 10 | null
+| 20 | null
+| null | null
+|===
+
+[source, sql]
+----------------------
+SELECT FIRST(a) FROM t
+----------------------
+
+will result in:
+[cols="<"]
+|===
+s| FIRST(a)
+| 1
+|===
+
+and
+
+[source, sql]
+-------------------------
+SELECT FIRST(a, b) FROM t
+-------------------------
+
+will result in:
+[cols="<"]
+|===
+s| FIRST(a, b)
+| 100
+|===
+
+
+["source","sql",subs="attributes,macros"]
+-----------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArg]
+-----------------------------------------------------------
+
+["source","sql",subs="attributes,macros"]
+--------------------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArgAndGroupBy]
+--------------------------------------------------------------------
+
+["source","sql",subs="attributes,macros"]
+-----------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgs]
+-----------------------------------------------------------
+
+["source","sql",subs="attributes,macros"]
+---------------------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgsAndGroupBy]
+---------------------------------------------------------------------
+
+`FIRST_VALUE` is a name alias and can be used instead of `FIRST`, e.g.:
+
+["source","sql",subs="attributes,macros"]
+--------------------------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[firstValueWithTwoArgsAndGroupBy]
+--------------------------------------------------------------------------
+
+[NOTE]
+`FIRST` cannot be used in a `HAVING` clause.
+[NOTE]
+`FIRST` cannot be used with columns of type <<text, `text`>> unless
+the field is also <<before-enabling-fielddata,saved as a keyword>>.
+
+[[sql-functions-aggs-last]]
+===== `LAST/LAST_VALUE`
+
+.Synopsis:
+[source, sql]
+--------------------------------------------------
+LAST(field_name<1>[, ordering_field_name]<2>)
+--------------------------------------------------
+
+*Input*:
+
+<1> target field for the aggregation
+<2> optional field used for ordering
+
+*Output*: same type as the input
+
+.Description:
+
+It is the inverse of <<sql-functions-aggs-first>>. Returns the last **non-NULL** value (if such exists) of the
+`field_name` input column sorted ascending by the `ordering_field_name` column (that is, the first value when sorted
+descending). If `ordering_field_name` is not provided, only the `field_name` column is used for the sorting. E.g.:
+
+[cols="<,<"]
+|===
+s| a | b
+
+| 10 | 1
+| 20 | 1
+| 1 | 2
+| 2 | 2
+| 100 | null
+| 200 | null
+| null | null
+|===
+
+[source, sql]
+------------------------
+SELECT LAST(a) FROM t
+------------------------
+
+will result in:
+[cols="<"]
+|===
+s| LAST(a)
+| 200
+|===
+
+and
+
+[source, sql]
+------------------------
+SELECT LAST(a, b) FROM t
+------------------------
+
+will result in:
+[cols="<"]
+|===
+s| LAST(a, b)
+| 2
+|===
+
+
+["source","sql",subs="attributes,macros"]
+-----------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArg]
+-----------------------------------------------------------
+
+["source","sql",subs="attributes,macros"]
+-------------------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArgAndGroupBy]
+-------------------------------------------------------------------
+
+["source","sql",subs="attributes,macros"]
+-----------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgs]
+-----------------------------------------------------------
+
+["source","sql",subs="attributes,macros"]
+--------------------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgsAndGroupBy]
+--------------------------------------------------------------------
+
+`LAST_VALUE` is a name alias and can be used instead of `LAST`, e.g.:
+
+["source","sql",subs="attributes,macros"]
+-------------------------------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[lastValueWithTwoArgsAndGroupBy]
+-------------------------------------------------------------------------
+
+[NOTE]
+`LAST` cannot be used in a `HAVING` clause.
+[NOTE]
+`LAST` cannot be used with columns of type <<text, `text`>> unless
+the field is also <<before-enabling-fielddata,saved as a keyword>>.
+
 [[sql-functions-aggs-max]]
 ===== `MAX`
 
@@ -137,6 +327,10 @@ Returns the maximum value across input values in the field `field_name`.
 include-tagged::{sql-specs}/docs.csv-spec[aggMax]
 --------------------------------------------------
 
+[NOTE]
+`MAX` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
+<<sql-functions-aggs-last>> and therefore cannot be used in a `HAVING` clause.
+
 [[sql-functions-aggs-min]]
 ===== `MIN`
 
@@ -161,6 +355,10 @@ Returns the minimum value across input values in the field `field_name`.
 include-tagged::{sql-specs}/docs.csv-spec[aggMin]
 --------------------------------------------------
 
+[NOTE]
+`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
+<<sql-functions-aggs-first>> and therefore cannot be used in a `HAVING` clause.
+
 [[sql-functions-aggs-sum]]
 ===== `SUM`

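The FIRST and LAST sections describe one-argument and two-argument behaviour. That behaviour can be checked against their example tables with a small in-memory model. This Java sketch is hypothetical: `FirstLastSketch`, `Row`, `first` and `last` are illustrative names, not Elasticsearch APIs. It rests on two assumptions:

- `null`s sort last in both directions.
- Ties on the ordering column are broken by the target column in the same direction.

Under those assumptions it reproduces the four documented results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of FIRST/LAST; not Elasticsearch code.
// Assumptions: nulls sort last in both directions, and ties on the ordering
// column are broken by the target column in the same direction.
final class FirstLastSketch {

    // One input row: target column `a` and optional ordering column `b` (null = missing).
    static final class Row {
        final Integer a;
        final Integer b;

        Row(Integer a, Integer b) {
            this.a = a;
            this.b = b;
        }
    }

    // Sorts a copy of the rows and returns `a` from the top row, or null if there are no rows.
    private static Integer top(List<Row> rows, boolean withOrdering, boolean ascending) {
        Comparator<Integer> values = ascending
                ? Comparator.nullsLast(Comparator.<Integer>naturalOrder())
                : Comparator.nullsLast(Comparator.<Integer>reverseOrder());
        Comparator<Row> byTarget = Comparator.comparing((Row r) -> r.a, values);
        Comparator<Row> order = withOrdering
                ? Comparator.comparing((Row r) -> r.b, values).thenComparing(byTarget)
                : byTarget;
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(order);
        return sorted.isEmpty() ? null : sorted.get(0).a;
    }

    // FIRST(a) when withOrdering is false, FIRST(a, b) when it is true.
    static Integer first(List<Row> rows, boolean withOrdering) {
        return top(rows, withOrdering, true);
    }

    // LAST(a) when withOrdering is false, LAST(a, b) when it is true.
    static Integer last(List<Row> rows, boolean withOrdering) {
        return top(rows, withOrdering, false);
    }

    public static void main(String[] args) {
        // Example table from the FIRST section.
        List<Row> firstTable = Arrays.asList(
                new Row(100, 1), new Row(200, 1), new Row(1, 2), new Row(2, 2),
                new Row(10, null), new Row(20, null), new Row(null, null));
        // Example table from the LAST section.
        List<Row> lastTable = Arrays.asList(
                new Row(10, 1), new Row(20, 1), new Row(1, 2), new Row(2, 2),
                new Row(100, null), new Row(200, null), new Row(null, null));
        System.out.println(first(firstTable, false)); // 1
        System.out.println(first(firstTable, true));  // 100
        System.out.println(last(lastTable, false));   // 200
        System.out.println(last(lastTable, true));    // 2
    }
}
```

With one argument, a `null` can only come out on top when every value is `null`, which matches the "first **non-NULL** value (if such exists)" wording. With an ordering column, the sketch returns the target value of the top row even when that value is `null`. The `topHitsWithTwoArgsAndGroupByWithNullsOnTargetField` spec in this commit expects `null` from `FIRST(first_name, birth_date)`, which suggests the real implementation behaves the same way.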
docs/reference/sql/limitations.asciidoc

Lines changed: 7 additions & 0 deletions
@@ -90,3 +90,10 @@ include-tagged::{sql-specs}/docs.csv-spec[limitationSubSelectRewritten]
 
 But, if the sub-select would include a `GROUP BY` or `HAVING` or the enclosing `SELECT` would be more complex than `SELECT X
 FROM (SELECT ...) WHERE [simple_condition]`, this is currently **un-supported**.
+
+[float]
+=== Use <<sql-functions-aggs-first, `FIRST`>>/<<sql-functions-aggs-last,`LAST`>> aggregation functions in `HAVING` clause
+
+Using `FIRST` and `LAST` in the `HAVING` clause is not supported. The same applies to
+<<sql-functions-aggs-min,`MIN`>> and <<sql-functions-aggs-max,`MAX`>> when their target column
+is of type <<keyword, `keyword`>>, because they are internally translated to `FIRST` and `LAST`.

x-pack/plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/CcrRepositoryIT.java

Lines changed: 0 additions & 2 deletions
@@ -169,7 +169,6 @@ public void testThatRepositoryRecoversEmptyIndexBasedOnLeaderSettings() throws I
         assertNotEquals(leaderMetadata.getIndexUUID(), followerMetadata.getIndexUUID());
     }
 
-    @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/38100")
     public void testDocsAreRecovered() throws Exception {
         String leaderClusterRepoName = CcrRepository.NAME_PREFIX + "leader_cluster";
         String leaderIndex = "index1";
@@ -316,7 +315,6 @@ public void testRateLimitingIsEmployed() throws Exception {
         }
     }
 
-    @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/38027")
     public void testIndividualActionsTimeout() throws Exception {
         ClusterUpdateSettingsRequest settingsRequest = new ClusterUpdateSettingsRequest();
         TimeValue timeValue = TimeValue.timeValueMillis(100);

x-pack/plugin/sql/qa/src/main/java/org/elasticsearch/xpack/sql/qa/cli/ShowTestCase.java

Lines changed: 6 additions & 0 deletions
@@ -31,6 +31,10 @@ public void testShowFunctions() throws IOException {
         assertThat(readLine(), containsString(HEADER_SEPARATOR));
         assertThat(readLine(), RegexMatcher.matches("\\s*AVG\\s*\\|\\s*AGGREGATE\\s*"));
         assertThat(readLine(), RegexMatcher.matches("\\s*COUNT\\s*\\|\\s*AGGREGATE\\s*"));
+        assertThat(readLine(), RegexMatcher.matches("\\s*FIRST\\s*\\|\\s*AGGREGATE\\s*"));
+        assertThat(readLine(), RegexMatcher.matches("\\s*FIRST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
+        assertThat(readLine(), RegexMatcher.matches("\\s*LAST\\s*\\|\\s*AGGREGATE\\s*"));
+        assertThat(readLine(), RegexMatcher.matches("\\s*LAST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
         assertThat(readLine(), RegexMatcher.matches("\\s*MAX\\s*\\|\\s*AGGREGATE\\s*"));
         assertThat(readLine(), RegexMatcher.matches("\\s*MIN\\s*\\|\\s*AGGREGATE\\s*"));
         String line = readLine();
@@ -58,6 +62,8 @@ public void testShowFunctions() throws IOException {
     public void testShowFunctionsLikePrefix() throws IOException {
         assertThat(command("SHOW FUNCTIONS LIKE 'L%'"), RegexMatcher.matches("\\s*name\\s*\\|\\s*type\\s*"));
         assertThat(readLine(), containsString(HEADER_SEPARATOR));
+        assertThat(readLine(), RegexMatcher.matches("\\s*LAST\\s*\\|\\s*AGGREGATE\\s*"));
+        assertThat(readLine(), RegexMatcher.matches("\\s*LAST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
         assertThat(readLine(), RegexMatcher.matches("\\s*LEAST\\s*\\|\\s*CONDITIONAL\\s*"));
         assertThat(readLine(), RegexMatcher.matches("\\s*LOG\\s*\\|\\s*SCALAR\\s*"));
         assertThat(readLine(), RegexMatcher.matches("\\s*LOG10\\s*\\|\\s*SCALAR\\s*"));

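Each new `assertThat` in `ShowTestCase` pins one line of the `SHOW FUNCTIONS` output to a name/type regular expression. The Java sketch below shows how such a pattern behaves. It assumes that `RegexMatcher.matches` requires the whole line to match, modelled here with `Matcher.matches()`. `ShowFunctionsRow` and the sample lines are hypothetical, and the real CLI column widths may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the row patterns asserted in ShowTestCase,
// e.g. "\\s*FIRST\\s*\\|\\s*AGGREGATE\\s*". Assumption: the test's RegexMatcher
// requires the whole line to match, modelled here with Matcher.matches().
final class ShowFunctionsRow {

    // "<spaces>NAME<spaces>|<spaces>TYPE<spaces>", with NAME and TYPE matched literally.
    static Pattern rowPattern(String name, String type) {
        return Pattern.compile("\\s*" + Pattern.quote(name) + "\\s*\\|\\s*" + Pattern.quote(type) + "\\s*");
    }

    // True if the whole line is a SHOW FUNCTIONS row for the given name and type.
    static boolean isRow(String line, String name, String type) {
        return rowPattern(name, type).matcher(line).matches();
    }

    public static void main(String[] args) {
        // Hypothetical CLI lines; the real column widths may differ.
        String first = "FIRST          |AGGREGATE      ";
        String firstValue = "FIRST_VALUE    |AGGREGATE      ";
        System.out.println(isRow(first, "FIRST", "AGGREGATE"));            // true
        System.out.println(isRow(firstValue, "FIRST", "AGGREGATE"));       // false
        System.out.println(isRow(firstValue, "FIRST_VALUE", "AGGREGATE")); // true
    }
}
```

In this pattern the name may be followed only by whitespace and `|`, so the `FIRST` pattern rejects the `FIRST_VALUE` line. Because the test reads lines in sequence, it therefore also checks that the four new functions appear in the expected order.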
x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec

Lines changed: 73 additions & 0 deletions
@@ -373,3 +373,76 @@ SELECT COUNT(ALL last_name)=COUNT(ALL first_name) AS areEqual, COUNT(ALL first_n
 ---------------+---------------+---------------
 false |90 |100
 ;
+
+topHitsWithOneArgAndGroupBy
+schema::gender:s|first:s|last:s
+SELECT gender, FIRST(first_name) as first, LAST(first_name) as last FROM test_emp GROUP BY gender ORDER BY gender;
+
+gender | first | last
+---------------+---------------+---------------
+null | Berni | Patricio
+F | Alejandro | Xinglin
+M | Amabile | Zvonko
+;
+
+topHitsWithTwoArgsAndGroupBy
+schema::gender:s|first:s|last:s
+SELECT gender, FIRST(first_name, birth_date) as first, LAST(first_name, birth_date) as last FROM test_emp GROUP BY gender ORDER BY gender;
+
+gender | first | last
+---------------+---------------+---------------
+null | Lillian | Eberhardt
+F | Sumant | Valdiodio
+M | Remzi | Hilari
+;
+
+topHitsWithTwoArgsAndGroupByWithNullsOnTargetField
+schema::gender:s|first:s|last:s
+SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10025 AND 10035 GROUP BY gender ORDER BY gender;
+
+gender | first | last
+---------------+---------------+---------------
+F | null | Divier
+M | null | Domenick
+;
+
+topHitsWithTwoArgsAndGroupByWithNullsOnSortingField
+schema::gender:s|first:s|last:s
+SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10047 AND 10052 GROUP BY gender ORDER BY gender;
+
+gender | first | last
+---------------+---------------+---------------
+F | Basil | Basil
+M | Hidefumi | Heping
+;
+
+topHitsWithTwoArgsAndGroupByWithNullsOnTargetAndSortingField
+schema::gender:s|first:s|last:s
+SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10037 AND 10052 GROUP BY gender ORDER BY gender;
+
+gender | first | last
+---------------+-------------+-----------------
+F | Basil | Weiyi
+M | Hidefumi | null
+;
+
+topHitsWithTwoArgsAndGroupByWithAllNullsOnTargetField
+schema::gender:s|first:s|last:s
+SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10030 AND 10037 GROUP BY gender ORDER BY gender;
+
+gender | first | last
+---------------+---------------+---------------
+F | null | null
+M | null | null
+;
+
+topHitsOnDatetime
+schema::gender:s|first:i|last:i
+SELECT gender, month(first(birth_date, languages)) first, month(last(birth_date, languages)) last FROM test_emp GROUP BY gender ORDER BY gender;
+
+gender | first | last
+---------------+---------------+---------------
+null | 1 | 10
+F | 4 | 6
+M | 1 | 4
+;

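The `topHitsWith*AndGroupBy` specs evaluate `FIRST`/`LAST` once per group. Below is a hypothetical Java sketch of `SELECT g, FIRST(name, ord), LAST(name, ord) FROM t GROUP BY g ORDER BY g` over made-up rows, not the `test_emp` data. `GroupedFirstLast` and `Emp` are illustrative names. The sketch assumes that:

- `null`s sort last on both columns.
- Ties on `ord` are broken by `name` in the same direction.
- The `null` group is listed first, as in the spec outputs above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of:
//   SELECT g, FIRST(name, ord), LAST(name, ord) FROM t GROUP BY g ORDER BY g
// Assumptions (not Elasticsearch code): nulls sort last on both columns, ties on
// `ord` are broken by `name` in the same direction, and the null group is listed first.
final class GroupedFirstLast {

    // One input row: grouping key, target column and ordering column (each may be null).
    static final class Emp {
        final String group;
        final String name;
        final Integer ord;

        Emp(String group, String name, Integer ord) {
            this.group = group;
            this.name = name;
            this.ord = ord;
        }
    }

    // Returns `name` from the top row after sorting a non-empty group by (ord, name).
    private static String top(List<Emp> rows, boolean ascending) {
        Comparator<Integer> ords = ascending
                ? Comparator.nullsLast(Comparator.<Integer>naturalOrder())
                : Comparator.nullsLast(Comparator.<Integer>reverseOrder());
        Comparator<String> names = ascending
                ? Comparator.nullsLast(Comparator.<String>naturalOrder())
                : Comparator.nullsLast(Comparator.<String>reverseOrder());
        List<Emp> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Emp e) -> e.ord, ords).thenComparing(e -> e.name, names));
        return sorted.get(0).name;
    }

    // Maps each group key (null first) to {FIRST(name, ord), LAST(name, ord)}.
    static Map<String, String[]> firstLastByGroup(List<Emp> rows) {
        Comparator<String> keys = Comparator.nullsFirst(Comparator.<String>naturalOrder());
        Map<String, List<Emp>> groups = new TreeMap<>(keys);
        for (Emp e : rows) {
            groups.computeIfAbsent(e.group, k -> new ArrayList<>()).add(e);
        }
        Map<String, String[]> result = new TreeMap<>(keys);
        for (Map.Entry<String, List<Emp>> g : groups.entrySet()) {
            result.put(g.getKey(), new String[] {top(g.getValue(), true), top(g.getValue(), false)});
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows (group, name, birth year); not the test_emp data.
        List<Emp> rows = Arrays.asList(
                new Emp("F", null, 1950), new Emp("F", "Carol", 1955), new Emp("F", "Alice", 1960),
                new Emp("M", "Bob", 1962), new Emp("M", "Dan", null),
                new Emp(null, "Eve", 1970));
        for (Map.Entry<String, String[]> e : firstLastByGroup(rows).entrySet()) {
            System.out.println(e.getKey() + " | " + e.getValue()[0] + " | " + e.getValue()[1]);
        }
        // Prints:
        // null | Eve | Eve
        // F | null | Alice
        // M | Bob | Bob
    }
}
```

Group `F` illustrates a `null` target on the top row, so `FIRST` yields `null`. Group `M` illustrates a missing ordering value: that row sorts last in both directions, so `FIRST` and `LAST` both return `Bob`.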
x-pack/plugin/sql/qa/src/main/resources/command.csv-spec

Lines changed: 6 additions & 2 deletions
@@ -8,8 +8,12 @@ SHOW FUNCTIONS;
 
 name:s | type:s
 AVG |AGGREGATE
-COUNT |AGGREGATE
-MAX |AGGREGATE
+COUNT |AGGREGATE
+FIRST |AGGREGATE
+FIRST_VALUE |AGGREGATE
+LAST |AGGREGATE
+LAST_VALUE |AGGREGATE
+MAX |AGGREGATE
 MIN |AGGREGATE
 SUM |AGGREGATE
 KURTOSIS |AGGREGATE
