Skip to content

Commit aafdb59

Browse files
authored
SQL: Doc on syntax (identifiers in particular) (#38662)
Add section on syntax, identifiers and literals and on single vs double quotes.
1 parent b9fe312 commit aafdb59

10 files changed

+269
-16
lines changed

docs/reference/sql/appendix/syntax-reserved.asciidoc

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55

66
Table with reserved keywords that need to be quoted. Also provide an example to make it more obvious.
77

8-
The following table lists all of the keywords that are reserved in Presto,
8+
The following table lists all of the keywords that are reserved in {es-sql},
99
along with their status in the SQL standard. These reserved keywords must
1010
be quoted (using double quotes) in order to be used as an identifier, for example:
1111

@@ -31,43 +31,65 @@ s|SQL-92
3131
|`BETWEEN` |reserved |reserved
3232
|`BY` |reserved |reserved
3333
|`CAST` |reserved |reserved
34+
|`CATALOG` |reserved |reserved
35+
|`CONVERT` |reserved |reserved
36+
|`CURRENT_DATE` |reserved |reserved
37+
|`CURRENT_TIMESTAMP` |reserved |reserved
38+
|`DAY` |reserved |reserved
39+
|`DAYS` | |
3440
|`DESC` |reserved |reserved
3541
|`DESCRIBE` |reserved |reserved
3642
|`DISTINCT` |reserved |reserved
43+
|`ESCAPE` |reserved |reserved
3744
|`EXISTS` |reserved |reserved
3845
|`EXPLAIN` |reserved |reserved
3946
|`EXTRACT` |reserved |reserved
4047
|`FALSE` |reserved |reserved
48+
|`FIRST` |reserved |reserved
4149
|`FROM` |reserved |reserved
4250
|`FULL` |reserved |reserved
4351
|`GROUP` |reserved |reserved
4452
|`HAVING` |reserved |reserved
53+
|`HOUR` |reserved |reserved
54+
|`HOURS` | |
4555
|`IN` |reserved |reserved
4656
|`INNER` |reserved |reserved
57+
|`INTERVAL` |reserved |reserved
4758
|`IS` |reserved |reserved
4859
|`JOIN` |reserved |reserved
4960
|`LEFT` |reserved |reserved
5061
|`LIKE` |reserved |reserved
5162
|`LIMIT` |reserved |reserved
5263
|`MATCH` |reserved |reserved
64+
|`MINUTE` |reserved |reserved
65+
|`MINUTES` | |
66+
|`MONTH` |reserved |reserved
5367
|`NATURAL` |reserved |reserved
54-
|`NO` |reserved |reserved
5568
|`NOT` |reserved |reserved
5669
|`NULL` |reserved |reserved
70+
|`NULLS` | |
5771
|`ON` |reserved |reserved
5872
|`OR` |reserved |reserved
5973
|`ORDER` |reserved |reserved
6074
|`OUTER` |reserved |reserved
6175
|`RIGHT` |reserved |reserved
76+
|`RLIKE` | |
77+
|`QUERY` | |
78+
|`SECOND` |reserved |reserved
79+
|`SECONDS` | |
6280
|`SELECT` |reserved |reserved
6381
|`SESSION` | |reserved
6482
|`TABLE` |reserved |reserved
83+
|`TABLES` | |
6584
|`THEN` |reserved |reserved
6685
|`TO` |reserved |reserved
6786
|`TRUE` |reserved |reserved
87+
|`TYPE` | |
6888
|`USING` |reserved |reserved
6989
|`WHEN` |reserved |reserved
7090
|`WHERE` |reserved |reserved
7191
|`WITH` |reserved |reserved
92+
|`YEAR` |reserved |reserved
93+
|`YEARS` | |
7294

7395
|===

docs/reference/sql/index.asciidoc

Lines changed: 12 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -12,32 +12,33 @@
1212
[partintro]
1313
--
1414

15-
X-Pack includes a SQL feature to execute SQL against Elasticsearch
15+
X-Pack includes a SQL feature to execute SQL queries against {es}
1616
indices and return results in tabular format.
1717

18+
The following chapters aim to cover everything from usage, to syntax and drivers.
19+
Experience users or those in a hurry might want to jump directly to
20+
the list of SQL <<sql-commands, commands>> and <<sql-functions, functions>>.
21+
1822
<<sql-overview, Overview>>::
1923
Overview of {es-sql} and its features.
2024
<<sql-getting-started, Getting Started>>::
2125
Start using SQL right away in {es}.
2226
<<sql-concepts, Concepts and Terminology>>::
2327
Language conventions across SQL and {es}.
2428
<<sql-security,Security>>::
25-
Securing {es-sql} and {es}.
29+
Secure {es-sql} and {es}.
2630
<<sql-rest,REST API>>::
27-
Accepts SQL in a JSON document, executes it, and returns the
28-
results.
31+
Execute SQL in JSON format over REST.
2932
<<sql-translate,Translate API>>::
30-
Accepts SQL in a JSON document and translates it into a native
31-
Elasticsearch query and returns that.
33+
Translate SQL in JSON format to {es} native query.
3234
<<sql-cli,CLI>>::
33-
Command-line application that connects to {es} to execute
34-
SQL and print tabular results.
35+
Command-line application for executing SQL against {es}.
3536
<<sql-jdbc,JDBC>>::
36-
A JDBC driver for {es}.
37+
JDBC driver for {es}.
3738
<<sql-odbc,ODBC>>::
38-
An ODBC driver for {es}.
39+
ODBC driver for {es}.
3940
<<sql-client-apps,Client Applications>>::
40-
Documentation for configuring various SQL/BI tools with {es-sql}.
41+
Setup various SQL/BI tools with {es-sql}.
4142
<<sql-spec,SQL Language>>::
4243
Overview of the {es-sql} language, such as supported data types, commands and
4344
syntax.

docs/reference/sql/language/index.asciidoc

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,12 +3,14 @@
33
[[sql-spec]]
44
== SQL Language
55

6-
This chapter describes the SQL semantics supported in X-Pack namely:
6+
This chapter describes the SQL syntax and semantics supported namely:
77

8-
<<sql-data-types>>:: Data types
8+
<<sql-lexical-structure>>:: Lexical structure
99
<<sql-commands>>:: Commands
10+
<<sql-data-types>>:: Data types
1011
<<sql-index-patterns>>:: Index patterns
1112

13+
include::syntax/lexic/index.asciidoc[]
14+
include::syntax/commands/index.asciidoc[]
1215
include::data-types.asciidoc[]
13-
include::syntax/index.asciidoc[]
1416
include::index-patterns.asciidoc[]
Lines changed: 228 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,228 @@
1+
[role="xpack"]
2+
[testenv="basic"]
3+
[[sql-lexical-structure]]
4+
== Lexical Structure
5+
6+
This section covers the major lexical structure of SQL, which for the most part, is going to resemble that of ANSI SQL itself hence why low-levels details are not discussed in depth.
7+
8+
{es-sql} currently accepts only one _command_ at a time. A command is a sequence of _tokens_ terminated by the end of input stream.
9+
10+
A token can be a __key word__, an _identifier_ (_quoted_ or _unquoted_), a _literal_ (or constant) or a special character symbol (typically a delimiter). Tokens are typically separated by whitespace (be it space, tab) though in some cases, where there is no ambiguity (typically due to a character symbol) this is not needed - however for readability purposes this should be avoided.
11+
12+
[[sql-syntax-keywords]]
13+
[float]
14+
=== Key Words
15+
16+
Take the following example:
17+
18+
[source, sql]
19+
----
20+
SELECT * FROM table
21+
----
22+
23+
This query has four tokens: `SELECT`, `\*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM` are __key words__ meaning words that have a fixed meaning in SQL. The token `table` is an _identifier_ meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc...
24+
25+
As one can see, both key words and identifiers have the _same_ lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the <<sql-syntax-reserved, reserved appendix>>.
26+
Do note that key words are case-insensitive meaning the previous example can be written as:
27+
28+
[source, sql]
29+
----
30+
select * fRoM table;
31+
----
32+
33+
Identifiers however are not - as {es} is case sensitive, {es-sql} uses the received value verbatim.
34+
35+
To help differentiate between the two, through-out the documentation the SQL key words are upper-cased a convention we find increases readability and thus recommend to others.
36+
37+
[[sql-syntax-identifiers]]
38+
[float]
39+
=== Identifiers
40+
41+
Identifiers can be of two types: __quoted__ and __unquoted__:
42+
43+
[source, sql]
44+
----
45+
SELECT ip_address FROM "hosts-*"
46+
----
47+
48+
This query has two identifiers, `ip_address` and `hosts-\*` (an <<multi-index,index pattern>>). As `ip_address` does not clash with any key words it can be used verbatim, `hosts-*` on the other hand cannot as it clashes with `-` (minus operation) and `*` hence the double quotes.
49+
50+
Another example:
51+
52+
[source, sql]
53+
----
54+
SELECT "from" FROM "<logstash-{now/d}>"
55+
----
56+
57+
The first identifier from needs to quoted as otherwise it clashes with the `FROM` key word (which is case insensitive as thus can be written as `from`) while the second identifier using {es} <<date-math-index-names>> would have otherwise confuse the parser.
58+
59+
Hence why in general, *especially* when dealing with user input it is *highly* recommended to use quotes for identifiers. It adds minimal increase to your queries and in return offers clarity and disambiguation.
60+
61+
[[sql-syntax-literals]]
62+
[float]
63+
=== Literals (Constants)
64+
65+
{es-sql} supports two kind of __implicitly-typed__ literals: strings and numbers.
66+
67+
[[sql-syntax-string-literals]]
68+
[float]
69+
==== String Literals
70+
71+
A string literal is an arbitrary number of characters bounded by single quotes `'`: `'Giant Robot'`.
72+
To include a single quote in the string, escape it using another single quote: `'Captain EO''s Voyage'`.
73+
74+
NOTE: An escaped single quote is *not* a double quote (`"`), but a single quote `'` _repeated_ (`''`).
75+
76+
[sql-syntax-numeric-literals]
77+
[float]
78+
==== Numeric Literals
79+
80+
Numeric literals are accepted both in decimal and scientific notation with exponent marker (`e` or `E`), starting either with a digit or decimal point `.`:
81+
82+
[source, sql]
83+
----
84+
1969 -- integer notation
85+
3.14 -- decimal notation
86+
.1234 -- decimal notation starting with decimal point
87+
4E5 -- scientific notation (with exponent marker)
88+
1.2e-3 -- scientific notation with decimal point
89+
----
90+
91+
Numeric literals that contain a decimal point are always interpreted as being of type `double`. Those without are considered `integer` if they fit otherwise their type is `long` (or `BIGINT` in ANSI SQL types).
92+
93+
[[sql-syntax-generic-literals]]
94+
[float]
95+
==== Generic Literals
96+
97+
When dealing with arbitrary type literal, one creates the object by casting, typically, the string representation to the desired type. This can be achieved through the dedicated <<sql-operators-cast, cast operator>> and <<sql-functions-type-conversion, functions>>:
98+
99+
[source, sql]
100+
----
101+
123::LONG -- cast 123 to a LONG
102+
CAST('1969-05-13T12:34:56' AS TIMESTAMP) -- cast the given string to datetime
103+
CONVERT('10.0.0.1', IP) -- cast '10.0.0.1' to an IP
104+
----
105+
106+
Do note that {es-sql} provides functions that out of the box return popular literals (like `E()`) or provide dedicated parsing for certain strings.
107+
108+
[[sql-syntax-single-vs-double-quotes]]
109+
[float]
110+
=== Single vs Double Quotes
111+
112+
It is worth pointing out that in SQL, single quotes `'` and double quotes `"` have different meaning and *cannot* be used interchangeably.
113+
Single quotes are used to declare a <<sql-syntax-string-literals, string literal>> while double quotes for <<sql-syntax-identifiers, identifiers>>.
114+
115+
To wit:
116+
117+
[source, sql]
118+
----
119+
SELECT "first_name" <1>
120+
FROM "musicians" <1>
121+
WHERE "last_name" <1>
122+
= 'Carroll' <2>
123+
----
124+
125+
<1> Double quotes `"` used for column and table identifiers
126+
<2> Single quotes `'` used for a string literal
127+
128+
[[sql-syntax-special-chars]]
129+
[float]
130+
=== Special characters
131+
132+
A few characters that are not alphanumeric have a dedicated meaning different from that of an operator. For completeness these are specified below:
133+
134+
135+
[cols="^m,^15"]
136+
137+
|===
138+
139+
s|Char
140+
s|Description
141+
142+
|* | The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can be also used as an argument to some aggregate functions.
143+
|, | Commas are used to enumerate the elements of a list.
144+
|. | Used in numeric constants or to separate identifiers qualifiers (catalog, table, column names, etc...).
145+
|()| Parentheses are used for specific SQL commands, function declarations or to enforce precedence.
146+
|===
147+
148+
[[sql-syntax-operators]]
149+
[float]
150+
=== Operators
151+
152+
Most operators in {es-sql} have the same precedence and are left-associative. As this is done at parsing time, parenthesis need to be used to enforce a different precedence.
153+
154+
The following table indicates the supported operators and their precendence (highest to lowest);
155+
156+
[cols="^2m,^,^3"]
157+
158+
|===
159+
160+
s|Operator/Element
161+
s|Associativity
162+
s|Description
163+
164+
|.
165+
|left
166+
|qualifier separator
167+
168+
|::
169+
|left
170+
|PostgreSQL-style type cast
171+
172+
|+ -
173+
|right
174+
|unary plus and minus (numeric literal sign)
175+
176+
|* / %
177+
|left
178+
|multiplication, division, modulo
179+
180+
|+ -
181+
|left
182+
|addition, substraction
183+
184+
|BETWEEN IN LIKE
185+
|
186+
|range containment, string matching
187+
188+
|< > <= >= = <=> <> !=
189+
|
190+
|comparison
191+
192+
|NOT
193+
|right
194+
|logical negation
195+
196+
|AND
197+
|left
198+
|logical conjunction
199+
200+
|OR
201+
|left
202+
|logical disjunction
203+
204+
|===
205+
206+
207+
[[sql-syntax-comments]]
208+
[float]
209+
=== Comments
210+
211+
{es-sql} allows comments which are sequence of characters ignored by the parsers.
212+
213+
Two styles are supported:
214+
215+
Single Line:: Comments start with a double dash `--` and continue until the end of the line.
216+
Multi line:: Comments that start with `/\*` and end with `*/` (also known as C-style).
217+
218+
219+
[source, sql]
220+
----
221+
-- single line comment
222+
/* multi
223+
line
224+
comment
225+
that supports /* nested comments */
226+
*/
227+
----
228+

0 commit comments

Comments
 (0)