|
| 1 | +[role="xpack"] |
| 2 | +[testenv="basic"] |
| 3 | +[[sql-lexical-structure]] |
| 4 | +== Lexical Structure |
| 5 | + |
| 6 | +This section covers the major lexical structure of SQL, which for the most part resembles that of ANSI SQL itself; hence low-level details are not discussed in depth. |
| 7 | + |
| 8 | +{es-sql} currently accepts only one _command_ at a time. A command is a sequence of _tokens_ terminated by the end of the input stream. |
| 9 | + |
| 10 | +A token can be a __key word__, an _identifier_ (_quoted_ or _unquoted_), a _literal_ (or constant) or a special character symbol (typically a delimiter). Tokens are typically separated by whitespace (space or tab), though in some cases where there is no ambiguity (typically due to a character symbol) the whitespace is not needed; for readability, however, omitting it should be avoided. |
| 11 | + |
| 12 | +[[sql-syntax-keywords]] |
| 13 | +[float] |
| 14 | +=== Key Words |
| 15 | + |
| 16 | +Take the following example: |
| 17 | + |
| 18 | +[source, sql] |
| 19 | +---- |
| 20 | +SELECT * FROM table |
| 21 | +---- |
| 22 | + |
| 23 | +This query has four tokens: `SELECT`, `\*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM`, are __key words__, meaning words that have a fixed meaning in SQL. The token `table` is an _identifier_, meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc. |
| 24 | + |
| 25 | +As one can see, both key words and identifiers have the _same_ lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the <<sql-syntax-reserved, reserved appendix>>. |
| 26 | +Do note that key words are case-insensitive, meaning the previous example can be written as: |
| 27 | + |
| 28 | +[source, sql] |
| 29 | +---- |
| 30 | +select * fRoM table; |
| 31 | +---- |
| 32 | + |
| 33 | +Identifiers, however, are not: as {es} is case sensitive, {es-sql} uses the received value verbatim. |
| 34 | + |
| 35 | +To help differentiate between the two, throughout the documentation the SQL key words are upper-cased, a convention we find increases readability and thus recommend to others. |
| 36 | + |
| 37 | +[[sql-syntax-identifiers]] |
| 38 | +[float] |
| 39 | +=== Identifiers |
| 40 | + |
| 41 | +Identifiers can be of two types: __quoted__ and __unquoted__: |
| 42 | + |
| 43 | +[source, sql] |
| 44 | +---- |
| 45 | +SELECT ip_address FROM "hosts-*" |
| 46 | +---- |
| 47 | + |
| 48 | +This query has two identifiers, `ip_address` and `hosts-\*` (an <<multi-index,index pattern>>). As `ip_address` does not clash with any key words, it can be used verbatim. `hosts-*`, on the other hand, cannot, as it clashes with `-` (minus operation) and `*`, hence the double quotes. |
| 49 | + |
| 50 | +Another example: |
| 51 | + |
| 52 | +[source, sql] |
| 53 | +---- |
| 54 | +SELECT "from" FROM "<logstash-{now/d}>" |
| 55 | +---- |
| 56 | + |
| 57 | +The first identifier, `from`, needs to be quoted, as otherwise it clashes with the `FROM` key word (which is case insensitive and thus can be written as `from`), while the second identifier, which uses {es} <<date-math-index-names>>, would otherwise have confused the parser. |
| 58 | + |
| 59 | +Hence, in general, and *especially* when dealing with user input, it is *highly* recommended to use quotes for identifiers. Quoting adds little to the length of your queries and in return offers clarity and disambiguation. |
| 60 | + |
| 61 | +[[sql-syntax-literals]] |
| 62 | +[float] |
| 63 | +=== Literals (Constants) |
| 64 | + |
| 65 | +{es-sql} supports two kinds of __implicitly-typed__ literals: strings and numbers. |
| 66 | + |
| 67 | +[[sql-syntax-string-literals]] |
| 68 | +[float] |
| 69 | +==== String Literals |
| 70 | + |
| 71 | +A string literal is an arbitrary number of characters bounded by single quotes `'`: `'Giant Robot'`. |
| 72 | +To include a single quote in the string, escape it using another single quote: `'Captain EO''s Voyage'`. |
| 73 | + |
| 74 | +NOTE: An escaped single quote is *not* a double quote (`"`), but a single quote `'` _repeated_ (`''`). |
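| | + |
| | +As a minimal sketch of the escaping rule inside a full command (assuming a constant `SELECT` without a `FROM` clause is accepted, and using `title` as a hypothetical alias): |
| | + |
| | +[source, sql] |
| | +---- |
| | +-- the doubled quote stands for one quote character: Captain EO's Voyage |
| | +SELECT 'Captain EO''s Voyage' AS title |
| | +---- |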
| 75 | + |
| 76 | +[[sql-syntax-numeric-literals]] |
| 77 | +[float] |
| 78 | +==== Numeric Literals |
| 79 | + |
| 80 | +Numeric literals are accepted both in decimal and scientific notation with exponent marker (`e` or `E`), starting either with a digit or decimal point `.`: |
| 81 | + |
| 82 | +[source, sql] |
| 83 | +---- |
| 84 | +1969 -- integer notation |
| 85 | +3.14 -- decimal notation |
| 86 | +.1234 -- decimal notation starting with decimal point |
| 87 | +4E5 -- scientific notation (with exponent marker) |
| 88 | +1.2e-3 -- scientific notation with decimal point |
| 89 | +---- |
| 90 | + |
| 91 | +Numeric literals that contain a decimal point are always interpreted as being of type `double`. Those without are considered `integer` if they fit; otherwise their type is `long` (or `BIGINT` in ANSI SQL types). |
| 92 | + |
| 93 | +[[sql-syntax-generic-literals]] |
| 94 | +[float] |
| 95 | +==== Generic Literals |
| 96 | + |
| 97 | +When dealing with a literal of an arbitrary type, one creates the object by casting, typically from its string representation, to the desired type. This can be achieved through the dedicated <<sql-functions-type-conversion, functions>>: |
| 98 | + |
| 99 | +[source, sql] |
| 100 | +---- |
| 101 | +CAST('1969-05-13T12:34:56' AS TIMESTAMP) -- cast the given string to datetime |
| 102 | +CONVERT('10.0.0.1', IP) -- cast '10.0.0.1' to an IP |
| 103 | +---- |
| 104 | + |
| 105 | +Do note that {es-sql} also provides functions that return popular literals out of the box (like `E()`) or that provide dedicated parsing for certain strings. |
| 106 | + |
| 107 | +[[sql-syntax-single-vs-double-quotes]] |
| 108 | +[float] |
| 109 | +=== Single vs Double Quotes |
| 110 | + |
| 111 | +It is worth pointing out that in SQL, single quotes `'` and double quotes `"` have different meanings and *cannot* be used interchangeably. |
| 112 | +Single quotes are used to declare a <<sql-syntax-string-literals, string literal>>, while double quotes are used for <<sql-syntax-identifiers, identifiers>>. |
| 113 | + |
| 114 | +To wit: |
| 115 | + |
| 116 | +[source, sql] |
| 117 | +---- |
| 118 | +SELECT "first_name" <1> |
| 119 | + FROM "musicians" <1> |
| 120 | + WHERE "last_name" <1> |
| 121 | + = 'Carroll' <2> |
| 122 | +---- |
| 123 | + |
| 124 | +<1> Double quotes `"` used for column and table identifiers |
| 125 | +<2> Single quotes `'` used for a string literal |
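| | + |
| | +As an illustrative sketch of why the two cannot be swapped (reusing the names from the query above): if the value is wrapped in double quotes, `"Carroll"` is read as an identifier, that is, a column named `Carroll`, rather than as a string, so the command no longer expresses the intended comparison. |
| | + |
| | +[source, sql] |
| | +---- |
| | +-- double quotes turn Carroll into a column reference, not a string literal |
| | +SELECT "first_name" FROM "musicians" WHERE "last_name" = "Carroll" |
| | +---- |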
| 126 | + |
| 127 | +[[sql-syntax-special-chars]] |
| 128 | +[float] |
| 129 | +=== Special characters |
| 130 | + |
| 131 | +A few characters that are not alphanumeric have a dedicated meaning different from that of an operator. For completeness these are specified below: |
| 132 | + |
| 133 | + |
| 134 | +[cols="^m,^15"] |
| 135 | + |
| 136 | +|=== |
| 137 | + |
| 138 | +s|Char |
| 139 | +s|Description |
| 140 | + |
| 141 | +|* | The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can also be used as an argument to some aggregate functions. |
| 142 | +|, | Commas are used to enumerate the elements of a list. |
| 143 | +|. | Used in numeric constants or to separate identifier qualifiers (catalog, table, column names, etc.). |
| 144 | +|()| Parentheses are used for specific SQL commands, function declarations or to enforce precedence. |
| 145 | +|=== |
| 146 | + |
| 147 | +[[sql-syntax-operators]] |
| 148 | +[float] |
| 149 | +=== Operators |
| 150 | + |
| 151 | +Most operators in {es-sql} have the same precedence and are left-associative. As this is done at parsing time, parentheses need to be used to enforce a different precedence. |
| 152 | + |
| 153 | +The following table indicates the supported operators and their precedence (highest to lowest): |
| 154 | + |
| 155 | +[cols="^2m,^,^3"] |
| 156 | + |
| 157 | +|=== |
| 158 | + |
| 159 | +s|Operator/Element |
| 160 | +s|Associativity |
| 161 | +s|Description |
| 162 | + |
| 163 | +|. |
| 164 | +|left |
| 165 | +|qualifier separator |
| 166 | + |
| 167 | +|:: |
| 168 | +|left |
| 169 | +|PostgreSQL-style type cast |
| 170 | + |
| 171 | +|+ - |
| 172 | +|right |
| 173 | +|unary plus and minus (numeric literal sign) |
| 174 | + |
| 175 | +|* / % |
| 176 | +|left |
| 177 | +|multiplication, division, modulo |
| 178 | + |
| 179 | +|+ - |
| 180 | +|left |
| 181 | +|addition, subtraction |
| 182 | + |
| 183 | +|BETWEEN IN LIKE |
| 184 | +| |
| 185 | +|range containment, string matching |
| 186 | + |
| 187 | +|< > <= >= = <=> <> != |
| 188 | +| |
| 189 | +|comparison |
| 190 | + |
| 191 | +|NOT |
| 192 | +|right |
| 193 | +|logical negation |
| 194 | + |
| 195 | +|AND |
| 196 | +|left |
| 197 | +|logical conjunction |
| 198 | + |
| 199 | +|OR |
| 200 | +|left |
| 201 | +|logical disjunction |
| 202 | + |
| 203 | +|=== |
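| | + |
| | +As a small sketch of how the table above plays out, assuming plain integer arithmetic: multiplication ranks above addition, so parentheses are needed to perform the addition first. Each line below is a separate command. |
| | + |
| | +[source, sql] |
| | +---- |
| | +SELECT 1 + 2 * 3   -- multiplication first: 1 + 6 = 7 |
| | +SELECT (1 + 2) * 3 -- parentheses first: 3 * 3 = 9 |
| | +---- |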
| 204 | + |
| 205 | + |
| 206 | +[[sql-syntax-comments]] |
| 207 | +[float] |
| 208 | +=== Comments |
| 209 | + |
| 210 | +{es-sql} allows comments, which are sequences of characters ignored by the parser. |
| 211 | + |
| 212 | +Two styles are supported: |
| 213 | + |
| 214 | +Single line:: Comments that start with a double dash `--` and continue until the end of the line. |
| 215 | +Multi line:: Comments that start with `/\*` and end with `*/` (also known as C-style). |
| 216 | + |
| 217 | + |
| 218 | +[source, sql] |
| 219 | +---- |
| 220 | +-- single line comment |
| 221 | +/* multi |
| 222 | + line |
| 223 | + comment |
| 224 | + that supports /* nested comments */ |
| 225 | + */ |
| 226 | +---- |
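| | + |
| | +A sketch of both styles embedded in a command, reusing the `first_name` column and `musicians` table from the earlier quoting example purely for illustration: |
| | + |
| | +[source, sql] |
| | +---- |
| | +SELECT first_name -- trailing single line comment |
| | +  FROM musicians  /* multi line comment |
| | +                     placed inside a command */ |
| | +---- |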
| 227 | + |