Commit b1589a5

Clarify that lexing is greedy
GraphQL syntactical grammars are intended to be unambiguous. While lexical grammars should be as well, there has historically been an assumption that lexical parsing is greedy. This is obvious for numbers and words, but less obvious for empty block strings.

Either way, the additional clarity removes ambiguity from the spec.

Partial fix for #564

Specifically addresses #564 (comment)
1 parent dfd7571 commit b1589a5

2 files changed: 18 additions and 7 deletions.

spec/Appendix A -- Notation Conventions.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ ListOfLetterA :
 The GraphQL language is defined in a syntactic grammar where terminal symbols
 are tokens. Tokens are defined in a lexical grammar which matches patterns of
 source characters. The result of parsing a sequence of source Unicode characters
-produces a GraphQL AST.
+produces a GraphQL abstract syntax tree (AST).
 
 A Lexical grammar production describes non-terminal "tokens" by
 patterns of terminal Unicode characters. No "whitespace" or other ignored

spec/Section 2 -- Language.md

Lines changed: 17 additions & 6 deletions
@@ -7,11 +7,13 @@ common unit of composition allowing for query reuse.
 
 A GraphQL document is defined as a syntactic grammar where terminal symbols are
 tokens (indivisible lexical units). These tokens are defined in a lexical
-grammar which matches patterns of source characters (defined by a
-double-colon `::`).
+grammar which matches patterns of source characters. In this document, syntactic
+grammar rules are represented with a colon `:` while lexical grammar rules are
+represented with a double-colon `::`.
 
-Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more details about the definition of lexical and syntactic grammar and other notational conventions
-used in this document.
+Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more information
+about the lexical and syntactic grammar and other notational conventions used
+throughout this document.
 
 
 ## Source Text
@@ -25,6 +27,16 @@ ASCII range so as to be as widely compatible with as many existing tools,
 languages, and serialization formats as possible and avoid display issues in
 text editors and source control.
 
+**Greedy Lexical Parsing**
+
+The source text of a GraphQL document is first converted into a sequence lexical
+tokens and ignored tokens (omitting ignored tokens). The source text is scanned
+from left to right, repeatedly taking the longest possible sequence of unicode
+characters as the next token.
+
+For example, the sequence `123` is always interpreted as a single {IntValue},
+and `""""""` is always interpreted as a single block {StringValue}.
+
 
 ### Unicode
 

@@ -118,8 +130,7 @@ Token ::
 A GraphQL document is comprised of several kinds of indivisible lexical tokens
 defined here in a lexical grammar by patterns of source Unicode characters.
 
-Tokens are later used as terminal symbols in a GraphQL Document
-syntactic grammars.
+Tokens are later used as terminal symbols in GraphQL syntactic grammar rules.
 
 
 ### Ignored Tokens
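
The greedy ("longest match") rule this commit adds can be illustrated with a minimal Python sketch. It is not the reference implementation. The `TOKEN_PATTERNS` table, the `lex` function, and the token kind names are hypothetical, and the patterns cover only a simplified subset of the GraphQL lexical grammar (no floats, no escape validation). The point is only to show why `""""""` comes out as one block string rather than three empty strings.

```python
import re

# Hypothetical, simplified token patterns; NOT the full GraphQL lexical grammar.
TOKEN_PATTERNS = [
    ("BlockString", re.compile(r'"""(?:\\"""|(?!""")[\s\S])*"""')),
    ("String", re.compile(r'"(?:\\.|[^"\\\n])*"')),
    ("Int", re.compile(r'-?(?:0|[1-9][0-9]*)')),
    ("Name", re.compile(r'[_A-Za-z][_0-9A-Za-z]*')),
    ("Punctuator", re.compile(r'\.\.\.|[!$&()\[\]{}:=@|]')),
]

# Ignored input: whitespace, commas, and '#' comments (simplified).
IGNORED = re.compile(r'[\s,]+|#[^\n\r]*')


def lex(source):
    """Scan left to right, always taking the longest matching token."""
    tokens = []
    pos = 0
    while pos < len(source):
        ignored = IGNORED.match(source, pos)
        if ignored:
            pos = ignored.end()  # ignored input is skipped, not emitted
            continue
        # Greedy step: try every pattern at this position, keep the longest match.
        best_kind, best_end = None, pos
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, pos)
            if match and match.end() > best_end:
                best_kind, best_end = kind, match.end()
        if best_kind is None:
            raise SyntaxError(f"Unexpected character {source[pos]!r} at offset {pos}")
        tokens.append((best_kind, source[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    print(lex('123'))     # one Int token
    print(lex('""""""'))  # one BlockString token, not three empty Strings
```

At offset 0 of `""""""`, both the `String` pattern (matching `""`) and the `BlockString` pattern (matching all six characters) succeed. Because the sketch keeps the longest match, the block string wins, which is the behavior the spec text now states explicitly.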
