Commit 7f62455

Clarify that lexing is greedy
GraphQL syntactical grammars intend to be unambiguous. While lexical grammars should also be, there has historically been an assumption that lexical parsing is greedy. This is obvious for numbers and words, but less obvious for empty block strings.

Either way, the additional clarity removes ambiguity from the spec.

Partial fix for #564

Specifically addresses #564 (comment)
1 parent dfd7571 commit 7f62455

3 files changed: 39 additions, 10 deletions

spec/Appendix A -- Notation Conventions.md

Lines changed: 5 additions & 3 deletions
@@ -48,10 +48,12 @@ ListOfLetterA :
 
 The GraphQL language is defined in a syntactic grammar where terminal symbols
 are tokens. Tokens are defined in a lexical grammar which matches patterns of
-source characters. The result of parsing a sequence of source Unicode characters
-produces a GraphQL AST.
+source characters. The result of parsing a source text sequence of Unicode
+characters first produces a sequence of lexical tokens according to the lexical
+grammar which then produces abstract syntax tree (AST) according to the
+syntactical grammar.
 
-A Lexical grammar production describes non-terminal "tokens" by
+A lexical grammar production describes non-terminal "tokens" by
 patterns of terminal Unicode characters. No "whitespace" or other ignored
 characters may appear between any terminal Unicode characters in the lexical
 grammar production. A lexical grammar production is distinguished by a two colon

spec/Appendix B -- Grammar Summary.md

Lines changed: 9 additions & 1 deletion
@@ -1,5 +1,13 @@
 # B. Appendix: Grammar Summary
 
+The source text of a GraphQL document must be a sequence of {SourceCharacter}.
+The character sequence must be described by a sequence of {Token} and {Ignored}
+lexical grammars. The lexical token sequence, omitting {Ignored}, must be
+described by a single {Document} syntactic grammar.
+
+
+## Source Text
+
 SourceCharacter :: /[\u0009\u000A\u000D\u0020-\uFFFF]/
 
 
@@ -89,7 +97,7 @@ Note: Block string values are interpreted to exclude blank initial and trailing
 lines and uniform indentation with {BlockStringValue()}.
 
 
-## Document
+## Document Syntax
 
 Document : Definition+
 

spec/Section 2 -- Language.md

Lines changed: 25 additions & 6 deletions
@@ -7,11 +7,18 @@ common unit of composition allowing for query reuse.
 
 A GraphQL document is defined as a syntactic grammar where terminal symbols are
 tokens (indivisible lexical units). These tokens are defined in a lexical
-grammar which matches patterns of source characters (defined by a
-double-colon `::`).
+grammar which matches patterns of source characters. In this document, syntactic
+grammar productions are distinguished with a colon `:` while lexical grammar
+productions are distinguished with a double-colon `::`.
 
-Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more details about the definition of lexical and syntactic grammar and other notational conventions
-used in this document.
+The source text of a GraphQL document must be a sequence of {SourceCharacter}.
+The character sequence must be described by a sequence of {Token} and {Ignored}
+lexical grammars. The lexical token sequence, omitting {Ignored}, must be
+described by a single {Document} syntactic grammar.
+
+Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more information
+about the lexical and syntactic grammar and other notational conventions used
+throughout this document.
 
 
 ## Source Text
@@ -25,6 +32,19 @@ ASCII range so as to be as widely compatible with as many existing tools,
 languages, and serialization formats as possible and avoid display issues in
 text editors and source control.
 
+**Greedy Lexical Parsing**
+
+The source text of a GraphQL document is first converted into a sequence of
+lexical tokens, {Token}, and ignored tokens, {Ignored}. The source text is
+scanned from left to right, repeatedly taking the longest possible sequence of
+unicode characters as the next token.
+
+For example, the sequence `123` is always interpreted as a single {IntValue},
+and `""""""` is always interpreted as a single block {StringValue}.
+
+This sequence of lexical tokens are then scanned from left to right to produce
+an abstract syntax tree (AST) according to the {Document} syntactical grammar.
+
 
 ### Unicode
 
@@ -118,8 +138,7 @@ Token ::
 A GraphQL document is comprised of several kinds of indivisible lexical tokens
 defined here in a lexical grammar by patterns of source Unicode characters.
 
-Tokens are later used as terminal symbols in a GraphQL Document
-syntactic grammars.
+Tokens are later used as terminal symbols in GraphQL syntactic grammar rules.
 
 
 ### Ignored Tokens
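
The "Greedy Lexical Parsing" paragraph added by this commit can be illustrated with a small sketch. The Python below is not taken from the spec or from any GraphQL implementation: the names `TOKEN_PATTERNS`, `IGNORED`, and `lex` are hypothetical, and the patterns cover only a simplified subset of the lexical grammar. It is meant only to show the "longest possible sequence" rule choosing one block string for `""""""` rather than three empty strings.

```python
import re

# Hypothetical, simplified token patterns (an assumption for illustration;
# the real lexical grammar has more token kinds and stricter rules).
TOKEN_PATTERNS = [
    ("BlockString", re.compile(r'"""(?:\\"""|.)*?"""', re.S)),
    ("String", re.compile(r'"(?:\\.|[^"\\\n])*"')),
    ("IntValue", re.compile(r'-?(?:0|[1-9][0-9]*)')),
    ("Name", re.compile(r'[_A-Za-z][_0-9A-Za-z]*')),
    ("Punctuator", re.compile(r'\.\.\.|[!$&()\[\]{}:=@|]')),
]

# Simplified stand-in for {Ignored}: BOM, whitespace, line terminators,
# commas, and comments.
IGNORED = re.compile(r'[\ufeff\t \n\r,]+|#[^\n\r]*')


def lex(source):
    """Scan left to right, always taking the longest match as the next token."""
    tokens = []
    pos = 0
    while pos < len(source):
        ignored = IGNORED.match(source, pos)
        if ignored:
            pos = ignored.end()
            continue
        # Try every pattern at this position and keep the longest match.
        best = None
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, pos)
            if match and (best is None or match.end() > best[1].end()):
                best = (kind, match)
        if best is None:
            raise SyntaxError(f"Unexpected character {source[pos]!r} at offset {pos}")
        kind, match = best
        tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


print(lex('123'))     # one IntValue token
print(lex('""""""'))  # one BlockString token, not three empty strings
```

Under these assumptions, `""` on its own still lexes as an empty `String`, while six consecutive quotes always form a single empty block string because that match is longer.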
