Clarify lexing is greedy with lookahead restrictions.
GraphQL syntactical grammars are intended to be unambiguous. Lexical grammars should be unambiguous as well, but there has historically been an assumption that lexical parsing is greedy. This is obvious for numbers and words, but less obvious for empty block strings.
This also removes regular expression representation from the lexical grammar notation, since it wasn't always clear.
Either way, the additional clarity removes ambiguity from the spec.
Partial fix for #564
Specifically addresses #564 (comment)
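To make the greedy-lexing point above concrete, here is a minimal Python sketch of the `Word :: Letter+ [lookahead != Letter]` production used as the example in this change. It is an illustration only, not the reference implementation: the names `LETTERS`, `lex_word`, and `valid_word_ends` are hypothetical, and `Letter` is assumed to mean an ASCII letter.

```python
import string
from typing import List, Optional, Tuple

# Assumption for this sketch: Letter is any ASCII letter (A-Z or a-z).
LETTERS = set(string.ascii_letters)


def lex_word(source: str, start: int = 0) -> Optional[Tuple[str, int]]:
    """Greedily match `Word :: Letter+ [lookahead != Letter]` at `start`.

    Returns (word, end_index), or None if no Letter is found at `start`.
    """
    end = start
    # Greedy: keep consuming for as long as the next character is a Letter.
    while end < len(source) and source[end] in LETTERS:
        end += 1
    if end == start:
        return None
    # Here source[end] is either past the end of input or not a Letter,
    # so the lookahead restriction is satisfied by construction.
    return source[start:end], end


def valid_word_ends(source: str, start: int = 0) -> List[int]:
    """List every end index that satisfies the production at `start`.

    Without the lookahead restriction, every prefix of a run of letters
    would be a valid Word; with it, at most one end index remains.
    """
    ends = []
    for end in range(start + 1, len(source) + 1):
        if source[end - 1] not in LETTERS:
            break  # the run of letters has ended
        next_is_letter = end < len(source) and source[end] in LETTERS
        if not next_is_letter:
            ends.append(end)
    return ends
```

Under these assumptions, `lex_word("abc def")` consumes all of `abc` rather than stopping at `a` or `ab`, and `valid_word_ends` shows that the lookahead restriction leaves only that one possible match.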
spec/Appendix A -- Notation Conventions.md (30 additions & 16 deletions)
@@ -22,8 +22,10 @@ of the sequences it is defined by, until all non-terminal symbols have been
 replaced by terminal characters.
 
 Terminals are represented in this document in a monospace font in two forms: a
-specific Unicode character or sequence of Unicode characters (ex. {`=`} or {`terminal`}), and a pattern of Unicode characters defined by a regular expression
-(ex {/[0-9]+/}).
+specific Unicode character or sequence of Unicode characters (ie. {`=`} or
+{`terminal`}), and prose typically describing a specific Unicode code-point
+{"Space (U+0020)"}. Sequences of Unicode characters only appear in syntactic
+grammars and represent a {Name} token of that specific sequence.
 
 Non-terminal production rules are represented in this document using the
 following notation for a non-terminal with a single definition:
@@ -48,23 +50,25 @@ ListOfLetterA :
 
 The GraphQL language is defined in a syntactic grammar where terminal symbols
 are tokens. Tokens are defined in a lexical grammar which matches patterns of
-source characters. The result of parsing a sequence of source Unicode characters
-produces a GraphQL AST.
+source characters. The result of parsing a source text sequence of Unicode
+characters first produces a sequence of lexical tokens according to the lexical
+grammar which then produces abstract syntax tree (AST) according to the
+syntactical grammar.
 
-A Lexical grammar production describes non-terminal "tokens" by
+A lexical grammar production describes non-terminal "tokens" by
 patterns of terminal Unicode characters. No "whitespace" or other ignored
 characters may appear between any terminal Unicode characters in the lexical
 grammar production. A lexical grammar production is distinguished by a two colon
 `::` definition.
 
-Word :: /[A-Za-z]+/
+Word :: Letter+
 
 A Syntactical grammar production describes non-terminal "rules" by patterns of
-terminal Tokens. Whitespace and other ignored characters may appear before or
-after any terminal Token. A syntactical grammar production is distinguished by a
-one colon `:` definition.
+terminal Tokens. {WhiteSpace} and other {Ignored} sequences may appear before or
+after any terminal {Token}. A syntactical grammar production is distinguished by
+a one colon `:` definition.
 
-Sentence : Noun Verb
+Sentence : Word+ `.`
 
 
 ## Grammar Notation
@@ -80,13 +84,11 @@ and their expanded definitions in the context-free grammar.
 A grammar production may specify that certain expansions are not permitted by
 using the phrase "but not" and then indicating the expansions to be excluded.
 
-For example, the production:
+For example, the following production means that the nonterminal {SafeWord} may
+be replaced by any sequence of characters that could replace {Word} provided
+that the same sequence of characters could not replace {SevenCarlinWords}.
 
-SafeName : Name but not SevenCarlinWords
-
-means that the nonterminal {SafeName} may be replaced by any sequence of
-characters that could replace {Name} provided that the same sequence of
-characters could not replace {SevenCarlinWords}.
+SafeWord : Word but not SevenCarlinWords
 
 A grammar may also list a number of restrictions after "but not" separated
 by "or".
@@ -96,6 +98,18 @@ For example:
 NonBooleanName : Name but not `true` or `false`
 
 
+**Lookahead Restrictions**
+
+A grammar production may specify that certain characters or tokens are not
+permitted to follow it by using the pattern {[lookahead != NotAllowed]}.
+Lookahead restrictions are often used to remove ambiguity from the grammar.
+
+The following example makes it clear that {Letter+} must be greedy, since {Word}
+cannot be followed by yet another {Letter}.
+
+Word :: Letter+ [lookahead != Letter]
+
+
 **Optionality and Lists**
 
 A subscript suffix "{Symbol?}" is shorthand for two possible sequences, one
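
As a rough illustration of the "but not" notation that appears in the diff above, the `NonBooleanName : Name but not `true` or `false`` production can be read as a membership test followed by an exclusion test. The Python sketch below is hypothetical: the helper names are invented for this example, and the regular expression for {Name} is an assumption, since the diff itself does not define {Name}.

```python
import re

# Assumption for this sketch: a Name starts with a letter or underscore
# and continues with letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")

# The expansions excluded by "but not `true` or `false`".
EXCLUDED = frozenset({"true", "false"})


def is_name(text: str) -> bool:
    """Return True if the whole of `text` could replace Name."""
    return NAME_PATTERN.fullmatch(text) is not None


def is_non_boolean_name(text: str) -> bool:
    """Return True if `text` could replace Name but is not excluded."""
    return is_name(text) and text not in EXCLUDED
```

The exclusion applies to the whole matched sequence, so a name such as `truthy` is still accepted even though it begins with `true`.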