Commit b692dee

spec: describe components of EBNF grammar
To clarify the grammar definitions, we define the subset of EBNF used by this specification to specify various field formats.

Signed-off-by: Stephen J Day <[email protected]>
1 parent 6772079 commit b692dee

3 files changed: +115 -11 lines changed

annotations.md

+5-5
@@ -31,12 +31,12 @@ This specification defines the following annotation keys, intended for but not l
3131
* **org.opencontainers.image.ref.name** Name of the reference for a target (string).
3232
* SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
3333
* Character set of the value SHOULD conform to alphanum of `A-Za-z0-9` and separator set of `-._:@/+`
34-
* An EBNF'esque grammar + regular expression like:
34+
* The reference must match the following [grammar](considerations.md#ebnf):
3535
```
36-
ref := component ["/" component]*
37-
component := alphanum [separator alphanum]*
38-
alphanum := /[A-Za-z0-9]+/
39-
separator := /[-._:@+]/ | "--"
36+
ref ::= component ("/" component)*
37+
component ::= alphanum (separator alphanum)*
38+
alphanum ::= [A-Za-z0-9]+
39+
separator ::= [-._:@+] | "--"
4040
```
4141
* **org.opencontainers.image.title** Human-readable title of the image (string)
4242
* **org.opencontainers.image.description** Human-readable description of the software packaged in the image (string)
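
As a rough illustration of the `org.opencontainers.image.ref.name` grammar in this hunk, the four rules can be folded into one regular expression. The sketch below assumes Python's `re` syntax; the names `REF_NAME` and `is_valid_ref_name` are hypothetical and do not come from the specification.

```python
import re

# Hypothetical translation of the ref grammar into Python regex syntax.
# alphanum  ::= [A-Za-z0-9]+
ALPHANUM = r"[A-Za-z0-9]+"
# separator ::= [-._:@+] | "--"
SEPARATOR = r"(?:[-._:@+]|--)"
# component ::= alphanum (separator alphanum)*
COMPONENT = rf"{ALPHANUM}(?:{SEPARATOR}{ALPHANUM})*"
# ref       ::= component ("/" component)*
REF_NAME = re.compile(rf"{COMPONENT}(?:/{COMPONENT})*")


def is_valid_ref_name(value: str) -> bool:
    """Return True if the whole value matches the ref grammar."""
    return REF_NAME.fullmatch(value) is not None
```

Under this sketch, `is_valid_ref_name("library/ubuntu:16.04")` is accepted, while `"a/"` is rejected because every `/` must be followed by another component.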

considerations.md

+104
@@ -24,3 +24,107 @@ Implementations:
2424
[github.com/docker/go]: https://github.com/docker/go/
2525
[Go]: https://golang.org/
2626
[JSON]: http://json.org/
27+
28+
# EBNF
29+
30+
For field formats described in this specification, we use a limited subset of [Extended Backus-Naur Form][ebnf], similar to that used by the [XML specification][xmlebnf].
31+
Grammars present in the OCI specification are regular and can be converted to a single regular expression.
32+
However, regular expressions are avoided to limit ambiguity between differing regular expression syntaxes.
33+
Defining the subset of EBNF used here avoids the variation, misunderstanding, or ambiguity that could arise from linking to a larger specification.
34+
35+
Grammars are made up of rules in the following form:
36+
37+
```
38+
symbol ::= expression
39+
```
40+
41+
The input is said to be a production of `symbol` if it is matched by the expression.
42+
Whitespace is completely ignored in rule definitions.
43+
44+
## Expressions
45+
46+
The simplest expression is the literal, surrounded by quotes:
47+
48+
```
49+
literal ::= "matchthis"
50+
```
51+
52+
The above expression defines a symbol, "literal", that matches the exact input of "matchthis".
53+
Character classes are delineated by brackets (`[]`), describing a set, a range, or multiple ranges of characters:
54+
55+
```
56+
set ::= [abc]
57+
range ::= [A-Z]
58+
```
59+
60+
The above symbol "set" would match one character of either "a", "b" or "c".
61+
The symbol "range" would match any character, "A" to "Z", inclusive.
62+
Currently, only matching for 7-bit ASCII literals and character classes is defined, as that is all that is required by this specification.
63+
64+
An expression can be composed of several expressions in sequence, such that each must be followed by the next.
65+
This is known as implicit concatenation.
66+
For example, both `A` and `B` must be matched, in that order, to satisfy the following rule:
67+
68+
```
69+
symbol ::= A B
70+
```
71+
72+
Each expression must be matched once and only once, `A` followed by `B`.
73+
To support the description of repetition and optional match criteria, the postfix operators `*` and `+` are defined.
74+
`*` indicates that the preceding expression can be matched zero or more times.
75+
`+` indicates that the preceding expression must be matched one or more times.
76+
These appear in the following form:
77+
78+
```
79+
zeroormore ::= expression*
80+
oneormore ::= expression+
81+
```
82+
83+
Parentheses are used to group expressions into a larger expression:
84+
85+
```
86+
group ::= (A B)
87+
```
88+
89+
As with the simpler expressions above, operators can also be applied to groups.
90+
To allow for alternates, we also define the infix operator `|`.
91+
92+
```
93+
oneof ::= A | B
94+
```
95+
96+
The above indicates that the expression should match one of the expressions, `A` or `B`.
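
To make the operators above concrete, here is a minimal sketch of them as Python matcher combinators. It is an illustration only, not part of the specification: every function name (`literal`, `char_class`, `concat`, `alt`, `one_or_more`, `zero_or_more`, `matches`) and the `tag` rule are hypothetical. Each matcher takes an input string and a start offset and returns the set of offsets at which a match could end.

```python
import string


def literal(text):
    """"text": match the exact characters of text."""
    def match(s, i):
        return {i + len(text)} if s.startswith(text, i) else set()
    return match


def char_class(chars):
    """[chars]: match exactly one character drawn from chars."""
    def match(s, i):
        return {i + 1} if i < len(s) and s[i] in chars else set()
    return match


def concat(*parts):
    """A B: each part must match, one after the other."""
    def match(s, i):
        ends = {i}
        for part in parts:
            ends = {e2 for e in ends for e2 in part(s, e)}
        return ends
    return match


def alt(*options):
    """A | B: any one of the options may match."""
    def match(s, i):
        return set().union(*(option(s, i) for option in options))
    return match


def one_or_more(expr):
    """expr+: expr must match one or more times."""
    def match(s, i):
        seen, frontier = set(), expr(s, i)
        while frontier:
            seen |= frontier
            # Only explore end offsets that have not been visited yet.
            frontier = {e2 for e in frontier for e2 in expr(s, e)} - seen
        return seen
    return match


def zero_or_more(expr):
    """expr*: expr may match zero or more times."""
    repeated = one_or_more(expr)
    def match(s, i):
        return {i} | repeated(s, i)
    return match


def matches(expr, s):
    """True if expr can consume the entire input s."""
    return len(s) in expr(s, 0)


# Hypothetical rule: tag ::= [a-z]+ ("-" [a-z]+)*
word = one_or_more(char_class(string.ascii_lowercase))
tag = concat(word, zero_or_more(concat(literal("-"), word)))
```

With these definitions, `matches(tag, "alpha-beta")` is true, while `matches(tag, "alpha-")` is false because the group `("-" [a-z]+)` must match as a whole. Returning a set of end offsets keeps the sketch simple at the cost of efficiency; since the grammars here are regular, a production implementation would more likely compile them to a regular expression.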
97+
98+
## Precedence
99+
100+
The operator precedence is in the following order:
101+
102+
- Terminals (literals and character classes)
103+
- Grouping `()`
104+
- Unary operators `+*`
105+
- Concatenation
106+
- Alternates `|`
107+
108+
The precedence can be better described using grouping to show equivalents.
109+
Concatenation has higher precedence than alternates, such that `A B | C D` is equivalent to `(A B) | (C D)`.
110+
Unary operators have higher precedence than alternates and concatenation, such that `A+ | B+` is equivalent to `(A+) | (B+)`.
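
The two equivalences above can be checked informally with regular expressions, which use the same relative precedence for repetition, concatenation, and alternation. The sketch below assumes Python's `re` module and stands in single letters for `A`, `B`, `C`, and `D`; the pattern names and the `accepts` helper are hypothetical.

```python
import re


def accepts(pattern: str, text: str) -> bool:
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# A B | C D, with A="a", B="b", C="c", D="d"
CONCAT_IMPLICIT = r"ab|cd"
CONCAT_EXPLICIT = r"(?:ab)|(?:cd)"
# A (B | C) D is a different grammar, shown for contrast.
CONCAT_OTHER = r"a(?:b|c)d"

# A+ | B+, with A="a", B="b"
UNARY_IMPLICIT = r"a+|b+"
UNARY_EXPLICIT = r"(?:a+)|(?:b+)"
# (A+ | B)+ is a different grammar, shown for contrast.
UNARY_OTHER = r"(?:a+|b)+"
```

For example, `accepts(CONCAT_IMPLICIT, "abd")` is false, whereas `accepts(CONCAT_OTHER, "abd")` is true, showing that the alternation does not bind more tightly than the concatenation.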
111+
112+
## Examples
113+
114+
The following combines the previous definitions to match a simple, relative path name, describing the individual components:
115+
116+
```
117+
path ::= component ("/" component)*
118+
component ::= [a-z]+
119+
```
120+
121+
The production "component" is one or more lowercase letters.
122+
A "path" is then one component followed by zero or more slash-component pairs.
123+
The above can be converted into the following regular expression:
124+
125+
```
126+
[a-z]+(?:/[a-z]+)*
127+
```
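
As a sanity check on that conversion, the regular expression can be compared against a direct reading of the two rules. This is only a sketch, assuming Python's `re` module; `PATH_RE`, `is_path_regex`, and `is_path_by_rules` are hypothetical names.

```python
import re
import string

# The regular expression given above, compiled as-is.
PATH_RE = re.compile(r"[a-z]+(?:/[a-z]+)*")


def is_path_regex(value: str) -> bool:
    """Match the whole value against the regular expression."""
    return PATH_RE.fullmatch(value) is not None


def is_path_by_rules(value: str) -> bool:
    """Apply the grammar directly.

    path      ::= component ("/" component)*  -> components joined by "/"
    component ::= [a-z]+                      -> non-empty, lowercase ASCII only
    """
    return all(
        part != "" and all(ch in string.ascii_lowercase for ch in part)
        for part in value.split("/")
    )
```

Both functions accept `"usr/local/bin"` and reject `"usr/"`, `"/usr"`, and the empty string, since each of those would require an empty component.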
128+
129+
[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
130+
[xmlebnf]: https://www.w3.org/TR/REC-xml/#sec-notation

descriptor.md

+6-6
@@ -66,14 +66,14 @@ If the _digest_ can be communicated in a secure manner, one can verify content f
6666
The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
6767
The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.
6868

69-
A digest string MUST match the following grammar:
69+
A digest string MUST match the following [grammar](considerations.md#ebnf):
7070

7171
```
72-
digest := algorithm ":" encoded
73-
algorithm := algorithm-component [algorithm-separator algorithm-component]*
74-
algorithm-component := /[a-z0-9]+/
75-
algorithm-separator := /[+._-]/
76-
encoded := /[a-zA-Z0-9=_-]+/
72+
digest ::= algorithm ":" encoded
73+
algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
74+
algorithm-component ::= [a-z0-9]+
75+
algorithm-separator ::= [+._-]
76+
encoded ::= [a-zA-Z0-9=_-]+
7777
```
7878

7979
Note that _algorithm_ MAY impose algorithm-specific restriction on the grammar of the _encoded_ portion.
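
For illustration, the digest grammar in this hunk can likewise be collapsed into one regular expression. The sketch below assumes Python's `re` syntax, and the names `DIGEST` and `is_valid_digest` are hypothetical. It checks only the generic grammar, not any algorithm-specific restriction on the _encoded_ portion.

```python
import re

# Hypothetical translation of the digest grammar into Python regex syntax.
# algorithm-component ::= [a-z0-9]+
ALGORITHM_COMPONENT = r"[a-z0-9]+"
# algorithm-separator ::= [+._-]
ALGORITHM_SEPARATOR = r"[+._-]"
# algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
ALGORITHM = rf"{ALGORITHM_COMPONENT}(?:{ALGORITHM_SEPARATOR}{ALGORITHM_COMPONENT})*"
# encoded ::= [a-zA-Z0-9=_-]+
ENCODED = r"[a-zA-Z0-9=_-]+"
# digest ::= algorithm ":" encoded
DIGEST = re.compile(rf"{ALGORITHM}:{ENCODED}")


def is_valid_digest(value: str) -> bool:
    """Return True if the whole value matches the generic digest grammar."""
    return DIGEST.fullmatch(value) is not None
```

A caller that needs stricter validation, such as a fixed length for the encoded portion of a particular algorithm, would have to layer that check on top of this one.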
