spec: describe components of EBNF grammar #695

Merged: 2 commits merged on Jun 23, 2017

annotations.md: 10 changes (5 additions, 5 deletions)

@@ -31,12 +31,12 @@ This specification defines the following annotation keys, intended for but not l
* **org.opencontainers.image.ref.name** Name of the reference for a target (string).
* SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
* Character set of the value SHOULD conform to alphanum of `A-Za-z0-9` and separator set of `-._:@/+`
- * An EBNF'esque grammar + regular expression like:
+ * The reference must match the following [grammar](considerations.md#ebnf):
```
- ref := component ["/" component]*
- component := alphanum [separator alphanum]*
- alphanum := /[A-Za-z0-9]+/
- separator := /[-._:@+]/ | "--"
+ ref ::= component ("/" component)*
+ component ::= alphanum (separator alphanum)*
+ alphanum ::= [A-Za-z0-9]+
+ separator ::= [-._:@+] | "--"
```
* **org.opencontainers.image.title** Human-readable title of the image (string)
* **org.opencontainers.image.description** Human-readable description of the software packaged in the image (string)
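
For illustration only, the new `ref` grammar can be hand-translated into a Python regular expression, since (as considerations.md notes) these grammars are regular. This is a sketch, not part of the specification or of any OCI library; the names `ALPHANUM`, `SEPARATOR`, `COMPONENT`, `REF` and `is_valid_ref_name` are hypothetical. Each rule becomes a string fragment, and `re.fullmatch` requires the whole value to match.

```python
import re

# Hypothetical translation of the `ref` grammar; not an official implementation.
# alphanum  ::= [A-Za-z0-9]+
ALPHANUM = r"[A-Za-z0-9]+"
# separator ::= [-._:@+] | "--"
SEPARATOR = r"(?:[-._:@+]|--)"
# component ::= alphanum (separator alphanum)*
COMPONENT = rf"{ALPHANUM}(?:{SEPARATOR}{ALPHANUM})*"
# ref ::= component ("/" component)*
REF = re.compile(rf"{COMPONENT}(?:/{COMPONENT})*")


def is_valid_ref_name(value: str) -> bool:
    """Return True if the entire value matches the `ref` production."""
    return REF.fullmatch(value) is not None
```

With this sketch, `is_valid_ref_name("v1.0/rc1")` is true, while `"a//b"` and `"-latest"` are rejected because every component must begin and end with an alphanumeric run.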

considerations.md: 111 changes (111 additions, 0 deletions)

@@ -24,3 +24,114 @@ Implementations:
[github.com/docker/go]: https://github.com/docker/go/
[Go]: https://golang.org/
[JSON]: http://json.org/

# EBNF

For field formats described in this specification, we use a limited subset of [Extended Backus-Naur Form][ebnf], similar to that used by the [XML specification][xmlebnf].
Grammars present in the OCI specification are regular and can each be converted to a single regular expression.
However, regular expressions are avoided in the specification itself to limit ambiguity between the syntaxes of different regular expression dialects.
By defining the subset of EBNF used here, variation, misunderstanding, or ambiguity arising from linking to a larger specification can be avoided.

Grammars are made up of rules in the following form:

```
symbol ::= expression
```

We say the input matches the production identified by `symbol` if it is matched by `expression`.
Whitespace is completely ignored in rule definitions.

## Expressions

The simplest expression is the literal, surrounded by quotes:

```
literal ::= "matchthis"
```

The above expression defines a symbol, "literal", that matches the exact input of "matchthis".
Character classes are delineated by brackets (`[]`) and describe a set of characters, a range of characters, or multiple ranges:

```
set ::= [abc]
range ::= [A-Z]
```

The above symbol "set" would match one character of either "a", "b" or "c".
The symbol "range" would match any single character from "A" to "Z", inclusive.
Currently, matching is defined only for 7-bit ASCII literals and character classes, as that is all this specification requires.

Review comment (Member):

Or should it be clarified that multiple ranges can be in the same brackets, along with a set?

Review comment (Member):

from @wking "multiple ranges and explicit characters should be allowed in a single range. The XML-spec wording for that is "Enumerations and ranges can be mixed in one set of brackets." https://www.w3.org/TR/REC-xml/#sec-notation"
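
As a sketch of how the literal and character-class rules above map onto a common regex dialect, the Python snippet below translates them with the standard `re` module. The helper `literal` and the rule `version ::= "1.0"` are hypothetical; `literal` uses `re.escape` so that a literal such as `"1.0"` matches a literal dot instead of any character.

```python
import re


def literal(text: str) -> str:
    """Translate a quoted EBNF literal into a regex fragment matching it exactly."""
    # re.escape neutralizes metacharacters such as "." and "+".
    return re.escape(text)


# literal ::= "matchthis"
LITERAL = re.compile(literal("matchthis"))
# Hypothetical rule with a metacharacter: version ::= "1.0"
VERSION = re.compile(literal("1.0"))
# set ::= [abc]   matches exactly one of "a", "b" or "c"
SET = re.compile(r"[abc]")
# range ::= [A-Z] matches exactly one character from "A" to "Z"
RANGE = re.compile(r"[A-Z]")
```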

Multiple character ranges and explicit characters can be specified in a single character class, as follows:

```
multipleranges ::= [a-zA-Z=-]
```

The above matches one character from the range `a` to `z`, the range `A` to `Z`, or the individual characters `=` and `-`.
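
The same class carries over unchanged to Python's `re` syntax. A minimal check, assuming single-character input and a hypothetical helper `in_class`; note that in Python, as in most regex dialects, a `-` placed last inside the brackets is read as a literal character rather than as a range operator.

```python
import re

# multipleranges ::= [a-zA-Z=-]
MULTIPLE_RANGES = re.compile(r"[a-zA-Z=-]")


def in_class(ch: str) -> bool:
    """True if ch is a single character accepted by `multipleranges`."""
    return MULTIPLE_RANGES.fullmatch(ch) is not None
```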

An expression can be a sequence of expressions, which must be matched one after the other.
This is known as the implicit concatenation operator.
For example, both `A` and `B` must be matched, in that order, to satisfy the following rule:

```
symbol ::= A B
```

Each expression must be matched once and only once, `A` followed by `B`.
To support the description of repetition and optional match criteria, the postfix operators `*` and `+` are defined.
`*` indicates that the preceding expression can be matched zero or more times.
`+` indicates that the preceding expression must be matched one or more times.
These appear in the following form:

```
zeroormore ::= expression*
oneormore ::= expression+
```
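
A rough Python sketch of concatenation and the postfix operators follows. The helpers `group`, `concat`, `zero_or_more` and `one_or_more`, and the operands `A ::= "x"` and `B ::= [0-9]`, are hypothetical. Each helper wraps its operand in a non-capturing group `(?:...)`, so that `*` or `+` applies to the whole preceding expression rather than only to its last character.

```python
import re


def group(expr: str) -> str:
    # Non-capturing group: lets an operator apply to the whole expression.
    return f"(?:{expr})"


def concat(*exprs: str) -> str:
    # symbol ::= A B : each operand matched once, in order.
    return "".join(group(e) for e in exprs)


def zero_or_more(expr: str) -> str:
    # zeroormore ::= expression*
    return group(expr) + "*"


def one_or_more(expr: str) -> str:
    # oneormore ::= expression+
    return group(expr) + "+"


# Hypothetical operands: A ::= "x" and B ::= [0-9]
SYMBOL = re.compile(concat("x", "[0-9]"))
ZERO = re.compile(zero_or_more("ab"))
ONE = re.compile(one_or_more("ab"))
```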

Parentheses are used to group expressions into a larger expression:

```
group ::= (A B)
```

As with the simpler expressions above, operators can also be applied to groups.
To allow for alternates, we also define the infix operator `|`.

```
oneof ::= A | B
```

The above indicates that the input must match one of the expressions, either `A` or `B`.
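
Grouping and alternation translate the same way. In this hypothetical sketch, `one_of` wraps each alternative, and the alternation as a whole, in its own group so the result stays intact when it is later concatenated or repeated; `A ::= "cat"` and `B ::= "dog"` are made-up operands.

```python
import re


def one_of(*alternatives: str) -> str:
    # oneof ::= A | B : the whole alternation is grouped so it composes safely.
    return "(?:" + "|".join(f"(?:{a})" for a in alternatives) + ")"


ONEOF = re.compile(one_of("cat", "dog"))
# group ::= (A B), followed by "+" to show an operator applied to a group.
GROUP_PLUS = re.compile(r"(?:catdog)+")
# Without the outer group, "x" + "cat|dog" would mean ("xcat") | ("dog").
PREFIXED = re.compile("x" + one_of("cat", "dog"))
```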

## Precedence

From highest to lowest, the operator precedence is as follows:

- Terminals (literals and character classes)
- Grouping `()`
- Unary operators `+` and `*`
- Concatenation
- Alternates `|`

Precedence is easiest to see by using grouping to write out equivalent expressions.
Concatenation has higher precedence than alternates, such that `A B | C D` is equivalent to `(A B) | (C D)`.
Unary operators have higher precedence than alternates and concatenation, such that `A+ | B+` is equivalent to `(A+) | (B+)` and `A B+` is equivalent to `A (B+)`.
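
Python's `re` module uses the same relative precedence for postfix operators, concatenation and `|`, which makes these equivalences easy to check. A small sketch with hypothetical single-letter terminals `A="a"`, `B="b"`, `C="c"` and `D="d"`:

```python
import re

# Hypothetical terminals: A="a", B="b", C="c", D="d".
IMPLICIT = re.compile(r"ab|cd")            # A B | C D
EXPLICIT = re.compile(r"(?:ab)|(?:cd)")    # (A B) | (C D)
OTHER_READING = re.compile(r"a(?:b|c)d")   # A (B | C) D, a different language
UNARY = re.compile(r"a+|b+")               # A+ | B+, i.e. (A+) | (B+)
CONCAT_UNARY = re.compile(r"ab+")          # A B+, i.e. A (B+)

SAMPLES = ["ab", "cd", "abd", "acd", "abcd"]
```

Comparing `IMPLICIT` and `EXPLICIT` over `SAMPLES` shows they accept the same strings, while `OTHER_READING` accepts `"acd"`, which `IMPLICIT` rejects.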

## Examples

The following combines the previous definitions to match a simple, relative path name, describing the individual components:

```
path ::= component ("/" component)*
component ::= [a-z]+
```

The production "component" is one or more lowercase letters.
A "path" is then at least one component, possibly followed by zero or more slash-component pairs.
The above can be converted into the following regular expression:

```
[a-z]+(?:/[a-z]+)*
```
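
When using the regular expression above, the match has to cover the whole input; otherwise a string that merely contains a valid path would be accepted. A minimal Python sketch (the names `PATH`, `is_path` and `CONTAINS_PATH` are hypothetical) uses `re.fullmatch` for this:

```python
import re

# path ::= component ("/" component)*,  component ::= [a-z]+
PATH = re.compile(r"[a-z]+(?:/[a-z]+)*")


def is_path(value: str) -> bool:
    """True only if the entire value is a `path`."""
    return PATH.fullmatch(value) is not None


# An unanchored search finds "usr" inside "/usr/", which is not a valid path.
CONTAINS_PATH = PATH.search("/usr/") is not None
```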

[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
[xmlebnf]: https://www.w3.org/TR/REC-xml/#sec-notation

descriptor.md: 12 changes (6 additions, 6 deletions)

@@ -66,14 +66,14 @@ If the _digest_ can be communicated in a secure manner, one can verify content f
The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.

- A digest string MUST match the following grammar:
+ A digest string MUST match the following [grammar](considerations.md#ebnf):

```
- digest := algorithm ":" encoded
- algorithm := algorithm-component [algorithm-separator algorithm-component]*
- algorithm-component := /[a-z0-9]+/
- algorithm-separator := /[+._-]/
- encoded := /[a-zA-Z0-9=_-]+/
+ digest ::= algorithm ":" encoded
+ algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
+ algorithm-component ::= [a-z0-9]+
+ algorithm-separator ::= [+._-]
+ encoded ::= [a-zA-Z0-9=_-]+
```

Review comment (Member):

this line would seem to get broken into something like `[a-z]* | [A-Z]* | [0-9]* | [=_-]+`...

Note that _algorithm_ MAY impose algorithm-specific restrictions on the grammar of the _encoded_ portion.
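
As a sketch only, the new digest grammar can be translated into a Python regular expression that also splits out the two portions. The names below (`DIGEST`, `split_digest` and the fragment constants) are hypothetical, and the check is purely syntactic: it does not enforce any algorithm-specific restriction on the _encoded_ portion, such as the expected length for a particular hash.

```python
import re

# Hypothetical translation of the digest grammar; syntax only.
ALGORITHM_COMPONENT = r"[a-z0-9]+"      # algorithm-component ::= [a-z0-9]+
ALGORITHM_SEPARATOR = r"[+._-]"         # algorithm-separator ::= [+._-]
# algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
ALGORITHM = rf"{ALGORITHM_COMPONENT}(?:{ALGORITHM_SEPARATOR}{ALGORITHM_COMPONENT})*"
ENCODED = r"[a-zA-Z0-9=_-]+"            # encoded ::= [a-zA-Z0-9=_-]+
# digest ::= algorithm ":" encoded
DIGEST = re.compile(rf"(?P<algorithm>{ALGORITHM}):(?P<encoded>{ENCODED})")


def split_digest(value: str) -> tuple:
    """Return (algorithm, encoded), or raise ValueError if value is not a digest."""
    m = DIGEST.fullmatch(value)
    if m is None:
        raise ValueError(f"not a valid digest string: {value!r}")
    return m.group("algorithm"), m.group("encoded")
```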