Commit 4d2e1d75 authored by Per Bothner

New proposal for options syntax

parent 5e9185a5
* Name: delimiting-options
* Start Date: 2019-12-12
# Summary
This is not a specification for a concrete feature or escape sequence syntax.
Rather, it is a set of recommendations and conventions for
designing escape sequences with non-simple syntax and options.
Various existing and proposed escape sequences take a variable number
of options, typically using keywords and separated by semicolons.
Some option values may themselves contain semicolons, which
leads to the issue of how to "quote" semicolons.
The traditional mechanism is to use backslash or quotes.
We define an extensible convention for
quoting and escaping option values (using C-style backslashes).
We define a parsing algorithm to split a string containing a
sequence of semicolon delimited options, with possible escape
characters, into a list of options (with or without keywords).
The algorithm is independent of the set of allowable C-style escapes,
as long as a bare minimum is defined: The parser preserves escape
sequences so they can be interpreted by subsequent processing steps.
(Note: Some have suggested base64-encoding any value
that may contain "special" characters, including semicolons.
However, base64 is meant to encode binary data and is not
designed to encode strings: it is overkill, inefficient,
and makes the escape sequences harder to write, parse, and debug.)
# Options syntax
An escape sequence (such as an Operating System Command)
may contain an identifying key followed by a sequence of _options_.
> _options_ ::= _nothing_ | _option_ (`;` _option_)*
> _option_ ::= _unnamed-option_ | _named-option_
> _unnamed-option_ ::= _value_
> _named-option_ ::= _name_ `=` _value_
> _name_ ::= (_other_ | _escape_)+
> _value_ ::= (_other_ | _escape_ | _dquoted_ | _squoted_)*
> _other_ ::= any non-control character except `'`, `"`, `\`, `;`, or `=`
> _escape_ ::= `\` any non-control character
> _dquoted_ ::= `"` (_other_ | _escape_ | `;` | `=` | `'`)* `"`
> _squoted_ ::= `'` (_other_ | _escape_ | `;` | `=` | `"`)* `'`
# Parsing algorithm
The `scan` method below (written in Java syntax) takes an _options_
string (the value of `str.substring(start,limit)`) and splits it into
individual options, calling `processNamedOption(name,value)` for each
_named-option_ and calling `processPositionalOption(value)` for each
_unnamed-option_. The `name` and `value` arguments include any
backslashes and quote characters from the original.

    public int scan(String str, int start, int limit) {
        int eq = -1; // index of the first unquoted '=' in the current option
        char qmode = 0; // either '\'' or '\"' or 0
        int ostart = start; // start index of the current option
        boolean slash_seen = false; // if immediately after a '\\'
        for (int i = start; ; i++) {
            boolean done = i >= limit;
            char ch = done ? 0 : str.charAt(i);
            // An unescaped, unquoted ';' (or end of input) ends an option.
            if (done || (ch == ';' && ! slash_seen && qmode == 0)) {
                if (eq >= 0)
                    processNamedOption(str.substring(ostart, eq),
                                       str.substring(eq+1, i));
                else
                    processPositionalOption(str.substring(ostart, i));
                if (done)
                    return i;
                eq = -1;
                ostart = i + 1;
            }
            if (slash_seen)
                slash_seen = false;
            else if (ch == '\\')
                slash_seen = true;
            // Per the grammar, a _name_ cannot contain quotes, so a '='
            // inside a quoted string does not separate name from value.
            else if (ch == '=' && eq < 0 && qmode == 0)
                eq = i;
            else if (ch == '\'' || ch == '\"') {
                if (ch == qmode)
                    qmode = 0;
                else if (qmode == 0)
                    qmode = ch;
            }
        }
    }

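To see the algorithm at work, here is a minimal self-contained sketch that collects the options into a list instead of dispatching to callbacks. The `OptionSplitter` and `Option` names are hypothetical and not part of this proposal. The sketch also assumes, following the grammar, that an `=` inside a quoted string never separates a name from its value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the proposal.
final class OptionSplitter {
    /** One scanned option; name is null for an unnamed option. */
    static final class Option {
        final String name;
        final String value;
        Option(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Splits an options string; escapes and quotes are kept verbatim. */
    static List<Option> split(String str) {
        List<Option> out = new ArrayList<>();
        int eq = -1;               // index of first unquoted '=' in this option
        char qmode = 0;            // active quote character, or 0
        int ostart = 0;            // start index of the current option
        boolean slashSeen = false; // true immediately after a backslash
        for (int i = 0; ; i++) {
            boolean done = i >= str.length();
            char ch = done ? 0 : str.charAt(i);
            if (done || (ch == ';' && !slashSeen && qmode == 0)) {
                if (eq >= 0)
                    out.add(new Option(str.substring(ostart, eq),
                                       str.substring(eq + 1, i)));
                else
                    out.add(new Option(null, str.substring(ostart, i)));
                if (done)
                    return out;
                eq = -1;
                ostart = i + 1;
            }
            if (slashSeen)
                slashSeen = false;
            else if (ch == '\\')
                slashSeen = true;
            else if (ch == '=' && eq < 0 && qmode == 0)
                eq = i;
            else if (ch == '\'' || ch == '"') {
                if (ch == qmode)
                    qmode = 0;
                else if (qmode == 0)
                    qmode = ch;
            }
        }
    }
}
```

For example, `OptionSplitter.split("a=1;\"x;y\";b='p=q'")` yields three options: the named option `a` with value `1`, an unnamed option whose value is `"x;y"` (quotes included), and the named option `b` with value `'p=q'`.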
# Suggestions for value encodings (non-normative)
This specification does not place any restrictions or interpretation
on _value_. It is suggested that binary data (such as images) be
encoded using base64.
Numbers (as well as numbers with units such as `80px`) should be
written with no spaces, escapes, or quotes.
More general strings (such as filenames, URLs, or key sequences)
should usually be a quoted string (_squoted_ or _dquoted_).
More complex data (lists or "objects") can use JSON.
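The suggestions above can be sketched from the sender's side. The helper below is hypothetical (the `OptionValues` class and the option names `width`, `file`, and `data` are made up for illustration). It assumes the string contains no control characters, which would need an escape set that this section does not define.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of the proposal.
final class OptionValues {
    /** Wraps a general string in double quotes, escaping '\' and '"'. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' || ch == '"')
                sb.append('\\');
            sb.append(ch);
        }
        return sb.append('"').toString();
    }

    /** Encodes binary data with standard (padded) base64. */
    static String base64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] bytes = "hi".getBytes(StandardCharsets.UTF_8);
        String options = "width=80px"
            + ";file=" + quote("a;b.png")
            + ";data=" + base64(bytes);
        System.out.println(options);
    }
}
```

Because base64 padding uses `=`, the sketch passes binary data only as a named option: the scanner treats the first `=` as the name separator, so a bare padded value might otherwise be read as a name.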
# Basic string values (non-normative)
For an _option_ where the _value_ can be a general string,
one possibility is to restrict the syntax to that of JSON strings.
This syntax is easy to process by hand, or you can use a JSON library.
That means the _value_ is delimited by double quotes (a _dquoted_),
with the following escapes: `\\`, `\"`, `\b`, `\f`, `\n`, `\r`, `\t`
and `\uXXXX` (4 hex digits).
(JSON also allows `\/`, but it is less portable and seems useless.)
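Processing this syntax by hand might look like the sketch below. `JsonStyleValue` is a hypothetical name, and the choice to reject any escape outside the listed set is an assumption rather than something this proposal requires. The sketch does not check for unescaped quotes or control characters inside the string.

```java
// Hypothetical helper for illustration; not part of the proposal.
final class JsonStyleValue {
    /** Decodes a double-quoted value using only the escapes listed above. */
    static String decode(String value) {
        int end = value.length() - 1; // index of the closing quote
        if (end < 1 || value.charAt(0) != '"' || value.charAt(end) != '"')
            throw new IllegalArgumentException("expected a double-quoted value");
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char ch = value.charAt(i);
            if (ch != '\\') {
                sb.append(ch);
                continue;
            }
            if (++i >= end)
                throw new IllegalArgumentException("dangling backslash");
            char esc = value.charAt(i);
            switch (esc) {
                case '\\': case '"': sb.append(esc); break;
                case 'b': sb.append('\b'); break;
                case 'f': sb.append('\f'); break;
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case 't': sb.append('\t'); break;
                case 'u': // exactly four hex digits give one UTF-16 code unit
                    if (i + 4 >= end)
                        throw new IllegalArgumentException("truncated hex escape");
                    int code = 0;
                    for (int k = 1; k <= 4; k++) {
                        int d = Character.digit(value.charAt(i + k), 16);
                        if (d < 0)
                            throw new IllegalArgumentException("bad hex digit");
                        code = code * 16 + d;
                    }
                    sb.append((char) code);
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported escape: \\" + esc);
            }
        }
        return sb.toString();
    }
}
```

Since each four-digit escape is appended as one UTF-16 code unit, a surrogate pair written as two consecutive escapes combines into a single character without extra handling.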
If a character requires more than 16 bits, the simplest
approach is to include the character literally, since
you would be unlikely to need such a character unless
your environment supports Unicode. (This is also JSON-compatible.)
Escape sequences for such characters are not well standardized: reasonable choices
are `\uAAAA\uBBBB` (JSON/JavaScript/Java, using UTF-16 surrogate characters);
`\u{XXXXXX}` (JavaScript); or `\UXXXXXXXX` (C/C++/Python/Fish shell).
The JavaScript `\u{XXXXXX}` syntax seems most readable (for humans, at least).
If going beyond the set of standard JSON escape sequences,
you might consider allowing `\e` for the escape character.
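As a rough sketch of these two extensions, the hypothetical `ExtendedEscapes.expand` below expands only the braced JavaScript-style code point escape and `\e`, and leaves every other escape untouched for a later pass. Running it as a separate pass is an assumption made to keep the example short; a real decoder would more likely fold these cases into its main escape switch.

```java
// Hypothetical helper for illustration; not part of the proposal.
final class ExtendedEscapes {
    /** Expands the braced code point escape and the 'e' escape only. */
    static String expand(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch != '\\' || i + 1 >= s.length()) {
                sb.append(ch);
                continue;
            }
            char esc = s.charAt(i + 1);
            if (esc == 'e') {
                sb.append((char) 0x1B); // the ASCII escape character
                i++;
            } else if (esc == 'u' && i + 2 < s.length() && s.charAt(i + 2) == '{') {
                int close = s.indexOf('}', i + 3);
                if (close < 0)
                    throw new IllegalArgumentException("unterminated braced escape");
                // parseInt throws NumberFormatException on non-hex input
                int cp = Integer.parseInt(s.substring(i + 3, close), 16);
                if (!Character.isValidCodePoint(cp))
                    throw new IllegalArgumentException("invalid code point");
                sb.appendCodePoint(cp); // emits a surrogate pair when needed
                i = close;
            } else {
                // Any other escape (including an escaped backslash) is copied as is.
                sb.append(ch).append(esc);
                i++;
            }
        }
        return sb.toString();
    }
}
```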