A serialization format draft. Its grammar:

delimiter = " " | "\n"
restricted-char = "(" | ")" | delimiter | "\"
tree-or-val = "(" forest ")"
            | delimiter +
            | ("\" restricted-char | any-char - restricted-char) +
forest = tree-or-val *

A document (possibly a stream) is a list (forest) of trees and
individual values (literals). Parentheses are the primary separators
and give the forest its shape; whitespace characters provide
additional tokenization into "words".

Serialized representations are isomorphic to the underlying data
model: repeatedly alternating parsing and printing must reproduce the
original values. Hence whitespace character runs are preserved, the
escaping rules provide exactly one way to represent any given
character within literals, and there is no support for hex digit or
mnemonic escapes (which can be applied externally if needed).
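
The round-trip requirement can be illustrated with a small parser and
printer. This is a minimal Python sketch, not one of the reference
implementations: the names parse and dump, and the node representation
(nested lists for trees, ("ws", text) and ("lit", text) tuples for
whitespace runs and literals), are assumptions made for the example.

```python
# Data model (an assumption of this sketch): a forest is a list of nodes;
# a node is a nested list (subtree), ("ws", text) or ("lit", text).
DELIMITERS = " \n"
RESTRICTED = "()\\" + DELIMITERS


def parse(text):
    """Parse a whole document into a forest; raise ValueError on bad input."""
    forest, pos = _forest(text, 0)
    if pos != len(text):
        raise ValueError(f"unmatched ')' at offset {pos}")
    return forest


def _forest(text, pos):
    """Parse nodes until ')' or end of input; return (nodes, position)."""
    nodes = []
    while pos < len(text) and text[pos] != ")":
        ch = text[pos]
        if ch == "(":
            children, pos = _forest(text, pos + 1)
            if pos >= len(text):
                raise ValueError("missing ')' at end of input")
            nodes.append(children)
            pos += 1  # skip the closing parenthesis
        elif ch in DELIMITERS:
            start = pos
            while pos < len(text) and text[pos] in DELIMITERS:
                pos += 1
            nodes.append(("ws", text[start:pos]))  # the run is kept verbatim
        else:
            chars = []
            while pos < len(text) and text[pos] not in "() \n":
                if text[pos] == "\\":
                    pos += 1  # a backslash may only precede a restricted char
                    if pos >= len(text) or text[pos] not in RESTRICTED:
                        raise ValueError(f"bad escape at offset {pos}")
                chars.append(text[pos])
                pos += 1
            nodes.append(("lit", "".join(chars)))
    return nodes, pos


def dump(forest):
    """Print a forest back to text, escaping restricted characters."""
    out = []
    for node in forest:
        if isinstance(node, list):
            out.append("(" + dump(node) + ")")
        elif node[0] == "ws":
            out.append(node[1])
        else:
            out.append("".join("\\" + c if c in RESTRICTED else c
                               for c in node[1]))
    return "".join(out)
```

With this model, dump(parse(text)) == text should hold for every
well-formed text. The reverse direction, parse(dump(forest)) == forest,
holds as long as the forest contains no empty runs and no two adjacent
leaves of the same kind, since those would merge when printed.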

The optional document declaration is a single parenthesized forest,
including a constant identifier, format version, and an encoding:

(wt 0.0 UTF-8)
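
Recognizing the declaration in an already parsed forest might look
like the Python sketch below. Several things here are assumptions, not
part of the draft: the function name read_declaration, the node
representation (lists for subtrees, ("lit", text) and ("ws", text)
tuples for leaves), the requirement that the declaration be the very
first node, and the use of "wt" as the constant identifier, taken from
the example above.

```python
def read_declaration(forest):
    """Return (version, encoding) if the forest opens with a declaration.

    Assumptions of this sketch: the declaration is the very first node,
    its identifier is "wt" as in the example, and it holds exactly three
    words separated by whitespace runs.
    """
    if not forest or not isinstance(forest[0], list):
        return None
    first = forest[0]
    if any(isinstance(node, list) for node in first):
        return None  # nested subtrees are not expected in a declaration
    words = [text for kind, text in first if kind == "lit"]
    if len(words) != 3 or words[0] != "wt":
        return None
    return words[1], words[2]


# The forest that "(wt 0.0 UTF-8)" would parse into under this model.
declared = [[("lit", "wt"), ("ws", " "), ("lit", "0.0"),
             ("ws", " "), ("lit", "UTF-8")]]
version, encoding = read_declaration(declared)
```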

"Words" are not typed: with formats like JSON, values often still have
to be parsed out of string literals, in addition to parsing the
structure and the standardized types, and the distinctions between
standardized types tend to get in the way rather than help (e.g.,
when reading structures serialized without much care from a
dynamically typed language).

Likewise, there are no quoted strings: a text run (possibly
parenthesized) can be treated as a string, similarly to how it is done
in SGML-based languages. A parser may provide such strings by parsing
lazily, or they can be reconstructed from individual tree leaves.
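
Reconstruction from leaves could be as simple as the Python sketch
below. The function name text_of and the node representation (lists
for subtrees, ("lit", text) and ("ws", text) tuples for leaves) are
assumptions for illustration, as is the choice to wrap the text of
nested subtrees in parentheses again.

```python
def text_of(forest):
    """Concatenate the leaves of a forest into one string.

    Whitespace runs are kept verbatim, literals contribute their
    unescaped text, and nested subtrees are wrapped in parentheses
    (a choice made for this sketch only).
    """
    parts = []
    for node in forest:
        if isinstance(node, list):
            parts.append("(" + text_of(node) + ")")
        else:
            parts.append(node[1])
    return "".join(parts)


# Leaves of the text run "Hello,  world (sic)" under this model.
run = [("lit", "Hello,"), ("ws", "  "), ("lit", "world"),
       ("ws", " "), [("lit", "sic")]]
greeting = text_of(run)
```

Because whitespace runs are preserved by the format, the original
spacing of the text survives the trip through the tree.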

Planned conventions: association lists, the first symbol in a
subforest playing the role of a tag or a constructor (as with
S-expressions), and name prefixes to emulate namespaces.
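
These conventions might be used as in the Python sketch below. All
names (split_tag, to_alist, lookup), the node representation (lists
for subtrees, ("lit", text) and ("ws", text) tuples for leaves), and
the "net:" prefix with its colon separator are hypothetical, chosen
only for illustration.

```python
def is_ws(node):
    return isinstance(node, tuple) and node[0] == "ws"


def is_lit(node):
    return isinstance(node, tuple) and node[0] == "lit"


def split_tag(tree):
    """Split a subtree into (tag, rest); tag is None without a leading word.

    Leading whitespace and the single run after the tag are dropped,
    which is an assumption of this sketch.
    """
    pos = 0
    while pos < len(tree) and is_ws(tree[pos]):
        pos += 1
    if pos == len(tree) or not is_lit(tree[pos]):
        return None, tree
    rest = tree[pos + 1:]
    if rest and is_ws(rest[0]):
        rest = rest[1:]
    return tree[pos][1], rest


def to_alist(forest):
    """Collect (tag, rest) pairs from the tagged subtrees of a forest."""
    pairs = []
    for node in forest:
        if isinstance(node, list):
            tag, rest = split_tag(node)
            if tag is not None:
                pairs.append((tag, rest))
    return pairs


def lookup(alist, key):
    """Return the rest of the first entry tagged with key, or None."""
    for tag, rest in alist:
        if tag == key:
            return rest
    return None


# The forest for "(net:host example.org) (net:port 8080)" under this model.
config = [[("lit", "net:host"), ("ws", " "), ("lit", "example.org")],
          ("ws", " "),
          [("lit", "net:port"), ("ws", " "), ("lit", "8080")]]
alist = to_alist(config)
port = lookup(alist, "net:port")
net_tags = [tag for tag, _ in alist if tag.startswith("net:")]
```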

The aim is simplicity of implementation and of usage (reading and
writing). Multiple implementations are included to serve as
references, for testing, and as a general playground.

Since there are no types, a formal schema description (apart from
comments) should focus on parsing alone. Because the grammar is quite
simple, anything generic and capable of defining a context-free
grammar, such as ABNF, may be appropriate to use as a schema, as long
as the defined grammars are compatible with the core format. In
effect, this turns the core format into a skeleton of a grammar.