Some examples
Style Haypo 1 (based on Hachoir)
    PngChunk
    ========
    size:uint32:Size in bytes  # Comment
    tag:string[4]:Content type
    content:bytes[value(size)]  # Comment
    crc32:uint32: CRC32 checksum of content
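For comparison, here is roughly what the PngChunk description would do if written by hand with the Python standard library. This is only a sketch: parse_png_chunk is a hypothetical helper name, integers are assumed to be big-endian as in PNG files, and the CRC is checked over the tag plus the content, because that is what real PNG files store (the description above simplifies this to the content).

```python
import struct
import zlib


def parse_png_chunk(data, offset=0):
    """Parse one chunk laid out as size, tag, content, crc32 (hypothetical helper)."""
    # size: uint32, tag: string[4]; PNG integers are big-endian
    size, tag = struct.unpack_from(">I4s", data, offset)
    start = offset + 8
    end = start + size
    if end + 4 > len(data):
        raise ValueError("truncated chunk")
    content = data[start:end]  # content: bytes[value(size)]
    (crc32,) = struct.unpack_from(">I", data, end)
    # In real PNG files the CRC covers the tag and the content, not the size.
    crc_ok = (zlib.crc32(tag + content) & 0xFFFFFFFF) == crc32
    return {
        "size": size,
        "tag": tag.decode("ascii"),
        "content": content,
        "crc32": crc32,
        "crc_ok": crc_ok,
    }
```

Feeding it the 12 bytes of a standard IEND chunk (zero size, tag "IEND", then the CRC) should give an empty content and a valid CRC.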
- Each line is a field
- Syntax: "name:type:description"
- Spaces are allowed before/after ":"
- To continue a long line, start the next line with one or more spaces
- Attributes are written "key=value"
    text:string[10]:Text charset="UTF-8"
    text:string[10] Text charset="UTF-8"
- Functions:
    - value(x) or $x: value of field x
    - address(x) or @x: relative address of field x
    - size(x): size of field x
    - size(): size of current field set
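To check that the line syntax is easy to read mechanically, here is a minimal sketch of a reader for it. Everything in it is an assumption made for illustration: the function names are hypothetical, "#" is taken to start a comment anywhere on a line, attribute values are assumed to be double-quoted, and a line made only of "=" is treated as the underline of the title.

```python
import re

# key="value" attributes (assumption: values are always double-quoted)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def join_continuations(text):
    """Drop comments and merge lines starting with whitespace into the previous one."""
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # assumption: "#" always starts a comment
        if not line.strip():
            continue
        if line[0] in " \t" and lines:
            lines[-1] += " " + line.strip()
        else:
            lines.append(line)
    return lines


def parse_field(line):
    """Split "name:type:description" (spaces allowed around ":")."""
    parts = [part.strip() for part in line.split(":", 2)]
    if len(parts) < 2:
        raise ValueError("expected name:type, got %r" % line)
    rest = parts[2] if len(parts) > 2 else ""
    attributes = dict(ATTRIBUTE.findall(rest))
    description = " ".join(ATTRIBUTE.sub("", rest).split())
    return {
        "name": parts[0],
        "type": parts[1],
        "description": description,
        "attributes": attributes,
    }


def parse_description(text):
    """Return (title, fields) for a block such as the PngChunk example."""
    lines = join_continuations(text)
    title = lines[0].strip()
    fields = [parse_field(line) for line in lines[1:]
              if not set(line.strip()) <= {"="}]  # skip the "====" underline
    return title, fields
```

The type is kept as a raw string ("bytes[value(size)]"), so evaluating value(), address() and size() is left to a later step.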
Code:
    TestLoop
    ========
    a:uint8
    b:uint8
    var(x)=size(a)
    var(max)=2*value(b)  # or 2*$b
    do
      c[]: uint8
      var(x) += value(last())  # last(): last added field
    while var(x) < var(max)
I don't like the var(x) syntax :-p
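To make the intended semantics of TestLoop concrete, here is a hand-written Python version of the same loop. It is a sketch under assumptions: parse_test_loop is a hypothetical name, and size(a) is counted in bytes here (so it is 1), although the description does not say which unit size() uses.

```python
import struct


def parse_test_loop(data):
    """Hand-written equivalent of the TestLoop description (hypothetical helper)."""
    a, b = struct.unpack_from("BB", data, 0)  # a:uint8, b:uint8
    offset = 2
    x = 1             # var(x)=size(a), assumed to be counted in bytes
    maximum = 2 * b   # var(max)=2*value(b)
    c = []
    while True:       # do ... while: the body runs at least once
        if offset >= len(data):
            raise ValueError("truncated input")
        (value,) = struct.unpack_from("B", data, offset)  # c[]: uint8
        offset += 1
        c.append(value)
        x += value    # var(x) += value(last())
        if not x < maximum:
            break
    return {"a": a, "b": b, "c": c}
```

With the input bytes 7, 3, 2, 1, 5, 9 the limit is 6, and x goes 1, 3, 4, 9, so three c[] fields are read and the last byte is left untouched.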
Style pyconstruct
    >>> # an adapter for MAC address (makes it readable)
    ... class MacAddressAdapter(Adapter):
    ...     def _encode(self, obj):
    ...         return "".join(obj.split("-")).decode("hex")
    ...     def _decode(self, obj):
    ...         return "-".join(b.encode("hex") for b in obj)
    ...
    >>> # a factory function... it can be thought of as a macro that "inlines" the
    ... # return value, instead of typing it over and over again
    ... def MacAddress(name):
    ...     return MacAddressAdapter(Bytes(name, 6))
    ...
    >>> # the full structure
    ... ethernet_header = Struct("ethernet_header",
    ...     MacAddress("destination"),
    ...     MacAddress("source"),
    ...     Enum(
    ...         UInt16("type"),
    ...         mapping = dict(
    ...             IP = 0x0800,
    ...             ARP = 0x0806,
    ...             X25 = 0x0805,
    ...             IPX = 0x8137,
    ...             IPv6 = 0x86DD,
    ...         )
    ...     ),
    ... )
    >>>
    >>> print ethernet_header.parse("ABCDEF123456\x08\x00")
    Container:
        destination = '41-42-43-44-45-46'
        source = '31-32-33-34-35-36'
        type = 'IP'
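The session above relies on Python 2 idioms (the print statement, str.encode("hex")). As a point of comparison, the same header can be decoded with nothing but the standard library. This is a sketch, not the pyconstruct API: parse_ethernet_header and format_mac are hypothetical names, and the type field is assumed to be big-endian, which is what the output above implies.

```python
import struct

# Same mapping as in the Enum above, keyed by numeric value for decoding
ETHER_TYPES = {
    0x0800: "IP",
    0x0806: "ARP",
    0x0805: "X25",
    0x8137: "IPX",
    0x86DD: "IPv6",
}


def format_mac(raw):
    """Render 6 raw bytes as dash-separated lowercase hex pairs."""
    return "-".join("%02x" % byte for byte in raw)


def parse_ethernet_header(data):
    """Decode destination, source and type from the first 14 bytes."""
    destination, source, ether_type = struct.unpack_from(">6s6sH", data, 0)
    return {
        "destination": format_mac(destination),
        "source": format_mac(source),
        # unknown types are returned as their numeric value
        "type": ETHER_TYPES.get(ether_type, ether_type),
    }
```

The obvious drawback compared to the pyconstruct version is that this only parses: building a header back from the dictionary would need a second function.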
BMS syntax
Doom WAD parser:
    ImpType Standard ;
    IDString 0 IWAD ;
    Get FC Long 0 ;
    SavePos TailOffOff 0 ;
    Get TO Long 0 ;
    GoTo TO 0 ;
    For TE = 1 To FC ;
      SavePos FOO 0 ;
      Get FO Long 0 ;
      SavePos FSO 0 ;
      Get FS Long 0 ;
      GetDString FN 8 0 ;
      Log FN FO FS FOO FSO ;
    Next TE ;
(Link to the parser in the wiki)
=> Binary MultiEx Commander Scripts (BMS language)
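As far as I can read it, the script checks the "IWAD" signature, reads the file count (FC) and the directory offset (TO), jumps there, and logs the offset, size and 8-byte name of every entry. A rough Python equivalent follows; read_wad_directory is a hypothetical name, and the 32-bit integers are assumed to be little-endian, as in Doom WAD files.

```python
import struct


def read_wad_directory(data):
    """List (name, offset, size) for each entry of an IWAD file (hypothetical helper)."""
    # IDString IWAD, then FC (file count) and TO (directory offset)
    magic, file_count, table_offset = struct.unpack_from("<4sii", data, 0)
    if magic != b"IWAD":
        raise ValueError("not an IWAD file")
    entries = []
    for index in range(file_count):  # For TE = 1 To FC
        # FO (offset), FS (size), FN (name, 8 bytes padded with NUL)
        offset, size, raw_name = struct.unpack_from(
            "<ii8s", data, table_offset + 16 * index)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        entries.append((name, offset, size))
    return entries
```

Unlike the script, this sketch does not record the positions of the offset and size fields themselves (FOO and FSO), which MultiEx needs when it rewrites an archive.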
Features
The language has to be:
- plain text
- written in English and ASCII
- (TODO: finish the list :-))
Problem
Problems with the Hachoir syntax:
- Hachoir parsers are very close to classic parser code: a parser reads like a list of read(type, length) calls
- It's hard to convert a Hachoir parser to another syntax (C struct or any other parser syntax)
Requirements:
- need arithmetic operations: a+b, a-b, a*b, a/b (integer division), a%b
- sometimes need to call a Python function: alignValue(), or various functions when creating a field description
- need to be able to read attributes of other fields (sibling or parent)
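The arithmetic requirement on its own is cheap to satisfy. Here is a minimal sketch of an evaluator built on Python's ast module; evaluate is a hypothetical name, bare names stand for the values of sibling fields (instead of value(x) or $x, to keep the sketch short), and "/" is mapped to Python's floor division, which rounds toward negative infinity.

```python
import ast
import operator

# Only the five operators listed in the requirements are accepted
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,  # a/b is an integer division here
    ast.Mod: operator.mod,
}


def evaluate(expression, values):
    """Evaluate an integer expression; names are looked up in `values`."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name):
            return values[node.id]  # KeyError if the field is unknown
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](visit(node.left), visit(node.right))
        raise ValueError("unsupported expression: %r" % expression)

    return visit(ast.parse(expression, mode="eval"))
```

Anything outside the whitelist (function calls, attribute access, comparisons) is rejected, so calling Python functions such as alignValue() or reaching a parent field would need explicit extensions.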
More complex requirements:
- What about a loop over a range?
    for index in range(n):
        ...
- What about a loop with a condition?
    while self.current_size < self.size:
        ...

    while True:
        offset = UInt32(self, "offset[]")
        yield offset
        if offset.value == 0:
            break
- What about a test (if)?
    if self["has_comment"].value:
        yield CString(self, "comment")
Haypo: "In my opinion, it's not possible or at least very complex"
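To make the difficulty concrete, here are the conditional loop and the test from the fragments above as a self-contained generator that runs without Hachoir. The record layout is invented for illustration (a has_comment flag, a list of little-endian uint32 offsets terminated by 0, then an optional NUL-terminated comment), and parse_record is a hypothetical name. Any declarative syntax would have to express the same two control-flow decisions.

```python
import struct


def parse_record(data):
    """Yield (name, value) pairs for a hypothetical record layout."""
    pos = 0
    (has_comment,) = struct.unpack_from("B", data, pos)
    pos += 1
    yield "has_comment", has_comment

    # Loop with a condition: read uint32 offsets until the zero terminator
    index = 0
    while True:
        (offset,) = struct.unpack_from("<I", data, pos)
        pos += 4
        yield "offset[%d]" % index, offset
        index += 1
        if offset == 0:
            break

    # Test: the comment is only present when the flag is set
    if has_comment:
        end = data.index(b"\0", pos)
        yield "comment", data[pos:end].decode("ascii")
```

Both decisions depend on values that are only known while parsing (the last offset read, the flag), which is exactly what a static "name:type:description" list cannot say without extra constructs.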
Existing syntax for binary parsers
- DataWorkshop: syntax based on XML
- pyConstruct syntax (based on Python)
- Binary Format Description (BFD) Language, based on XML. It has some arithmetic and can call functions
May I suggest looking at http://www.ccsds.org/, as binary data is fundamental in space applications. In particular, the DEDSL format can be found there. A (proprietary) tool should normally be able to handle it: http://debat.c-s.fr/