Some examples

Style Haypo 1 (based on Hachoir)

PngChunk
========

size:uint32:Size in bytes
# Comment
tag:string[4]:Content type
content:bytes[value(size)]  # Comment
crc32:uint32: CRC32 checksum
  of content
  • Each line is a field
  • Syntax: "name:type:description"
  • Spaces are allowed before/after ":"
  • To write long line, start the next line with one or more spaces
  • attribute are written "key=value"
    text:string[10]:Text charset="UTF-8"
    text:string[10]
      Text
      charset="UTF-8"
    
  • functions:
    • value(x) or $x: value of field x
    • address(x) or @x: relative address of field x
    • size(x): size of field x
    • size(): size of current field set

Code:

TestLoop
========

a:uint8
b:uint8
var(x)=size(a)
var(max)=2*value(b)  # or 2*$b
do
   c[]: uint8
   var(x) += value(last())  # last(): last added field
while var(x) < var(max)

I don't like syntax var(x) :-p

Style pyconstruct

>>> # an adapter for MAC address (makes it readable)
... class MacAddressAdapter(Adapter):
...     def _encode(self, obj):
...         return "".join(obj.split("-")).decode("hex")
...     def _decode(self, obj):
...         return "-".join(b.encode("hex") for b in obj)
...
>>> # a factory function... it can be thought of as a macro that "inlines" the
... # return value, instead of typing it over and over again
... def MacAddress(name):
...     return MacAddressAdapter(Bytes(name, 6))
...
>>> # the full structure
... ethernet_header = Struct("ethernet_header",
...     MacAddress("destination"),
...     MacAddress("source"),
...     Enum(
...         UInt16("type"),
...         mapping = dict(
...             IP = 0x0800,
...             ARP = 0x0806,
...             X25 = 0x0805,
...             IPX = 0x8137,
...             IPv6 = 0x86DD,
...         )
...     ),
... )
>>>
>>> print ethernet_header.parse("ABCDEF123456\x08\x00")
Container:
    destination = '41-42-43-44-45-46'
    source = '31-32-33-34-35-36'
    type = 'IP'

(more examples)

BMP syntax

Doom WAD parser:

ImpType Standard ;
IDString 0 IWAD ;
Get FC Long 0 ;
SavePos TailOffOff 0 ;
Get TO Long 0 ;
GoTo TO 0 ;
For TE = 1 To FC ;
SavePos FOO 0 ;
Get FO Long 0 ;
SavePos FSO 0 ;
Get FS Long 0 ;
GetDString FN 8 0 ;
Log FN FO FS FOO FSO ;
Next TE ;

(Link to the parser in the wiki)

=> Binary MultiEx Commander Scripts (BMS language)

Features

Language have to be:

  • plain text
  • written in english and ASCII
  • (TODO: finish the list :-))

Problem

Problems of Hachoir syntax:

  • Hachoir parser are very close classic parser code: it's like a list of read(type, length)
  • It's hard to convert an Hachoir parser to another syntax (C struct or any other parser syntax)

Requirements:

  • need arithmetic operations: a+b, a-b, a*b, a/b (integer divison), a%b
  • need (sometimes) to call Python function: alignValue() or many functions when creating a field description
  • need to be able to get other field attribute (brother or parent)

More complex requirements:

  • what about loop on range?
    for index in range(n):
       ...
    
  • what about loop with condition?
    while self.current_size < self.size:
       ...
    while True:
      offset = UInt32(self, "offset[]")
      yield offset
      if offset.value == 0:
         break
    
  • what about test (if)?
    if self["has_comment"].value:
       yield CString(self, "comment")
    

Haypo: "On my mind, it's not possible or at least very complex"

Existing syntax for binary parsers

Can I suggest to look around http://www.ccsds.org/ as binary data is fundamental in space applications. In particular, we can find the DEDSL format. A (proprietary) tool is normaly able to handle that: http://debat.c-s.fr/