Ticket #133 (new defect)

Opened 1 year ago

Last modified 1 year ago

Permit usage of substreams as subfields

Reported by: nneonneo
Assigned to: haypo
Priority: normal
Milestone:
Component: core
Keywords:
Cc:

Description

While editing the OLE2 [MSOffice] parser, I noticed that it was possible, through rather arcane hacks, to sew together unattached fragments (by way of the fragment_group field, which is created as the properties are explored). The fragments are sewn together by FragmentGroup, presented as a substream, and subsequently left unused.

At the end of createFields in OLE2, I added this code:

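        # Only attach the substream if a root entry was found.
        if "root[0]" in self:
            # Seek back to the start so the new field appears to have enough room.
            self.seekBit(0)
            # Build one contiguous stream from the root entry's fragment group.
            stream = self["root[0]"].group.createInputStream()
            psfield = OfficeRootEntry(stream)
            # Re-initialize the parser as a field set attached to this document.
            RootSeekableFieldSet.__init__(psfield, self, "root", stream,
                                          "Document Fragment Group: root", stream.size)
            # Give the sub-parser access to the parent OLE2 parser.
            psfield.ole2 = self
            yield psfield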

Yes, there is voodoo. Basically, if any root entries were found (summary and doc_summary entries follow), the code seeks to the beginning (to allow for enough perceived space to store the data), creates a stream from the fragment_group (which happens to be a StringInputStream), initializes a parser for it, and then attaches the parser to the main document. With an unmodified base library, this fails with an AssertionError: the parent stream must match the child stream. If that assertion is commented out, the stream functions perfectly, allowing inspection of the contained elements. The OfficeRootEntry parser isn't registered with the main parser list, as it doesn't function on its own (it requires variables and methods from the parent OLE2 parser, which is linked by the above code). Thus, the substream "technique" appears to be the only truly viable way of reading the data: the parser can't be attached to the stream directly because the stream is heavily fragmented in my test file (over 12 separate fragments), and the function seekSBlock assumes a contiguous stream.
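To make the fragment-sewing step concrete, here is a minimal standalone sketch of the general idea: copying several non-contiguous byte ranges of a file into one contiguous, seekable stream. It does not use Hachoir at all; the helper name join_fragments, the (offset, size) tuples, and the sample bytes are all hypothetical and only illustrate the concept, not what FragmentGroup actually does internally. It assumes Python 3.

```python
import io


def join_fragments(data, fragments):
    """Concatenate (offset, size) byte ranges of `data` into one seekable stream.

    Hypothetical helper for illustration only; it is not part of Hachoir.
    """
    parts = []
    for offset, size in fragments:
        if offset < 0 or size < 0 or offset + size > len(data):
            raise ValueError("fragment (%d, %d) is outside the data" % (offset, size))
        parts.append(data[offset:offset + size])
    # The result is contiguous, so a parser can seek in it freely.
    return io.BytesIO(b"".join(parts))


# Hypothetical file: the logical stream "HELLO WORLD" is stored out of order.
raw = b"WORLD....HELLO.. "
stream = join_fragments(raw, [(9, 5), (16, 1), (0, 5)])
print(stream.read())  # b'HELLO WORLD'
```

The price of this convenience is that offsets in the joined stream no longer correspond to offsets in the original file, which is exactly why attaching such a stream as a subfield of the parent is problematic.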

I therefore suggest removing the assertion

assert id(self.stream) == id(parent.stream)

in hachoir_core/field/basic_field_set.py.

Change History

06/25/07 12:41:01 changed by anonymous

This idea breaks a core Hachoir concept: Hachoir maps a file's binary data to fields, and this conversion is bijective. With your proposal, it would be possible to have two fields at the same address, and that's insane.
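As a rough illustration of that invariant, the sketch below checks whether any two sibling fields cover the same bits. It is a toy model, not Hachoir code: the (name, address, size) tuples, the find_overlaps helper, and the example layout are hypothetical, and it assumes a flat list of sibling fields with positive sizes expressed in bits.

```python
def find_overlaps(fields):
    """Return pairs of field names whose bit ranges overlap.

    `fields` is an iterable of (name, address, size) tuples, with address
    and size in bits. Hypothetical model, not Hachoir's field classes.
    """
    ordered = sorted(fields, key=lambda f: f[1])
    overlaps = []
    for i, (name, addr, size) in enumerate(ordered):
        end = addr + size
        for other_name, other_addr, _other_size in ordered[i + 1:]:
            if other_addr >= end:
                break  # later fields start even further right
            overlaps.append((name, other_name))
    return overlaps


# Each bit belongs to exactly one field: no overlaps.
layout = [("header", 0, 64), ("body", 64, 128)]
print(find_overlaps(layout))  # []

# Attaching a substream-backed field at address 0 breaks the mapping.
layout.append(("root", 0, 96))
print(find_overlaps(layout))  # [('header', 'root'), ('root', 'body')]
```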

Use hachoir-urwid and press the space key to open a subfield, or patch hachoir-wx to add the same feature.

06/27/07 05:20:05 changed by nneonneo <nneonneo@gmail.com>

Aaah, I never noticed this. I must be blind...

Anyway, thank you for clearing this up. I will certainly be going back and removing the crappy OLE2 changes I made :)

