News between 0.5 and 0.4
User side:
- hachoir script split into several scripts:
- hachoir-urwid: curses interface using python-urwid framework
- hachoir-metadata: (see below)
- hachoir-grep (new): find a string in a file or just list all strings
- Metadata:
- Accept multiple filenames
- Option --mime displays only the MIME type
- Add Matroska metadata extractor and improve RIFF extractor (WAV, AVI and CDA)
- New parsers: ASF (WMV video and WMA audio), MOV (Quicktime movie), CDA (Windows audio file), FAT (FAT12/16/32 filesystem), TIFF image, ELF program
- Urwid frontend:
- New options --preload and --path
- For any embedded content (a file in an archive or a file system, a stream in a multimedia container), you can open it in a new window: two commands ('space' and 'f') tell hachoir to search for a parser and to parse further. For fragmented content, the container must be designed to reconstruct the stream: only FAT is supported for the moment.
- New concept of internal links, or delayed parsing: a field can reference a specific location in the file, and links make navigation easier. For several formats, such as file systems in general, following a link is the only way to trigger the parsing of an area. For those formats, it is also the only reasonable way to handle files of any size.
Developer side:
- Strings are now stored in Unicode, and support UTF-16LE and UTF-16BE charsets
- Fix display attribute for strings (truncate text in characters, not in bytes)
- New types: StaticFieldSet (more compact syntax), Bytes, Link (internal links: link a field to another)
- Creation of a testcase to help testing Hachoir
- First steps to support containers: experimental API that allows a parser to create a stream from fragmented content, so that embedded data like files in a file system can be parsed.
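The string display fix above (truncating in characters rather than in bytes) can be illustrated with a minimal sketch. This is not Hachoir code: the function names `truncate_bytes` and `truncate_chars` are hypothetical and only show why cutting encoded UTF-16 data at a byte offset can split a character.

```python
def truncate_bytes(raw: bytes, limit: int, charset: str) -> str:
    """Naive approach: cut the encoded bytes, then decode what is left.

    With a multi-byte charset the cut may land in the middle of a
    character; errors="replace" keeps the decoder from raising.
    """
    return raw[:limit].decode(charset, errors="replace")


def truncate_chars(raw: bytes, limit: int, charset: str) -> str:
    """Decode first, then cut on character boundaries."""
    return raw.decode(charset)[:limit]


data = "héllo wörld".encode("utf-16-le")
print(repr(truncate_bytes(data, 5, "utf-16-le")))  # fewer than 5 characters
print(repr(truncate_chars(data, 5, "utf-16-le")))  # 'héllo'
```

Decoding before truncating costs a full decode of the field, so a real implementation might decode only a bounded prefix; that trade-off is outside this sketch.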
News between 0.4 and 0.3
User side:
- New user interface using Urwid
- Metadata: read EXIF and IPTC in a JPEG picture, extract codec names from AVI and WAV, compute audio/video duration (MP3, WAV, AVI, etc.), compute bit rate (MP3 and WAV)
- New parsers: Ogg, Matroska, IPTC (metadata in JPEG), Photoshop Metadata (in JPEG)
- Able to catch and fix some errors => it's possible to parse some fields of a file when the parser is buggy or the file is invalid or truncated
Developer side:
- Parser has optional tags like file extensions, MIME types and description
- Remove libmagic/magic fallback code => a parser is now able to check if a stream looks valid or not
- Smarter API for FieldSet and Parser
News between 0.2 and 0.3
For users:
- Metadata extractor is now really interesting: able to extract document title, sound sample rate, video codec, image size and compression method, etc.
- Small fixes for Egg packaging (it's now possible to run an egg in user directory)
- New parsers: ID3v2, JPEG, EXIF, XCF, WAV, RPM, PYC, ZIP
- Improved parsers: JPEG (now able to read picture size), ID3v1 and ID3v2, RPM, PNG, AVI
- Able to detect console charset
For developers:
- Reorganize directories, rename "libhachoir" to "hachoir" (it's now more "pythonic")
- Field Enum now accepts any other class as input (use adapter design pattern), eg. Enum(String(self, "tag"), self.tag_to_desc)
- Create field types: PascalString8, PascalString16 and PascalString32
- Check field maximum size
- Write InputStream.searchBytesLength() method
- Cleanup code: remove useless files/modules, fix syntax using pychecker and pylint
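A Pascal string, as in the PascalString8/16/32 field types listed above, is a length prefix followed by that many bytes of text. The sketch below is a standalone illustration, not the Hachoir implementation: the function name `read_pascal_string`, the little-endian prefix and the default ISO-8859-1 charset are all assumptions made for the example.

```python
import struct
from typing import Tuple

# struct formats for 8-, 16- and 32-bit length prefixes.
# Little-endian byte order is an assumption of this sketch.
_PREFIX_FORMATS = {8: "<B", 16: "<H", 32: "<I"}


def read_pascal_string(data: bytes, offset: int, prefix_bits: int,
                       charset: str = "ISO-8859-1") -> Tuple[str, int]:
    """Return (text, offset just past the string) for a length-prefixed string."""
    fmt = _PREFIX_FORMATS[prefix_bits]
    (length,) = struct.unpack_from(fmt, data, offset)
    start = offset + struct.calcsize(fmt)
    end = start + length
    # Reject a length that runs past the available data
    if end > len(data):
        raise ValueError("string length exceeds available data")
    return data[start:end].decode(charset), end


print(read_pascal_string(b"\x05hello!", 0, 8))        # ('hello', 6)
print(read_pascal_string(b"\x03\x00abcXYZ", 0, 16))   # ('abc', 5)
```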
News between 0.1 and 0.2
- Hachoir rewritten from scratch
- Project split into library and user interfaces
- Parsers are now 100% "lazy": size, number of fields, value and description are created upon request
- "Lazy" design should make Hachoir much faster and give it a smaller memory footprint
- Rewrite API: better class/function names, really "object oriented" ("FieldSet" inherits from "Field" class)
- Smarter parser syntax
- Event system: possible to watch many events with local or global scope
- Address and size are written in bits and not in bytes, and field address is relative to its parent
- New tool: metadata extractor
- New parsers: Microsoft Office document (only first layer, OLE object)
- Cute API to access field information and to explore the field tree (e.g. "/header/width" or "../size")
- Documentation and unit tests
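Two of the design points above, "lazy" creation of fields and addresses expressed in bits relative to the parent, can be sketched with a toy model. The classes `Field` and `LazyFieldSet` below are hypothetical and much simpler than the real Hachoir API; they only show that nothing is built until a field is requested and that an absolute address is the sum of the relative addresses up the tree.

```python
class Field:
    """Toy field: address and size are in bits, address is relative to parent."""

    def __init__(self, parent, name, address, size):
        self.parent = parent
        self.name = name
        self.address = address  # in bits, relative to the parent
        self.size = size        # in bits (None until known)

    @property
    def absolute_address(self):
        # Walk up the tree, adding each relative address
        if self.parent is None:
            return self.address
        return self.parent.absolute_address + self.address


class LazyFieldSet(Field):
    """Toy field set: child fields are created on first access only."""

    def __init__(self, parent, name, address, layout):
        super().__init__(parent, name, address, None)
        self._layout = layout   # list of (name, size in bits)
        self._fields = None
        self.build_count = 0    # how many times children were built

    def _build(self):
        if self._fields is None:
            self.build_count += 1
            self._fields = {}
            offset = 0
            for name, size in self._layout:
                self._fields[name] = Field(self, name, offset, size)
                offset += size
            self.size = offset
        return self._fields

    def __getitem__(self, name):
        return self._build()[name]


header = LazyFieldSet(None, "header", 64,
                      [("magic", 32), ("width", 12), ("height", 12)])
print(header.build_count)                 # 0: nothing parsed yet
print(header["width"].absolute_address)   # 96 = 64 + 32
print(header.build_count)                 # 1: built once, on demand
```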
Website changelog (v0.6 .. v0.7)
- 1st January 2007
- Creation of hachoir-subfile program
- Add 7-zip parser from Olivier SCHWAB
- 29 December 2006
- hachoir-wx version 0.1 is released
- 26 December 2006: Merry Christmas and Happy Hanukkah :-)
- Add Audio Interchange Format File (AIFF) parser
- Add MIDI audio parser
- Add Linux swap file parser
- Add WMF picture parser (also support EMF picture)
- Add Real audio parser (by Mike Melanson)
- 18 December 2006:
- Add Truevision Targa Graphic (.tga) parser
- Add Real Media (.rm) parser
- 15 December 2006:
- hachoir-parser 0.7 release (download and changelog)
- hachoir-urwid 0.7 release (download and changelog)
- hachoir-metadata 0.7 release (download and changelog)
- 12 December 2006:
- hachoir-core 0.7 release (download and changelog)
- hachoir-wx now has a lazy hex view
- Add ReiserFS v3 parser
- Fix problem of invalid unicode filenames
- 18 November 2006:
- hachoir-core now supports decompression
- Add Ogg/Vorbis and Ogg/Theora parsers
- 11 November 2006:
- Rename hachoir component to hachoir-core
- Rewrite hachoir-core documentation, see wiki:hachoir-core/API
- Add Java classes parser written by Thomas de Grenier de Latour (TGL)
- 6 November 2006:
- hachoir (core) 0.6.1 is out: bugfix release
- New parser: FLV video
- Move most experimental tools (hachoir-fuse, hachoir-strip) to hachoir-tools subversion repository
- Create tools swf_extractor.py (to extract JPEG pictures and MP3 files from SWF files) and flv_extractor.py (to extract the sound track of an FLV video)
- 3 November 2006:
- hachoir-parser 0.6.1 is out: fix EXIF parser
- New parser: SWF (Flash) file
- Gentoo packages for Hachoir 0.6 are available. Hachoir 0.6 will be integrated in Gentoo ;-)
- 29 October 2006: Release 0.6 of hachoir, hachoir-parser, hachoir-metadata and hachoir-urwid
- 23 October 2006:
- Cyril Zorin started to work on hachoir-wx, first GUI based on "new" Hachoir API (last GUI was only for Hachoir 0.1, and API was rewritten from scratch for Hachoir 0.2)
- 17 October 2006:
- Create hachoir-fuse
- Split Hachoir into several subversion repositories (hachoir-parser, hachoir-urwid, hachoir-metadata, etc.). This should make it easier to publish a bugfix or a new release of each component.
- 30th September 2006:
- Add ZSNES save parser from Jason Gorski
- Add 3DO parser (3D model) from Cyril Zorin
- Add Spider-Man video parser from Mike Melanson
- Add basic TIFF image parser and backport ELF parser from Hachoir v0.1
- Metadata: add ASF metadata extractor, read comments in JPEG
- Add array() method to FieldSet
Website changelog (v0.5 .. v0.6)
- 25th September 2006:
- Add new parsers:
- Abstract Syntax Notation One (ASN.1)
- Basic MPEG video parser (only works on some MPEG version 2 files)
- Tcpdump file: Ethernet, IPv4, ARP, ICMP, TCP (and TCP options), UDP
- Add demonstration script: bin_strip.py which removes producer information, timestamps, metadata, useless padding, etc.
- 12th September 2006:
- Implementation of field set editor able to: edit field value, insert new field, remove a field
- Editor supports these types: bits (RawBits, Bit, etc.), bytes (Bytes, PaddingBytes, etc.), integers (Int8, UInt16, etc.) and strings (CString, PascalString8, etc.). Default wrapper is read only.
- Get more information on: Hachoir editor page
- 7th September 2006:
- Replace charset "ISO 8859-*" by "ISO-8859-*" to make Hachoir able to run on IronPython (v1.0)
- xorAxAx and haypo fixed a bug in pypy (about charset lookup) and now Hachoir (svn 872) works on pypy (revision 32030)!
- Also write script/patch to make Hachoir (svn 940) able to run on Python 2.2, Python 2.3 and Jython 2.2
- Get more information about compatibility
- 4th September 2006:
- Add support for piped input. In other words:
- Data are cached to allow backward seeking and data in cache are discarded automagically
- The core tries to do as much as possible without knowing the size of the stream
- Metadata: setting multiple values on a mono-valued field now generates a warning instead of an exception
- urwid: able to switch between human and real display of an integer (Enum and text_handler)
- New Benchmark class which automatically computes the number of calls and can display progress
- Improve "external links": (in urwid) Remove 'f' key, 'space' is enough.
- I18n of Hachoir:
- Most strings are now Unicode strings and not byte strings
- Use gettext (using "_" alias) and ngettext (singular/plural form) to translate text
- Hachoir scripts, urwid interface, metadata extractor and most kernel functions are translated into French
- ID3v2: supports picture in v2.3.0 ("APIC"), don't parse compressed field, safer charset getter code
- MPEG audio: detect more format errors (invalid bit rate or layer)
- Autofix feature is now optional (can be disabled)
- Remove an old and useless dependency: python-xml!
- Rewrite Python PYC parser: supports all versions (python 1.5 to 2.5), smaller path, basic fields (integer, string, etc.) have value, add support of binary float/complex and sets
- Small bugfixes: fix makePrintable() (escape code 127, "\x7f"), fix testcase for Windows (open files in binary mode), don't use SIGWINCH under Windows, INSTALL file is now synchronized with wiki page, Field description creation catches exceptions, String: an invalid length raises an exception (was an assertion)
- 24th August 2006:
- Improve (rewrite) autofix feature: it's now optional and it can fix most parser errors when at least one parent size is known
- Rewrite GenericString class: support UTF-16 and UTF-32 with BOM, fix computed length of the string
- hachoir-urwid: new option --force-mime, hachoir-metadata: new option --bench=RUNS
- MPEG audio (MP3) parser: support padding between frames, detect more format errors and can guess if the file is CBR (constant bit rate) or VBR (variable bit rate)
- ID3v2 parser: don't parse compressed chunks, catch error on charset, supports APIC picture
- Metadata RIFF: Fix to support video without sound
- 23rd August 2006: Version 0.5.2 (fix setup.py)
- 20th August 2006: Version 0.5.1 (fix for 64-bit CPU)
- 19th August 2006: Version 0.5
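The piped input support noted in the 4th September 2006 entry (cache data to allow backward seeking, discard cached data automatically, work without knowing the stream size) can be sketched as follows. This is an illustration under assumptions, not the hachoir-core implementation: the class name `CachedPipeReader`, the `read_at` method and the `max_cache` budget are hypothetical.

```python
import io


class CachedPipeReader:
    """Read a non-seekable stream while keeping recent bytes for backward seeks."""

    def __init__(self, stream, max_cache=1 << 16):
        self._stream = stream
        self._max_cache = max_cache
        self._cache = bytearray()
        self._cache_start = 0  # absolute offset of the first cached byte

    def read_at(self, offset, size):
        if offset < self._cache_start:
            raise ValueError("data at this offset was already discarded")
        end = offset + size
        # Pull data from the pipe until the requested range is cached
        while self._cache_start + len(self._cache) < end:
            missing = end - self._cache_start - len(self._cache)
            chunk = self._stream.read(missing)
            if not chunk:
                break  # end of stream: the total size is only known now
            self._cache += chunk
        rel = offset - self._cache_start
        data = bytes(self._cache[rel:rel + size])
        # Discard the oldest bytes once the cache exceeds its budget
        excess = len(self._cache) - self._max_cache
        if excess > 0:
            del self._cache[:excess]
            self._cache_start += excess
        return data


reader = CachedPipeReader(io.BytesIO(b"0123456789abcdef"), max_cache=8)
print(reader.read_at(0, 4))   # b'0123'
print(reader.read_at(2, 4))   # b'2345' (backward seek served from the cache)
print(reader.read_at(4, 8))   # b'456789ab' (offsets 0..3 are discarded afterwards)
```

The cache budget is the design knob here: a larger `max_cache` allows longer backward seeks at the cost of memory, and a seek to discarded data has to fail because a pipe cannot be rewound.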
Website changelog (v0.4 .. v0.5)
- 15th August 2006:
- Write StaticFieldSet for a more compact syntax.
- Create Bytes type, and fix use of RawBytes, Bytes and String
- Rewrite RIFF (AVI video, WAV audio and CDA) parser.
- Fix all charset issues.
- Fix display of truncated strings/rawbytes.
- 9th August 2006: Add program "hachoir-grep" (search a text pattern in a binary file, or just list all strings). String now returns Unicode in most cases (returns a string on error or if the charset is not specified). Add a minimum size to parsers to help guess the right parser.
- 8th August 2006: Merge AVI and WAV parsers into "RIFF" parser and add support of CDA file
- 6th August 2006: Hachoir project moved to http://hachoir.org/
- 5th August 2006: Add ASF parser (WMV video and WMA audio)
- 3rd August 2006: Add parser for Quicktime Movie/ISO MPEG4 video
- 1st August 2006:
- Creation of internal links: link a field to another
- Split 'hachoir' script into hachoir-urwid (new options: --preload, --path), hachoir-metadata (new option: --mime) and hachoir-console (many new options). hachoir-metadata now accepts multiple file names.
- Rewrite metadata extractor code. Metadata now accepts multiple values for some information (e.g. comments), each piece of information has a priority (so it's easy to filter information), better organisation of information, etc. Add endian to metadata.
- Add Matroska metadata extractor (new mkvinfo ? :-))
- Creation of the Hachoir testcase
- 15 July 2006: New parsers: FAT (12/16/32), Sun/NeXT audio and ISO 9660
- 11 July 2006: Hachoir 0.4.0 is out
Website changelog (v0.3 ... v0.4)
- 25 June 2006: Hachoir is able to fix some errors during parsing, so it's possible to open buggy files!
- 20 June 2006: Don't use libmagic or magic_fallback to detect file type. Each parser now has its own method to match a file.
- 16 June 2006: Metadata extractor also reads Exif information from JPEG pictures
- 15 June 2006: Add Ogg parser from Julien Muchembled
- 14 June 2006: Metadata are better and bigger. Last changes: display file format version (some formats like JPEG), display AVI and WAV codec names, compute the duration of an MP3, display bit rate of MP3 and WAV
- 8 June 2006: Second contribution! Julien Muchembled wrote a parser for Matroska video ;-)
- 30 May 2006: Hachoir 0.3.0 is out
Website changelog (v0.0 ... v0.2)
- 8 May 2006: Hachoir 0.2.0 is out and packaged in an egg :-)
- 29 April 2006: Hachoir events now support global listeners (can watch events from any field set); found a solution for "random access" formats like Word documents, EXT2 file systems, etc. (allows replacing a field with other fields and emits signals for that); wrote an XML export function (and XSLT to generate an HTML page)
- 26 April 2006: Website is now complete. I moved all webpages from the old website and wrote new pages
- 26 March 2006: Write new interesting parsers: tcpdump (network: IP, TCP, ARP, ICMP, UDP) and compiled Python script (.pyc file)
- January-March 2006: I'm working on a rewrite of Hachoir, see Hachoir yield
- 28 December 2005: End of migration of all plugins to the new API version. So I released a new version :-)
- 27 December 2005: Creation of chunk types EnumChunk and BitsChunk, which respectively:
- associate a number with a string, useful in many binary file formats
- allow working with bit fields, useful for "flags" fields
- 26 December 2005: I am trying to freeze the API and make it more flexible. I also ported the filters to OnDemandFilter: a new version of the old Filter class which reads data only when it is requested. This makes it possible to open very big files (e.g. an ext3 partition of 9 GB).
- 26 December 2005: Release version "2005-12-26". Get more information on this version (on Berlios).
- 20 October 2005: Creation of the project.
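The two ideas behind the EnumChunk and BitsChunk entry of 27 December 2005 (mapping a number to a string, and splitting an integer into bit fields for "flags") can be sketched generically. This is not the 2005 Hachoir API: the helper names `enum_text` and `split_bits`, the sample table and the least-significant-bit-first ordering are assumptions for illustration.

```python
# Hypothetical value-to-name table, as found in many binary formats
COMPRESSION_NAMES = {0: "none", 1: "RLE", 2: "LZW"}


def enum_text(value, names):
    """Return the name associated with a number, or a readable fallback."""
    return names.get(value, "unknown (%d)" % value)


def split_bits(value, layout):
    """Split an integer into named bit fields, least significant bits first.

    layout is a list of (name, width in bits).
    """
    result = {}
    for name, width in layout:
        result[name] = value & ((1 << width) - 1)
        value >>= width
    return result


print(enum_text(1, COMPRESSION_NAMES))   # RLE
print(enum_text(7, COMPRESSION_NAMES))   # unknown (7)
flags = split_bits(0b10110, [("readonly", 1), ("hidden", 1), ("level", 3)])
print(flags)   # {'readonly': 0, 'hidden': 1, 'level': 5}
```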