hachoir-strip is an experimental program based on hachoir-parser and hachoir-editor: it removes "useless" informations from a file. Don't use it to create smaller file, you have better to recompress your data :-) hachoir-strip can be used if you would like to remove spy information which can be used to know the origin of a file.
Download
- Download hachoir-strip
- Depends on Python 2.4+ and hachoir-parser
- Read source code online
Examples
Our victim:
$ hachoir-metadata KDE_Click.wav.new Common: - Creation date: 2001-02-21 <== here they are - Producer: Sound Forge 4.5 <== spy informations :-) - MIME type: audio/x-wav - Endian: Little endian Audio: - Duration: 39 ms ...
Clean up the file:
$ hachoir-strip KDE_Click.wav [+] Process file KDE_Click.wav Remove field /info Remove 56 bytes (3.1%) Save new file into KDE_Click.wav.new $ hachoir-metadata KDE_Click.wav.new Common: - MIME type: audio/x-wav - Endian: Little endian Audio: - Duration: 39 ms ...
So hachoir-strip removed creation date (2001-02-21) and producer (software used to record/edit the sound: Sound Forge 4.5). The file is also 56 bytes smaller.
Supported format and what is removed
- AU audio: "info" (comment)
- JPEG picture: "psd" (Photoshop metadata) and "exif" (EXIF metadata)
- MPEG audio: "id3v1" and "id3v2" (ID3 metadata)
- PNG picture: "text[*]" (commands) and "time" fields (timestamp)
- RIFF container (AVI video, WAV audio): "junk" (padding), "nb_sample" (audio length), "info" (metadata), "index" (video index)
- TAR archive: informations about user (name, identifier), user group (name, identifier) and timestamp
Option --strip
You can select field types to remove using --strip:
- (default): remove all useless fields
- --strip=useless: remove really useless fields (eg. padding)
- --strip=metadata: remove metadata like ID3 tags and EXIF and IPTC metadatas
- --strip=index: remove video index
You can combine options with comma: --strip="useless,metadata".
See also
- jpegoptim: Program to recompress JPEG picture, gain is 5 to 50% in lossless mode (so can more in non-lossless mode)