It's not a simple task to categorize files:
- Category of what? File format or file content?
- For a video file that mixes images, sound and subtitles: is the category that of the video codec, the images, the sound or the subtitles?
Could an ontology help here?
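One way to make the two questions above concrete is to describe a file along two separate axes: its container format, and the content category of each stream it carries. The sketch below is only an illustration of that idea; the class names `Stream` and `FileDescription` and the example codec names are hypothetical and not taken from any existing library.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Stream:
    content: str  # content category, e.g. "video", "audio", "subtitle"
    codec: str    # codec used to encode this stream


@dataclass
class FileDescription:
    container: str  # file format category, e.g. "Matroska"
    streams: List[Stream] = field(default_factory=list)

    def content_categories(self) -> Set[str]:
        """Return every content category present in the file."""
        return {stream.content for stream in self.streams}


# A single file can belong to several content categories at once.
movie = FileDescription(
    container="Matroska",
    streams=[
        Stream("video", "h264"),
        Stream("audio", "vorbis"),
        Stream("subtitle", "srt"),
    ],
)
```

With this layout, "what category is this file?" has no single answer: `movie.container` answers the format question and `movie.content_categories()` answers the content question.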
Some broad content categories:
Some codecs:
- Charset: UTF-8, UTF-16LE, ISO-8859-1, ASCII, ...
- Compression: deflate, bzip2, Huffman coding, Fourier-type transforms (used as a step in lossy compression), ...
- Muxer: MPEG video, AVI video, FLV video, MPEG audio, ...
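To illustrate why codecs form their own axis, independent of content, here is a small sketch using only the Python standard library. The same text yields different bytes under different charsets, and the same bytes yield different outputs under different compressors. The sample values `text` and `payload` are arbitrary choices for the illustration.

```python
import bz2
import zlib

# Charset codecs: one text, three different byte sequences.
text = "café"
utf8 = text.encode("utf-8")          # 'é' takes two bytes
utf16 = text.encode("utf-16-le")     # two bytes per character
latin1 = text.encode("iso-8859-1")   # one byte per character

# Compression codecs: one payload, two different compressed forms.
payload = b"abc" * 1000
deflated = zlib.compress(payload)    # deflate data in a zlib wrapper
bzipped = bz2.compress(payload)      # bzip2 stream

# Both compressors are lossless: decoding restores the original bytes.
restored_from_deflate = zlib.decompress(deflated)
restored_from_bzip2 = bz2.decompress(bzipped)
```

In every case the content (a short text, a repetitive byte string) stays the same; only the codec changes.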
Container file formats:
- Matroska: video, sound, subtitle, metadata, but also "attached files" (any type)
- Ogg: audio (Vorbis), video (Theora), metadata
- RIFF: audio (WAV), video (AVI); AVI can contain any type of video and audio streams
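The containers listed above can usually be told apart from their first bytes. The following is a minimal sketch, not a complete detector: the function name `guess_container` and its return labels are hypothetical. It relies on the EBML header ID `1A 45 DF A3` (shared by Matroska and WebM), the `OggS` page signature, and the RIFF form type stored at offset 8.

```python
from typing import Optional


def guess_container(header: bytes) -> Optional[str]:
    """Guess the container format from the first bytes of a file."""
    # Matroska and WebM both start with an EBML header; telling them
    # apart requires reading the DocType element, which is not done here.
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "EBML (Matroska/WebM)"
    if header.startswith(b"OggS"):
        return "Ogg"
    # RIFF layout: "RIFF", 4-byte size, then a 4-byte form type.
    if header[:4] == b"RIFF" and len(header) >= 12:
        form_type = header[8:12]
        if form_type == b"WAVE":
            return "RIFF/WAV"
        if form_type == b"AVI ":
            return "RIFF/AVI"
        return "RIFF (other)"
    return None
```

Note that this only identifies the container. Knowing that a file is Ogg or AVI says nothing yet about which streams it holds, which is the categorization problem raised at the top.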