H.264 for image compression
Google recently presented the WebP initiative, which proposes a replacement for JPEG based on the intra-frame compression technique of the WebM project (the VP8 codec).
Not everybody knows it, but H.264 is also very good at encoding still pictures and, unlike WebM or WebP, 97% of PCs are already capable of decoding it (did someone say ‘Flash Player’?).
The intra-frame compression techniques implemented in H.264 are very efficient, much more advanced than those of the old JPG and superior to WebP too. So let’s take a look at how to produce such files and at the advantages of using them inside Flash Player.
JPG image compression
JPG is an international standard approved in 1992-94. It has been one of the most important technologies for the web, because without an efficient way to compress still pictures the web would not be what it is today. JPEG is usually capable of compressing an image to about one tenth of its original size (1:10). The encoder performs these steps:
1. Color space conversion from RGB to YCbCr
2. Chroma subsampling, usually to 4:2:0 (4:2:2 and 4:4:4 are also supported)
3. Discrete Cosine Transform of 8×8 blocks
4. Quantization of the DCT coefficients
5. Entropy coding (zig-zag scan, RLE and Huffman)
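As a rough illustration of the color conversion and transform steps listed above, here is a minimal pure-Python sketch. The function names are mine, the conversion uses the full-range JFIF coefficients, and the DCT is the plain textbook formula, far slower than what a real encoder uses.

```python
import math

def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr conversion with the JFIF coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as a list of 8 rows."""
    n = 8

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out

# A perfectly flat block puts all of its energy in the DC coefficient.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 3), round(coeffs[0][1], 3))
```

The flat block shows why the DCT helps: smooth areas collapse into a handful of low-frequency coefficients, and everything else can be coded very cheaply.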
The algorithm is well known and robust, and it is used in almost every electronic device with a color display, but over the last 15 years researchers have naturally developed more advanced algorithms for encoding still pictures. One of these is JPEG 2000, which uses wavelets to encode the picture. Improving intra-frame compression also matters a great deal in video encoding, because this is the kind of compression used for keyframes. So first H.263 and then H.264 introduced more optimized ways to encode a single picture.
H.264 intra frame compression
H.264 contains a number of new features that allow it to compress images much more efficiently than JPG.
New transform design
Unlike JPG, H.264 uses an exact-match integer 4×4 spatial block transform instead of the well-known 8×8 DCT. It is conceptually similar to the DCT but produces fewer ringing artifacts. There is also an 8×8 spatial block transform for less detailed areas and chroma.
A secondary Hadamard transform (2×2 on chroma and 4×4 on luma) can be performed on the “DC” coefficients to obtain even more compression in smooth regions.
There is also an optimized quantization and two possible zig-zag patterns for the run-length encoding of the transformed coefficients.
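To make the "exact-match integer" idea concrete, here is a minimal sketch of the 4×4 forward core transform, Y = Cf · X · Cf^T, using the well-known integer matrix of the standard. The helper names are mine, and the scaling that a real encoder folds into the quantization stage is left out on purpose.

```python
# Integer core transform matrix of the H.264 4x4 transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_4x4(block):
    """Unscaled forward core transform: CF * block * CF^T, integers only."""
    return matmul(matmul(CF, block), transpose(CF))

flat = [[10] * 4 for _ in range(4)]
print(forward_4x4(flat))
```

Because only integer additions and multiplications are involved, every decoder reproduces exactly the same values, with none of the rounding drift that floating-point DCT implementations can introduce.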
H.264 introduces complex spatial prediction for intra-frame compression.
Rather than the “DC”-only prediction found in MPEG2 and the transform coefficient prediction found in H.263+, H.264 defines 9 prediction modes (8 directional modes plus DC) to predict spatial information from neighbouring blocks when the 4×4 transform is used. The encoder tries to predict the block by extrapolating the pixel values of adjacent blocks, so only the delta signal needs to be transmitted.
There are also 4 prediction modes for smooth color zones (16×16 blocks). Residual data are coded with 4×4 transforms, and a further 4×4 Hadamard transform is used for the DC coefficients.
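The prediction idea can be sketched in a few lines. The example below is deliberately simplified: it implements only three of the 4×4 modes (vertical, horizontal and DC), picks the best one with a plain sum of absolute differences, and ignores missing neighbours; real encoders use more modes and more refined cost functions. All names are illustrative.

```python
def predict_4x4(top, left, mode):
    """Build a 4x4 prediction from the 4 pixels above and the 4 to the left."""
    if mode == "vertical":
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3  # rounded mean of the 8 neighbours
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unknown mode: " + mode)

def best_mode(block, top, left):
    """Return (mode, sad, residual) for the mode with the lowest SAD."""
    best = None
    for mode in ("vertical", "horizontal", "dc"):
        pred = predict_4x4(top, left, mode)
        residual = [[block[r][c] - pred[r][c] for c in range(4)]
                    for r in range(4)]
        sad = sum(abs(v) for row in residual for v in row)
        if best is None or sad < best[1]:
            best = (mode, sad, residual)
    return best

# Vertical stripes are predicted perfectly from the row above.
block = [[10, 20, 30, 40] for _ in range(4)]
mode, sad, residual = best_mode(block, top=[10, 20, 30, 40], left=[10] * 4)
print(mode, sad)
```

When the prediction is good the residual is almost all zeros, which is exactly what the transform and entropy coder compress best.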
A new logarithmic quantization step scale is used (the step size grows at a compound rate of about 12% per quantization parameter unit). It is also possible to use frequency-customized quantization scaling matrices, selected by the encoder, for perceptual quantization optimization.
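The 12% compound rate follows from the fact that the quantization step doubles every 6 values of the quantization parameter (QP), since 2^(1/6) ≈ 1.12. A small sketch, using the commonly published base step sizes for QP 0 to 5 (the function name is mine):

```python
# Step sizes for QP 0..5; every further 6 QP units double the step.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size for an H.264 QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)

for qp in (10, 24, 30, 51):
    print(qp, qstep(qp))

# Average growth per QP unit: the sixth root of 2.
print(round(2 ** (1 / 6), 4))
```

This is why a change of a few units in the quality parameter has such a visible effect on file size: each unit changes the step size by roughly 12%, and 6 units double or halve it.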
In-loop deblocking filter
An adaptive deblocking filter is applied to reduce any blocking artifacts that appear at high compression ratios.
Advanced Entropy Coding
H.264 can use the state of the art in entropy coding: Context Adaptive Binary Arithmetic Coding (CABAC) which is much more efficient than the standard Huffman coding used in JPG.
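One way to see where the gain comes from: a Huffman code spends a whole number of bits on each symbol, so a binary decision costs at least 1 bit, while an arithmetic coder can approach the entropy of the source. The sketch below only computes that theoretical bound for a skewed binary source; it is not an implementation of CABAC, and the probabilities are made up for illustration.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits per symbol, of a binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.5, 0.8, 0.95):
    print(p, round(binary_entropy(p), 3))
```

For highly predictable decisions the bound falls well below 1 bit per symbol, and adaptive context modelling is what makes many of the decisions in a picture highly predictable.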
JPEG vs H.264
The techniques used in H.264 double the efficiency of the compression. That is, you can achieve the same quality at half the size. The efficiency is even higher at very high compression ratios (1:25 and above), where JPG introduces so many artifacts that the result is completely unusable.
This is a detail of a 1024×576 image compressed to around 50 KB both in JPG (using PaintShopPro) and in H.264 (using FFMPEG). The picture is shown at 2x zoom to make the artifacts easier to see:
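To put the test image in perspective, its compression ratio can be worked out with a little arithmetic. The sketch assumes an uncompressed 24-bit RGB source and takes 50 KB as 50 × 1024 bytes; both are my assumptions, not figures from the test.

```python
def compression_ratio(width, height, compressed_bytes, bytes_per_pixel=3):
    """Ratio between the raw bitmap size and the compressed file size."""
    return (width * height * bytes_per_pixel) / compressed_bytes

ratio = compression_ratio(1024, 576, 50 * 1024)
print("about 1:%.0f" % ratio)
```

Under those assumptions the test sits around 1:35, well inside the range above 1:25 where the difference between the two codecs is most visible.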
I have estimated a reduction in size of around 40-50% at the same perceived quality, especially at high compression ratios.
WebP vs H.264
WebP is based on the intra-frame compression technique of the VP8 codec. I compared H.264 with VP8 in this article. VP8 is a good codec and its intra-frame compression is very similar to that of H.264. The differences are that VP8 does not support the 8×8 block transform (a feature of the H.264 High profile) and can only encode in 4:2:0 (H.264 also supports 4:4:4). So the two should have approximately the same performance on their common ground (4:2:0). The problem with WebP is support, which is currently almost zero, while H.264 can be decoded by Flash (97% of desktops, plus Android and RIM devices) and also by iOS devices (via HTML5).
How to encode and display on a page
Now let’s start to encode pictures in H.264. The container could be .mp4 or .flv. FLV is lighter than .mp4 but .mp4 has far more support outside Flash. This is the command line to use with FFMPEG:
ffmpeg.exe -i INPUT.jpg -an -vcodec libx264 -coder 1 -flags +loop -cmp +chroma -subq 10 -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -flags2 +dct8x8 -trellis 2 -partitions +parti8x8+parti4x4 -crf 24 -threads 0 -r 25 -g 25 -y OUTPUT.mp4
The -crf parameter changes the quality level: lower values give higher quality and bigger files. Try values from 15 to 30 (the final effect depends on the frame size). You can also resize the frame before encoding with the parameter -s WxH (e.g. -s 640x360).
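To compare several quality levels quickly, the command can be generated from a small script. This is only a sketch: it keeps just the basic options from the command line above (-i, -an, -vcodec, -crf, -s, -y), assumes an ffmpeg binary built with libx264 is on the PATH, and the output file names are hypothetical.

```python
import subprocess

def build_command(src, dst, crf=24, size=None):
    """Build the ffmpeg argument list for a single-picture H.264 encode."""
    cmd = ["ffmpeg", "-i", src, "-an", "-vcodec", "libx264", "-crf", str(crf)]
    if size is not None:
        width, height = size
        cmd += ["-s", "%dx%d" % (width, height)]  # optional resize, WxH
    cmd += ["-y", dst]
    return cmd

def encode(src, dst, crf=24, size=None):
    """Run ffmpeg; raises CalledProcessError if the encode fails."""
    subprocess.run(build_command(src, dst, crf, size), check=True)

if __name__ == "__main__":
    # Print one command per quality level instead of running them.
    for crf in range(15, 31, 5):
        print(" ".join(build_command("INPUT.jpg", "OUTPUT_crf%d.mp4" % crf, crf)))
```

Calling encode() instead of printing the commands runs the actual encodes, after which the resulting files can be compared by size and by eye.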
To display the picture encoded in H.264 you can use this simple AS3 code:
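// Assumes a Video instance named "video" is already placed on the stage.
var nc:NetConnection = new NetConnection();
nc.connect(null); // null = progressive download, no media server
var ns:NetStream = new NetStream(nc);
video.attachNetStream(ns);
video.smoothing = true; // smooth the picture when it is scaled
nc.client = this;
ns.client = this; // callbacks such as onMetaData are looked up on this object
// Empty handler, added so the metadata callback does not raise an async error.
function onMetaData(info:Object):void {}
ns.play("OUTPUT.mp4");
stage.scaleMode = "noBorder";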
The advantage of using Flash for serving pictures in H.264
The main advantage of using H.264 for pictures is its superior compression ratio. But in an everyday scenario it is not practical to replace every instance of the common <img> tag with an SWF.
However, there is one kind of application that can benefit enormously from this approach: the display of big, high-quality copyrighted pictures. Instead of serving low-quality, watermarked JPGs, it is possible to serve such big, high-quality pictures as H.264 streams from a Flash Media Server and protect the delivery using the RTMPE protocol and SWF authentication. On top of that, for bullet-proof protection, you could even protect the H.264 payload by encrypting the content with a robust DRM like Adobe Access 2.0. Not bad.