Thoughts on VP8

As you know, Google has released VP8 as open source. After the acquisition of On2, several rumors spread across the web about the reasons for the deal. At Google I/O the secret was revealed: Google is opening the VP8 source code and offering the technology for free.

This is a very important move. Even if there are shadows over the operation because of possible patent infringements, the web needed an open source video codec. The whole Internet is founded on open source technologies that every company can use freely, but this was only partially true for video codecs.

On2 gave its VP3 technology to the open source community several years ago (the Theora project), but it is a “primitive” codec, quite similar to H.263, that cannot compete with modern codecs like VC-1 and H.264.

But what exactly is VP8? It is the latest codec designed by On2, a company specialized in video codec design. The best-known codecs produced by On2 are VP3 (Theora), VP6 (licensed by Adobe for Flash Player 8 and beyond), VP7 (licensed by Skype) and now VP8. Only a few people knew the technical details of VP8 until now, so the potential of this codec has always been a mystery, regardless of the fact that On2 declared it to be far superior to H.264.

But now it is possible to compare the quality of VP8 and H.264 directly, because Google has provided the community with the technical specification and both the reference encoder and decoder. In this article I simply want to compare the VP8 technical specification briefly with H.264, the current “state of the art” in video encoding. The following summary covers the most important points:

Frame, block transforms, color spaces and color depth

VP8 operates only on 4:2:0 YUV pictures with 8 bits per sample. H.264 can instead also operate on 4:2:2 and 4:4:4 pictures at 10 bits per sample (or above) in its most advanced profiles. Like H.264, VP8 subdivides the frame into blocks of 4×4 pixels and performs an integer-based transform, with an additional transform of the DC coefficients. H.264 also offers an optional 8×8 transform, available in the High profile, which enhances compression in flat zones and gradients. An interesting feature of VP8 is the capability to handle different frame resolutions in the same stream.
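
To make the two-stage idea concrete, here is a minimal sketch in Python of a 4×4 integer transform followed by a second transform over the DC coefficients. The matrices are the well-known H.264 core transform and a 4×4 Hadamard matrix; they are used here only as an illustration. The exact VP8 matrices are defined in its own specification and are not reproduced here, and all scaling (which real codecs fold into quantization) is left out. The helper names are mine.

```python
# H.264-style 4x4 forward core transform matrix (scaling omitted).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# 4x4 Hadamard matrix, used here for the second stage on DC coefficients.
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    """Plain integer matrix product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform_4x4(block):
    """First stage: Y = CF * X * CF^T on one 4x4 block of samples."""
    return matmul(matmul(CF, block), transpose(CF))

def dc_transform_4x4(dc_plane):
    """Second stage: Hadamard transform of the 4x4 array of DC coefficients."""
    return matmul(matmul(H, dc_plane), transpose(H))

# A perfectly flat block: all the energy ends up in the DC coefficient.
flat = [[10] * 4 for _ in range(4)]
coeffs = forward_transform_4x4(flat)

# Assume a 16x16 macroblock made of 16 such blocks: collect their DC values
# into a 4x4 array and transform that array again.
dc_plane = [[coeffs[0][0]] * 4 for _ in range(4)]
second = dc_transform_4x4(dc_plane)
```

In this flat example every coefficient except the top-left one is zero after each stage, which is why a second transform on the DC values pays off in smooth areas.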

Intra frame compression

In intra frame compression, a picture is encoded only spatially, using intra prediction, quantization and entropy coding. The intra prediction of VP8 is almost identical to that of H.264, with several modes for 4×4 blocks and 16×16 macroblocks. VP8 intra coding lacks the adaptive 8×8 transform mode of H.264.
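
As a rough illustration of what intra prediction means, the following Python sketch builds three simple predictors for a 4×4 block (vertical, horizontal and DC) from the already decoded neighbors and picks the one with the smallest residual. Both codecs have more modes than this, and the function names and the SAD-based mode decision are my own simplifications, not the behavior of any specific encoder.

```python
def predict_vertical(top):
    """Copy the row of pixels above the block into every row."""
    return [list(top) for _ in range(4)]

def predict_horizontal(left):
    """Copy the column of pixels left of the block into every column."""
    return [[left[r]] * 4 for r in range(4)]

def predict_dc(top, left):
    """Fill the block with the rounded mean of the 8 neighboring pixels."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

def block_sad(block, pred):
    """Sum of absolute differences between the block and its prediction."""
    return sum(abs(block[r][c] - pred[r][c])
               for r in range(4) for c in range(4))

def best_intra_mode(block, top, left):
    """Return the name of the mode whose prediction leaves the smallest residual."""
    candidates = {
        "vertical": predict_vertical(top),
        "horizontal": predict_horizontal(left),
        "dc": predict_dc(top, left),
    }
    return min(candidates, key=lambda name: block_sad(block, candidates[name]))

# A block made of vertical stripes is predicted perfectly from the row above.
block = [[10, 20, 30, 40] for _ in range(4)]
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
mode = best_intra_mode(block, top, left)
```

Only the residual (block minus prediction) goes on to the transform and quantization, so a good prediction means fewer bits.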

Inter frame compression

In inter compression, frames are compressed by exploiting temporal redundancies between adjacent frames. VP8 has several macroblock configurations, very similar to the H.264 modes but somewhat more limited. Like H.264, VP8 supports full-pixel, half-pixel and quarter-pixel accuracy in motion estimation. It uses a slightly more accurate, but also slower, interpolation scheme. VP8 supports only P-frames (and a sort of disposable frame), with up to 3 reference frames in the past, far fewer than H.264, which can use P-frames with up to 16 reference frames and weighted prediction. H.264 can also leverage B-frames, which are interpolated between a frame in the past and a frame in the future.
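
To give an idea of how sub-pixel positions are obtained, here is a one-dimensional Python sketch of the H.264 luma approach: half-pixel samples come from a 6-tap filter, and quarter-pixel samples are the rounded average of two neighboring values. The VP8 filter taps are different and are not reproduced here; the function names are mine, and a real codec applies this in two dimensions.

```python
# 6-tap filter used by H.264 for luma half-sample positions.
TAPS = (1, -5, 20, 20, -5, 1)

def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))

def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1].

    Needs two samples to the left of i and three to the right.
    """
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
    return clip8((acc + 16) >> 5)

def quarter_pel(row, i):
    """Quarter-sample value between row[i] and the following half-sample."""
    return (row[i] + half_pel(row, i) + 1) >> 1

ramp = [0, 10, 20, 30, 40, 50, 60, 70]
h = half_pel(ramp, 3)     # between the samples 30 and 40
q = quarter_pel(ramp, 3)  # between 30 and the half-sample above
```

A longer or more precise filter gives a better prediction at fractional positions, but it also costs more arithmetic per pixel, which is the trade-off mentioned above.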

Interpolated frames

Probably the most important difference from H.264 is the lack of B-frames. This kind of frame is the most efficient for compression. Whenever the motion is “easy” to predict, a B-frame is inserted to exploit temporal redundancies. B-frames can also be dropped to keep audio and video in sync in difficult scenarios.
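
The reason B-frames compress so well can be shown with a toy Python example. It assumes the simplest form of bi-prediction, a rounded average of one block from a past frame and one from a future frame, and compares the residual with a prediction from the past frame only. The pixel values are invented for illustration.

```python
def bi_predict(past, future):
    """Rounded average of a past and a future reference block."""
    return [(p + f + 1) >> 1 for p, f in zip(past, future)]

def residual_sad(current, prediction):
    """Sum of absolute differences left to encode after prediction."""
    return sum(abs(c - p) for c, p in zip(current, prediction))

# A slow fade: the current frame sits halfway between its two references.
past = [100, 100, 100, 100]
future = [120, 120, 120, 120]
current = [110, 110, 110, 110]

p_frame_cost = residual_sad(current, past)
b_frame_cost = residual_sad(current, bi_predict(past, future))
```

In this fade the P-style prediction leaves a residual on every pixel, while the interpolated prediction leaves none, so almost nothing remains to be coded.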

Deblocking

VP8 supports an in-loop deblocking filter. It can be considered comparable to H.264 deblocking but slightly less flexible (coarser adaptivity).

Entropy coding

VP8 supports binary arithmetic entropy coding similar to H.264’s CABAC.
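
The gain from this kind of coder comes from spending about -log2(p) bits on a binary symbol whose estimated probability is p. The Python sketch below does not implement an arithmetic coder; it only compares the ideal cost of a skewed bit sequence under a fixed 50/50 model with the cost under a probability that adapts to the symbols seen so far. The simple counting model is my own stand-in and is not the probability model of CABAC or VP8.

```python
import math

def fixed_cost(bits, p_zero):
    """Ideal cost in bits when the probability of a 0 never changes."""
    return sum(-math.log2(p_zero if b == 0 else 1.0 - p_zero) for b in bits)

def adaptive_cost(bits):
    """Ideal cost in bits when the probability is estimated from running counts.

    Both counts start at 1, so the first symbol is coded at 50/50.
    """
    zeros, ones, total = 1, 1, 0.0
    for b in bits:
        seen = zeros if b == 0 else ones
        total += -math.log2(seen / (zeros + ones))
        if b == 0:
            zeros += 1
        else:
            ones += 1
    return total

# A heavily skewed sequence, as is typical of many syntax elements.
bits = [0] * 15 + [1]
cost_fixed = fixed_cost(bits, 0.5)
cost_adaptive = adaptive_cost(bits)
```

With the fixed model the 16 symbols cost exactly 16 bits, while the adaptive estimate brings the total down to roughly 8 bits.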

Post processing

On2’s codecs traditionally support various post-processing filters in the decoder stage. For example, VP6 supported many levels of deblocking and deringing. This is an interesting feature of VP8, but it is not clear from the documentation how it is applied. Standard H.264 does not define any post-processing filter.

Comparison

VP8 offers fewer encoding techniques than H.264, but this leads to lower complexity in both the encoding and decoding stages. VP8 seems to be designed to be simpler than H.264 and at the same time almost comparable. The lack of B-frames is probably the most important difference between VP8 and H.264. This can reduce efficiency by around 15-25%. Furthermore, the other small differences, like less articulated motion prediction, fewer reference frames and the absence of the adaptive 8×8 transform, can account for a further small difference, let’s say 5%.
In any case, VP8 seems to be comparable with H.264 Baseline (in which the 8×8 transform, B-frames and CABAC are disabled) with the addition of a CABAC-like entropy coder, and is thus probably 5-10% more efficient than standard H.264 Baseline.

Conclusion

I think VP8 has a long and probably bright future. It is relatively younger than H.264, so both encoders and decoders still have to be improved and optimized. On the other hand, H.264 has been on the market for years, but even now there is room for improvement and optimization (see my last test of near-HD quality at 250 Kbit/s).

In any case, we definitely needed an open source codec, because Theora performed too poorly. The difference in quality and efficiency may be overshadowed by the advantage of using a royalty-free codec, especially in specific scenarios. It is probably not a problem for a big company to pay for the creation of encoders, decoders or H.264 videos, but in some cases the presence of royalties can be a blocker even if they are very low.

Think about the case of the Flash Player: how could Adobe pay a fee for every Flash Player to include an H.264 encoder? I hope that the availability of a free VP8 will open the door to a renewed real-time video encoding API in the next release of Flash Player.