In the previous post of this two-part series, I analyzed the technical features of the VP9 codec and concluded that, technically speaking, VP9 has what it takes to compete with HEVC in terms of encoding efficiency.
But theory and reality are different things, and in video encoding a large part of the final efficiency depends on the encoder implementation rather than on the codec specification. VP9 is no exception here, and what I see from my tests is that vpxenc (the open source command-line encoder provided by Google) is not yet fully mature or optimized for every scenario. I’ll come back to this distinction later.
Video Quality
The VP9 specification has many features that can be used for perceptually aware encoding (like “segmentation”, which modulates quantization and filters within a frame according to how its different areas are perceived). But those features are not yet used in vpxenc, and this is clearly visible in the results.
At the beginning of 2015 I evaluated the performance of several H265 encoders for my clients and published a quick summary of the advantages and problems I found in the HEVC encoders of the time compared to optimized H264. The main problem that emerged in that evaluation was the inefficiency of “Adaptive Quantization” and other psychovisual techniques implemented in the encoders under test. The situation has partially changed for HEVC encoders over the last year (thanks to better psychovisual encoding, especially in x265), but retaining grain and noise, especially in dark areas, is always a challenge for codecs that exploit large transforms, like H265 and, indeed, VP9.
VP9 today shows the same inefficiencies HEVC had a year and a half ago. It handles motion-related complexity quite well, thanks to advanced motion estimation and compensation, and it reconstructs low and medium spatial frequencies with high fidelity, but it has difficulty retaining very high frequencies. Fine film grain disappears even at medium bitrates, and the “banding” artifact is very visible in flat areas, gradients and dark areas even at high bitrates. In this regard H264 is still much better, at least at medium-high bitrates. These kinds of artifacts are quite common on YouTube because it now uses VP9 whenever it can, so try a 1080p or 2160p video in Chrome yourself and take a look at gradients and shadows.
The sad thing is that common quality metrics like PSNR and SSIM (and even the more sophisticated VQM) are happier with a flat encoding than with one that is psychovisually pleasant but not exact. In the end, VP9 may score higher than H264/H265 in PSNR or SSIM even in a comparison like that of Picture 2 below, where the banding or “posterization” effect is very evident.
Picture 1. H265 vs VP9 vs H264 – 1080p @2Mbps
Picture 2. VP9 vs H264 – 1080p @2Mbps
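To make the point about metrics concrete, here is a minimal sketch in Python of why PSNR tends to reward a flat encode. Everything in it is an assumption made for illustration: the "source" is a synthetic mid-gray area with Gaussian grain, one "encode" simply wipes the grain out, and the other replaces it with fresh grain of the same strength. It does not use any real encoder output.

```python
import math
import random


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized lists of sample values."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10000
base = 128.0

# Hypothetical "source": a flat mid-gray area carrying fine film grain.
source = [base + rng.gauss(0.0, 3.0) for _ in range(n)]

# Encode A: the grain is removed completely (flat, "plastic" look).
flat = [base] * n

# Encode B: grain of the same strength, but not sample-identical to the source.
regrained = [base + rng.gauss(0.0, 3.0) for _ in range(n)]

psnr_flat = psnr(source, flat)
psnr_regrained = psnr(source, regrained)
print(f"flat: {psnr_flat:.2f} dB, regrained: {psnr_regrained:.2f} dB")
```

The flat version wins by roughly 3 dB, because the error of two independent grain patterns has about twice the energy of the error against a grain-free surface, even though a viewer would likely find the regrained version closer to the source.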
VP9 profile 2 – 10bit per component
So far I’ve talked about traditional 8-bit per component encoding in H264, H265 and VP9. But vpxenc also supports 10-bit per component encoding, known as VP9 profile 2.
Even if your content is 8-bit and everything remains BT.709 compliant, several studies have shown that 10-bit encoding is always capable of better quality/bitrate ratios thanks to its higher internal precision. The benefits are particularly visible in the accuracy of gradients and dark areas. See this example of VP9 8-bit vs 10-bit:
Picture 3. VP9 (8bit) 1080p@2Mbps vs VP9 (10bit) 1080p@1Mbps
In the picture above we can see that soft gradients are rendered better when encoding at 10 bits, even if the source is 8-bit. Grain (a high-frequency, low-power signal) is still not retained compared to the source, but banding is greatly reduced. Note also that with VP9 profile 0 we need to raise the bitrate well above 3Mbps to get a good encoding of gradients (for 1080p), while with profile 2 the result is acceptable in this case at only 1Mbps.
The superiority of 10-bit encoding has always held for H264 as well (High 10 profile), so why has 10-bit started to gain momentum only with HDR and not before?
The answer is the lack of players on consumer devices. Let’s remember that H264 became the standard for internet video relatively early only because Adobe decided to include a decoder (at its own expense) in Flash Player 9 (2007). This enabled a billion desktops to play back the Baseline, Main and High AVC profiles. Few know that it was originally meant to support High 10 as well, but a bug ruined the opportunity to actually use this feature.
Apart from this missed opportunity, H264 decoders in modern browsers, mobile devices, TVs and STBs are not capable of decoding the H264 High 10 profile, and the same is true for 10-bit VP9.
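As a rough illustration of why extra bit depth helps with gradients, the sketch below counts how many distinct code values an 8-bit and a 10-bit representation have available for a dark, soft ramp. The ramp (5% to 10% of full scale across 1920 pixels) is a made-up example, and the code only models the precision of the representation, not the internal arithmetic of any encoder.

```python
def quantize(value, bits):
    """Map a normalized value in [0, 1] to the nearest integer code."""
    return round(value * ((1 << bits) - 1))


width = 1920         # pixels across the gradient
lo, hi = 0.05, 0.10  # hypothetical dark ramp, 5% to 10% of full scale

ramp = [lo + (hi - lo) * x / (width - 1) for x in range(width)]

# Number of distinct code values the ramp lands on at each bit depth.
levels_8 = len({quantize(v, 8) for v in ramp})
levels_10 = len({quantize(v, 10) for v in ramp})

# Average width in pixels of each constant-valued band.
band_8 = width / levels_8
band_10 = width / levels_10
print(levels_8, levels_10, round(band_8), round(band_10))
```

With only about a dozen steps available at 8 bits, each band is well over a hundred pixels wide on a 1080p frame, which is easy to spot. At 10 bits the same ramp gets roughly four times as many steps, so each band is correspondingly narrower and each step is smaller.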
Where is VP9 available now?
Today VP9 is supported in the latest Chrome, Firefox and Opera browsers (and in Edge in preview) on desktop (PC and Mac), and in Android from version 4.4 onwards (with software or hardware decoding depending on the device). It is also available on an increasing number of connected TVs, but all the current significant decoders support only VP9 profile 0, so 8-bit.
The same problem applies to H265. On the mobile devices that support it you can only deliver 8-bit H265, although in this case the large majority of 4K TVs support the HEVC Main 10 profile as well.
So, when is it convenient to use VP9?
The visibility of the banding artifact grows with the size of the display. It is irrelevant on small displays like those of smartphones and tablets. On laptops it starts to become visible, and it is pretty bad on big TVs.
So, in conclusion, I think that today VP9 is an interesting option for anyone who wants:
– The best bitrate efficiency on desktop, even with some compromises in terms of quality. HEVC decoding will probably not appear on desktop for a long time, so VP9 is the only viable improvement over H264. Live streaming is a use case where these compromises are easier to accept.
– High efficiency on Android with a wide support base (Android 4.4 and later). On an old $100 Android phone I have, VP9 decoding works and HEVC decoding does not. It is an interesting option for markets in developing countries, where bandwidth is scarce and Android has a bigger installed base than iOS.
If the current situation doesn’t change, I doubt that players like Netflix will deliver high quality content on desktop or TV using VP9 profile 0, especially for 4K. In fact, David Ronca of Netflix has said that they are evaluating VP9 mainly to lower the barrier to access for mobile devices (they already use HEVC for HDR-10).
Fortunately, the scenario is probably about to change quickly if it’s true that YouTube is planning to deliver HDR (i.e. 10-bit) with VP9 this summer. This means that TVs capable of decoding VP9 profile 2 are becoming a reality, and this should also open the way for profile 2 in desktop browsers. In that case (and I’m optimistic), VP9 has a really good chance of becoming the successor of H.264, at least for internet video on desktop and Android.
It remains to be seen what Apple will decide to do. In the meantime I’m starting to push VP9 in my strategies, because I think that their choices are irrelevant here. If we want to optimize a video delivery service, it is increasingly clear that we will have to optimize for all 3 codecs.
Hello!
I wonder, can I use ffmpeg to convert a URL from RTSP format to RTMP format?
Fascinating article! This is the first time I’ve seen the tip about 10-bit encoding. I’ve been wondering, what matters more in creating top-quality VP9 encodes: extra-thorough motion searching, trying many different transforms, or a combination? I’m trying new algorithms that use the GPU so complexity doesn’t matter much.
I think the most important parts are rate control and adaptive quantization. The other parts, like motion estimation and compensation and the arrangement of transforms, are more complex now but somewhat easier to standardize. Smart rate control, on the other hand, can be more unconventional, but it delivers more gain.
Would that include checking which parts of the scene need more bits and using VP9’s segmentation to separate the scene into subject vs background? Kinda funny that you can do that, since people say VP9 has no adaptive quantization.
I was also wondering about a hard upper limit on bitrate. The encoder might create a bit budget and allocate a large amount to a keyframe, then divide the rest equally between inter frames.
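The budgeting idea in this comment can be sketched as follows. This is only an illustration of the scheme as described (a large share for the keyframe, the rest split equally), not how vpxenc's rate control works. The function name `allocate_bits` and the 30% keyframe share are assumptions picked for the example.

```python
def allocate_bits(target_bps, fps, gop_frames, keyframe_share=0.3):
    """Split one GOP's bit budget: a fixed share goes to the keyframe,
    the remainder is divided equally among the inter frames."""
    if gop_frames < 2:
        raise ValueError("a GOP needs a keyframe and at least one inter frame")
    gop_budget = target_bps * gop_frames / fps  # bits available for the GOP
    key_bits = gop_budget * keyframe_share
    inter_bits = (gop_budget - key_bits) / (gop_frames - 1)
    return [key_bits] + [inter_bits] * (gop_frames - 1)


# Example: 2 Mbps at 25 fps with a 50-frame (2 second) GOP.
plan = allocate_bits(target_bps=2_000_000, fps=25, gop_frames=50)
print(f"keyframe: {plan[0]:.0f} bits, each inter frame: {plan[1]:.0f} bits")
```

Because the per-GOP total is fixed in advance, the plan never exceeds the target over a GOP, which is what makes it a hard cap. The price is that an equal split ignores how complex each inter frame actually is.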
Banding like this often happens because of intra coding: intra prediction, first introduced in h.264, is very good at creating banding and filtering away film grain. This can be fixed easily by introducing an extra RD cost for intra coding. You’ll lose up to 0.5 dB of PSNR, but it’ll look a lot better. My colleague and I implemented that a long time ago in a well known commercial h.264 encoder, and it was later reverse engineered by the x264 devs, so I’m pretty sure that porting the x264 psychovisual tweaks to x265/VP9 will solve most if not all of the issues you’re complaining about.
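A minimal sketch of the kind of bias this comment describes, assuming the usual rate-distortion cost J = D + λ·R and a simple additive penalty on intra candidates. The comment does not say what form the extra cost took in the commercial encoder or in x264, so the additive penalty, the function name `choose_mode` and the numbers below are purely illustrative.

```python
def choose_mode(candidates, lam, intra_penalty=0.0):
    """Pick the candidate with the lowest cost J = D + lam * R,
    adding a fixed extra cost to intra candidates."""
    def cost(c):
        j = c["distortion"] + lam * c["bits"]
        return j + intra_penalty if c["intra"] else j
    return min(candidates, key=cost)


# Two hypothetical candidates for one block.
candidates = [
    {"name": "intra_dc", "intra": True, "distortion": 100.0, "bits": 20},
    {"name": "inter_mv", "intra": False, "distortion": 90.0, "bits": 40},
]

plain = choose_mode(candidates, lam=1.0)                       # costs 120 vs 130
biased = choose_mode(candidates, lam=1.0, intra_penalty=15.0)  # costs 135 vs 130
print(plain["name"], biased["name"])
```

Without the penalty the cheaper intra mode wins. With it, the decision flips to the inter candidate, which spends more bits but in this made-up case has lower distortion.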