In the first post of this mini-series I analyzed the technical features of H.265 (aka HEVC) compared with the good old H.264 (aka AVC). To summarize, HEVC pushes the traditional block-based video encoding paradigm to higher levels of efficiency (and also of complexity, from an encoding/decoding point of view) thanks mainly to:
– variable size transforms (from 4×4 to 32×32)
– quad-tree structured prediction areas (from 64×64 to 4×4)
– candidate-list-based motion vector prediction
– many intra-frame prediction modes
– higher-accuracy filters for motion compensation
– optimized deblocking, SAO filtering, CABAC, etc.
It’s interesting to note that, unlike any previous step from H.261 to H.264, H.265 brings a considerable improvement not only in inter-frame compression but in intra-frame compression as well. A considerable share of the data in an H.264 stream is concentrated today in I-frames, because intra-frame compression is considerably less evolved than inter-frame compression, where, for example, B-frames help a lot. H.265 introduces a strong improvement in block compression (in any kind of frame) thanks to variable-size transforms. Using smaller transforms for impulsive signals and bigger transforms for stationary signals (smooth areas, in the case of pictures) is not new in signal processing; it is used, for example, in AAC and many other codecs. Variable-size transforms increase compression efficiency but also introduce some new challenges… but let’s proceed one step at a time.
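The intuition behind variable-size transforms can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a floating-point 1-D DCT stands in for the 2-D integer transforms of a real codec, the function names are hypothetical, and the threshold is an arbitrary proxy for "coefficients that survive quantization".

```python
import math

def dct(block):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(block))
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coeffs.append(scale * total)
    return coeffs

def significant_coeffs(signal, size, threshold=1.0):
    """Count coefficients above `threshold` when the signal is coded
    with a fixed transform size (assumed proxy for coding cost)."""
    count = 0
    for start in range(0, len(signal), size):
        block = signal[start:start + size]
        count += sum(1 for c in dct(block) if abs(c) > threshold)
    return count

# Two toy 32-sample signals: a gentle gradient and one isolated detail.
smooth = [100 + i for i in range(32)]
impulse = [100 if i == 13 else 0 for i in range(32)]

if __name__ == "__main__":
    for name, signal in (("smooth", smooth), ("impulse", impulse)):
        for size in (4, 32):
            print(name, "transform size", size, "->",
                  significant_coeffs(signal, size), "significant coefficients")
```

In this toy setup the gradient needs fewer significant coefficients with the single 32-point transform, while the isolated detail is cheaper with 4-point transforms, which is the trade-off an encoder exploits when it picks the transform size per area.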
Video encoding is a complex problem that is highly dependent on the content. It is well known that a low-motion scene with a static background and bright lights can be compressed much more than a dark, high-motion action scene in which most of the picture is moving. So what are the most difficult scenes/situations that a modern codec like H.264 has to cope with? Even an efficient encoder may still have difficulty compressing:
– detailed keyframes: without references to count on (and with not-so-efficient intra-frame prediction), compressing a keyframe is still difficult, especially when it is feature-rich (e.g. a forest). If the keyframe is at the beginning of a quiet scene, the high efficiency of motion prediction and compensation on low motion allows for efficient compression overall (most of the data can be allocated to the keyframe), but a sudden increase in complexity (motion) during the GOP can easily push an encoder into crisis.
– high motion with a “crisp” picture: predicting high-complexity motion is quite difficult in itself. Mix this with high spatial complexity and you will get a considerable spike in bitrate and/or an increasing amount of artifacts.
– slow motion in dark areas: encoding dark areas is challenging because the eye is more sensitive to details in the dark than in full light. If you add slow motion of textured objects, smoke, or small changes in colors and shadows, it is quite easy to spot annoying artifacts, even when using adaptive quantization or similar optimizations.
– noise/grain: noise is almost incompressible by definition (it’s random and “unpredictable” by nature). Fortunately the eye is more sensitive to grain and noise in specific areas of the picture, like flat and dark areas, and less so in bright and detailed areas, so a smart encoder can move the bit budget where it is most needed. Nonetheless it’s quite difficult to compress noisy content, especially noise in fast-moving scenes. Compressed noise is easily spotted because it creates ugly patterns at lower frequencies and interferes with motion estimation/compensation (“dragged” artifacts). Denoising is not always suitable and/or desired, and unfortunately noise modelling and reconstruction during playback continues to be an “option” in the HEVC specification (watch this experiment about synthetic grain reconstruction).
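The claim that noise is almost incompressible can be checked with a rough experiment. The sketch below uses zlib, a general-purpose lossless compressor, purely as a stand-in (it is not a video codec), and the sample values, amplitude and helper names are assumptions chosen for illustration.

```python
import random
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

def flat_area(n: int, level: int = 40) -> bytes:
    """A perfectly flat dark area: every sample has the same value."""
    return bytes([level] * n)

def noisy_area(n: int, level: int = 40, amplitude: int = 8, seed: int = 0) -> bytes:
    """The same dark area with uniform random grain of +/- `amplitude` added."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    return bytes(level + rng.randint(-amplitude, amplitude) for _ in range(n))

if __name__ == "__main__":
    n = 4096
    print("flat :", round(compressed_ratio(flat_area(n)), 3))
    print("noisy:", round(compressed_ratio(noisy_area(n)), 3))
```

Even a modest amount of grain on an otherwise flat area makes the data many times more expensive to represent, which is why a lossy encoder has to choose between spending bits on the noise and smearing it away.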
H.265 mitigates the first two cases compared to H.264. As said above, it’s quite efficient at intra-frame encoding, so detailed areas can be encoded well, and so can smooth areas and gradients. Motion estimation and compensation are more effective too, so H.265 is able to operate at much lower bitrates than H.264 before artifacts appear. Furthermore, the artifacts produced by H.265 are “smoother” and the degradation of quality is more “harmonious” and better looking, even when encoding at very aggressive resolution/bitrate ratios.
However, every coin has a flip side, and the strength of H.265 may become a weakness when processing the last two problematic cases. Dark areas and noise/grain require a more accurate (not mathematically but “perceptually”) retention of high frequencies and small changes in color levels. This is usually called psy-optimization of encoding. In H.264, which uses a small transform, it is easier to turn a quantization error into features/details that are not identical to the original but perceptually “similar”. The error generated in the approximation of the original frequency domain is stopped by the small boundary of the transform and is thus more controllable. In H.265, with its bigger transforms, this approach is much more complex to use and new ideas have to be put on the table.
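The idea that the transform boundary "stops" the quantization error can be illustrated with a toy experiment. This is a minimal sketch, not how any real encoder works: it assumes a floating-point 1-D DCT, a uniform quantizer with an arbitrary step of 20, and a hypothetical `damaged_samples` helper that lists the samples whose reconstruction is off by more than half a code value.

```python
import math

def dct(block):
    """Orthonormal 1-D DCT-II."""
    n = len(block)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(block))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]

def damaged_samples(signal, size, step=20.0, visible=0.5):
    """Indices whose error exceeds `visible` after coarse quantization
    of the coefficients, using a fixed transform size."""
    damaged = []
    for start in range(0, len(signal), size):
        block = signal[start:start + size]
        quantized = [step * round(c / step) for c in dct(block)]
        for offset, (orig, rec) in enumerate(zip(block, idct(quantized))):
            if abs(rec - orig) > visible:
                damaged.append(start + offset)
    return damaged

# One isolated detail (sample 13) on a flat background of 32 samples.
detail = [100 if i == 13 else 0 for i in range(32)]

if __name__ == "__main__":
    print("4-point transforms :", damaged_samples(detail, 4))
    print("32-point transform :", damaged_samples(detail, 32))
```

With 4-point transforms the error cannot leave the block that contains the detail (samples 12 to 15), while with a single 32-point transform the same coarse quantization spreads error well beyond it. That spread is what makes perceptual tuning harder with bigger transforms.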
H.265 vs H.264 today
Over the last few years I have developed optimization approaches that analyze the video specifically for complex sequences and optimize them (adaptive source filtering, adaptive encoding parameterization, specific rate control optimizations). Today I’m working on porting these optimizations to H.265, so I’m “playing” with several H.265 encoders (e.g. DivX H.265, x265, f265, NTT H.265 enc).
For the reasons mentioned above, we are today (January 2015) in a situation where a good H.265 encoder is superior to a good H.264 encoder at encoding feature-rich keyframes (and blocks in general) and high motion, providing a much smoother degradation of quality at lower bitrates. But at the same time, a good H.264 encoder is still able to provide the same or even better quality in dark areas and noisy/grainy pictures. When playback happens on mobile devices this is not very visible because of the high DPI, but on a big TV screen it is evident on complex sequences.
The picture below shows an example of what I mean:
I’m not saying that H.264 “IS” better than H.265, but that today’s encoders are not yet completely mature. This is quite normal and expected: in the past (2003-2005) the same happened to H.264 compared to Xvid or to the best MPEG-2 encoders (especially when working at medium-high bitrates). The problem is also present in 4K, even if in this case it is slightly mitigated by the pixel size. The need to offer good quality even in complex situations forces content providers willing to stream in 4K to use higher bitrates than would otherwise be necessary. A partial way to mitigate the problem of dark areas is to use 10 bits per color component in compression instead of 8. The additional accuracy is usually able to provide better perceptual quality. Using 10 bits helped a lot with H.264 as well, but it was almost impossible to use in production because of the lack of support in decoders.
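Why the extra two bits help in dark areas can be seen with simple arithmetic. The sketch below is a simplification: it assumes a full-range, linear mapping from a normalized 0-1 signal to code values (real video uses gamma-encoded, usually limited-range levels), and the gradient endpoints and helper names are made up for illustration.

```python
def quantize(values, bits):
    """Map normalized samples (0.0-1.0) to integer code values at a given bit depth."""
    peak = (1 << bits) - 1
    return [round(v * peak) for v in values]

def band_stats(codes):
    """Return (number of distinct levels, width of the widest band in samples)."""
    longest = run = 1
    for prev, cur in zip(codes, codes[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return len(set(codes)), longest

# A slow dark gradient spanning a 1920-sample-wide line (assumed values).
width = 1920
gradient = [0.02 + 0.04 * i / (width - 1) for i in range(width)]

if __name__ == "__main__":
    for bits in (8, 10):
        levels, widest = band_stats(quantize(gradient, bits))
        print(f"{bits}-bit: {levels} levels, widest band {widest} samples")
```

At 8 bits the gradient has only a handful of code values to step through, so each band is wide and its edges are easy to see; at 10 bits there are roughly four times as many steps and the bands become correspondingly narrower.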
Generally speaking, the quality we can achieve today with H.264 at 1080p and 3-4 Mbit/s can be matched (except for dark areas) by H.265 at around 2-2.5 Mbit/s. But difficult areas are… difficult, and they require much attention during compression. For example, my clients usually cannot accept “posterization effects” and “banding artifacts” like the ones shown in the picture above, especially during full-screen playback on big screens (possibly 4K TV sets).
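To put the bitrate figures above in perspective, here is the arithmetic as a small sketch. The helper names are hypothetical, and the pairing of the range endpoints (3 with 2, 4 with 2.5) is my assumption about how the two ranges line up.

```python
def bitrate_saving(h264_mbps: float, h265_mbps: float) -> float:
    """Fraction of bitrate saved by H.265 at comparable quality."""
    return 1.0 - h265_mbps / h264_mbps

def gigabytes_per_hour(mbps: float) -> float:
    """Data volume of one hour of video at a constant bitrate (decimal GB)."""
    return mbps * 3600 / 8 / 1000

if __name__ == "__main__":
    for h264, h265 in ((3.0, 2.0), (4.0, 2.5)):
        print(f"{h264} -> {h265} Mbit/s: "
              f"{bitrate_saving(h264, h265):.1%} saved, "
              f"{gigabytes_per_hour(h264):.2f} -> {gigabytes_per_hour(h265):.2f} GB/hour")
```

Under these assumptions the saving is roughly one third or a little more, which is noticeably less than the "half the bitrate" figure often quoted for HEVC.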
Apart from the quality evaluation, the main problem of H.265 today is the general availability of decoders. For 4K streaming we can say that the majority of target devices (4K TV sets) are able to decode a Main 10 4K profile at least at 24-30 fps (and even 50-60 fps in most cases). We will probably soon see HEVC on iOS and Android too, because many SoCs capable of decoding HEVC are arriving on the market, but the situation is much more problematic for browsers. H.264 started to spread on the web only when it was supported by Flash Player in 2007 (and Adobe paid the license); now that Flash is out of the game, the future of H.265 in the browser is much more uncertain. Google is pushing VP9 (free and already supported in Chrome) as the way to go for browsers, but I doubt that Firefox and IE will support it, and even if an upcoming release of IE supports HEVC, an annoying fragmentation will continue to plague video streaming over the Internet.
Fortunately the development of H.265 encoders is progressing quite fast. I’m planning to take stock of this topic every 6 months. Stay tuned.