Author Archive

H265 – part II: Considerations on quality and “state-of-the-art”

28 January 2015 1 comment

In the first post of this mini-series I analyzed the technical features of H.265 (aka HEVC) compared with the good old H.264 (aka AVC). Summarizing, HEVC pushes the traditional block-based video encoding paradigm to higher levels of efficiency (and also of complexity, from an encoding/decoding p.o.v.) thanks mainly to:

– variable size transforms (from 4×4 to 32×32)
– quad-tree structured prediction areas (from 64×64 to 4×4)
– candidate-list-based motion vector prediction
– many intra-frame prediction modes
– higher-accuracy filters for motion compensation
– optimized deblocking, SAO filtering, CABAC, etc.

It’s interesting to note that, compared to any previous step from H.261 to H.264, with H.265 we have a considerable improvement not only (or mainly) in the inter-frame compression domain but in intra-frame compression as well. A considerable share of the data in today’s H.264 streams is concentrated in I-frames, and this is because intra-frame compression is considerably less evolved than inter-frame compression, where, for example, B-frames help a lot. H.265 introduces a strong improvement in block compression (in any kind of frame) thanks to variable size transforms. The possibility to use smaller transforms for impulsive signals and bigger transforms for stationary signals (smooth areas, in the case of pictures) is not new in signal processing and is used, for example, in AAC and many other codecs. Variable size transforms increase compression efficiency but also introduce some new challenges…but let’s proceed one step at a time.
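To give an intuition of this principle, here is a minimal sketch (Python with NumPy/SciPy; the test signals and the 99% threshold are mine, purely illustrative) that counts how many DCT coefficients are needed to retain most of the energy of a smooth signal versus an impulsive one:

import numpy as np
from scipy.fft import dct

def coeffs_for_99pct_energy(signal):
    # DCT of the signal; count how many coefficients hold 99% of its energy
    c = dct(signal, norm='ortho')
    energy = np.sort(c ** 2)[::-1]
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, 0.99) + 1)

n = 32
smooth = np.linspace(0, 255, n)              # "stationary" signal: a smooth gradient
impulse = np.zeros(n); impulse[12] = 255.0   # "impulsive" signal: a single spike

print(coeffs_for_99pct_energy(smooth))   # very few coefficients: a big transform pays off
print(coeffs_for_99pct_energy(impulse))  # energy spread everywhere: better a small transform

The smooth ramp concentrates its energy in a handful of coefficients, while the spike spreads it over almost all of them: this is why an encoder wants big transforms on flat areas and small transforms around edges and details.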

Difficult scenes

Video encoding is a complex problem that is highly dependent on the content. It is well known that a low motion scene with a static background and bright lights can be compressed much more than a dark, high motion action scene where most of the picture is moving. So what are the most difficult scenes/situations that a modern codec like H.264 has to cope with? Even an efficient encoder may still struggle to compress:

detailed keyframes: without references to count on (and with not-so-efficient intra-frame prediction), compressing keyframes is still difficult, especially when they are feature-rich (e.g. a forest). If the keyframe is at the beginning of a quiet scene, the high efficiency of motion prediction and compensation on low motion allows for overall efficient compression (most of the data can be allocated to the keyframe), but a sudden increase in complexity (motion) during the GOP can easily push an encoder into crisis.

high motion with a “crisp” picture: predicting high-complexity motion is quite difficult in itself. Mix this with high spatial complexity and you will get a consistent spike in bitrate and/or an increasing amount of artifacts.

slow motion in dark areas: encoding dark areas is challenging because the eye is more sensitive to details in the dark than in full light; if you add slow motion of textured objects, or smoke, or small changes in colors and shadows, it is quite easy to spot annoying artifacts even when using adaptive quantization or similar optimizations.

noise/grain: noise is almost incompressible by definition (it’s random and “unpredictable” by nature). Fortunately the eye is more sensitive to grain and noise in specific areas of the picture, like flat areas and dark areas, and less in bright and detailed areas, so a smart encoder can move the bit budget where it is more needed. Nonetheless it’s quite difficult to compress noisy content, especially noise in fast moving scenes. Compressed noise is easily spotted because it creates ugly patterns at lower frequencies and interferes with motion estimation/compensation (“dragged” artifacts). Denoising is not always suitable and/or desired, and unfortunately noise modeling and reconstruction during playback remains an “option” in the HEVC specification (watch this experiment about synthetic grain reconstruction).

H.265 mitigates the first two cases compared to H.264. As said above, it’s quite efficient in intra-frame encoding, so detailed areas can be encoded well, and so can smooth areas and gradients. Motion estimation and compensation are effective too, so compared to H.264, H.265 is able to operate at much lower bitrates before artifacts appear. Furthermore, the artifacts produced by H.265 are “smoother”, and the degradation of quality is more harmonious and better looking even when encoding at very aggressive resolution/bitrate ratios.

However, every coin has a flip side, and the strength of H.265 may become a weakness in the last two problematic cases. Dark areas and noise/grain require a more accurate (not mathematically, but “perceptually”) retention of high frequencies and small changes in color levels. This is usually called psy-optimization of encoding. In H.264, which uses small transforms, it is easier to turn a quantization error into features/details that are not identical to the original but perceptually “similar”. The error generated in the approximation of the original frequency domain is stopped by the small boundary of the transform and is thus more controllable. In H.265, with bigger transforms, this approach is much more complex to apply, and new ideas have to be put on the table.

H.265 vs H.264 today

In the last years I have developed optimization approaches that analyze video, looking specifically for complex sequences, and optimize them (adaptive source filtering, adaptive encoding parameterization, specific rate control optimizations). Today I’m working on porting such optimizations to H.265, so I’m “playing” with several H.265 encoders (e.g. DivX H.265, x265, f265, NTT H.265 enc).

For the aforementioned reasons, we are today (Jan 2015) in a situation where a good H.265 encoder is superior to a good H.264 encoder in encoding feature-rich keyframes (and blocks in general) and high motion, providing a much smoother degradation of quality at lower bitrates. But at the same time, a good H.264 encoder is still able to provide the same or even better quality in dark areas and noisy/grainy pictures. When playback happens on mobile devices this is not very visible because of the high DPI, but on a big TV screen it is evident in complex sequences.

The picture below shows an example of what I mean:


I’m not saying that H.264 “IS” better than H.265, but that today’s encoders show a not completely mature level of development. This is quite normal and expected: the same happened in the past (2003-2005) to H.264 compared to Xvid or to the best MPEG-2 encoders (especially when working at medium-high bitrates). The problem is present in 4K too, even if in this case it is slightly mitigated by pixel size. The necessity to offer good quality even in complex situations forces content providers willing to stream in 4K to use higher bitrates than otherwise necessary. A partial way to mitigate the problem of dark areas is to compress with 10 bits per color instead of 8. The additional accuracy is usually able to provide better perceptual quality. The use of 10-bit also helped a lot when encoding in H.264, but it was almost impossible to use in production because of the lack of support in decoders.

Generally speaking, the quality we can achieve today with H.264 in 1080p @3-4Mbit/s can be matched (except for dark areas) by H.265 at around 2-2.5Mbit/s. But difficult areas are…difficult, and they require much attention during compression. For example, my clients usually cannot accept “posterization effects” and “banding artifacts” like the ones shown in the picture above, especially during full screen playback on big screens (possibly 4K TV sets).

Apart from quality evaluation, the main problem of H.265 today is the general availability of decoders. For 4K streaming we can say that the majority of target devices (4K TV sets) are able to decode a Main 10 4K profile at least at 24-30fps (and even at 50-60fps in most cases). We will probably see HEVC soon on iOS and Android too, because many SoCs capable of decoding HEVC are arriving on the market, but the situation is much more problematic for browsers. H.264 started to spread across the web only when it was supported by Flash Player in 2007 (and Adobe paid the license); now that Flash is out of the game, the future of H.265 in the browser is much more uncertain. Google is pushing VP9 (free and already supported in Chrome) as the way to go for browsers, but I doubt that Firefox and IE will support it, and even if a next release of IE supports HEVC soon, an annoying fragmentation will continue to plague video streaming over the Internet.

Fortunately the development of H.265 encoders is improving quite fast. I’m planning to take stock of this topic every 6 months. Stay tuned.


Categories: HEVC

H265 – part I: Technical Overview

20 June 2014 1 comment

HEVC is among us. On January 25, 2013, the ITU announced the completion of the first stage of approval of the H.265 video codec standard, and over the last year several vendors/entities have started to work on the first implementations of H.265 encoders and decoders. Theoretically HEVC is said to be 30 to 50% more efficient than H.264 (especially at higher resolutions), but is it really that simple? Is H.264 so close to retirement? This is what we will try to find out. First of all let’s start with a technical analysis of H.265 compared to AVC and then, in the next blog post, we will take a look at the level of performance that is realistic to obtain from today’s H.265 encoders.

H.265/HEVC – Technical Overview

This part assumes you are sufficiently familiar with the coding techniques implemented in H.264/AVC (if you need to refresh your memory I suggest these posts: H.264 Part I, Part II). HEVC re-uses many of the concepts defined in H.264. Both are block-based video encoding techniques, so they have the same roots and the same approach to encoding:

1. subdivision of the picture into macroblocks, possibly further sub-divided into blocks
2. reduction of spatial redundancy using intra-frame compression techniques
3. reduction of temporal redundancy using inter-frame compression techniques (motion estimation and compensation)
4. residual data compression using transformation & quantization
5. reduction of the remaining redundancy in the transmission and signaling of residuals and motion vectors using entropy coding

HEVC can be seen as a strong evolution of AVC with some very important key features, a number of less important improvements and some simplifications.

Picture partitioning

Instead of the 16×16 macroblocks of AVC, HEVC divides pictures into “coding tree blocks” (CTBs). Depending on an encoding setting, the size of the CTB can be 64×64, or limited to 32×32 or 16×16. Several studies have shown that bigger CTBs provide higher efficiency (but also higher encoding time). Each CTB can be split recursively, in a quad-tree structure, into 32×32, 16×16, down to 8×8 sub-regions, called coding units (CUs). See the picture below for an example of the partitioning of a 64×64 CTB (numbers report the scan order). Each picture is further partitioned into special groups of CTBs called Slices and Tiles (see also Parallel processing).


CUs are the basic unit of prediction in HEVC. Usually smaller CUs are used around detailed areas (edges and so on), while bigger CUs are used to predict flat areas.
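Just to give an idea of the mechanism, here is a toy sketch (Python; the cost function is hypothetical and purely illustrative, real encoders compare rate-distortion costs) of how an encoder might explore the CU quad-tree:

def best_partition(x, y, size, cost):
    """Recursively decide whether to code a square block as a single CU or
    split it into four sub-CUs; returns (total_cost, list_of_leaf_CUs).
    `cost(x, y, size)` is a hypothetical estimator of the cost of coding
    the block at (x, y) as one CU."""
    whole = cost(x, y, size)
    if size == 8:                                    # minimum CU size in HEVC
        return whole, [(x, y, size)]
    half = size // 2
    split_cost, leaves = 0, []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        c, sub = best_partition(x + dx, y + dy, half, cost)
        split_cost += c
        leaves += sub
    if split_cost < whole:
        return split_cost, leaves                    # splitting pays off
    return whole, [(x, y, size)]                     # keep the CU whole

# toy cost: pretend the top-left corner of the CTB is "detailed"
fake_cost = lambda x, y, s: s * s * (2.0 if (x < 32 and y < 32) else 0.5)
print(best_partition(0, 0, 64, fake_cost)[1])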

Transform size

Each CU can be recursively split into Transform Units (TUs) with the same quad-tree approach used in CTBs. Unlike AVC, which used mainly a 4×4 transform and occasionally an 8×8 transform, HEVC has several transform sizes: 32×32, 16×16, 8×8 and 4×4. From a mathematical point of view, bigger TUs are able to encode stationary signals better, while smaller TUs are better at encoding small “impulsive” signals. The transforms are based on the DCT (Discrete Cosine Transform), but the transform used for intra 4×4 is based on the DST instead (Discrete Sine Transform), because several tests have shown a small improvement in compression. Transformation is performed with higher accuracy compared to H.264. The adaptive nature of the CTB, CU and TU partitions, plus the higher accuracy, plus the larger transform sizes are among the most important features of HEVC and the reason for the performance improvement compared to AVC. HEVC implements a sophisticated scan order and coefficient signaling scheme that improves signaling efficiency. Note that unlike H.264 there’s no Hadamard transform nor 2×2 chroma (the minimum chroma transform size is 4×4). HEVC also drops support for MBAFF and similar techniques to code interlaced video. Interlaced video can still be compressed, but there’s no separation between fields and frames (only frames).


Prediction Units

We introduced the new transform sizes just after picture partitioning to exploit the analogy between the CU and TU trees, but before transform and quantization there’s the prediction phase (inter or intra).
A CU can be predicted using one of eight partition modes (see picture below).


Even if a CU contains one, two or four prediction units (PUs), it can be predicted using exclusively inter-frame or intra-frame prediction techniques; furthermore, intra-coded CUs can use only the square partitions 2Nx2N and NxN. Inter-coded CUs can use both square and asymmetric partitions. A number of other limitations are applied to simplify signaling. For example, no 4×4 prediction is allowed in inter-prediction, and 4×8 and 8×4 are allowed only in forward prediction (so not in B-frames). Inter-prediction tends to stop at the 8×8 level.

Intra prediction

HEVC has 35 different intra-prediction modes (9 in AVC): DC mode, Planar mode and 33 directional modes. Like in AVC, intra prediction tries to recover information from surrounding blocks and works particularly well for flat areas. Intra prediction follows the TU partition tree, so prediction modes are applied to 4×4, 8×8, 16×16 and 32×32 TUs.
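As a rough illustration, here is a sketch (Python/NumPy; sample values are mine) of the two simplest modes; the real angular modes interpolate the reference samples at 1/32-sample accuracy, so take this only as a conceptual example:

import numpy as np

def intra_dc(top, left):
    """DC mode: fill the block with the mean of the neighboring samples."""
    n = len(top)
    return np.full((n, n), (top.sum() + left.sum()) / (2 * n))

def intra_horizontal(top, left):
    """Pure horizontal direction (one of the 33 angular modes):
    every row repeats the reconstructed sample on its left."""
    n = len(left)
    return np.tile(left.reshape(n, 1), (1, n))

top = np.array([100., 102., 104., 106.])    # row of samples above a 4x4 block
left = np.array([100., 110., 120., 130.])   # column of samples to its left
print(intra_dc(top, left))
print(intra_horizontal(top, left))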


Inter prediction

For motion vector prediction HEVC has two reference lists: L0 and L1. They can hold 16 references each, but the maximum total number of unique pictures is 8 (multiple instances of the same reference frame can be stored with different weights). HEVC motion estimation is much more complex than in AVC. There are two main prediction modes: Merge and Advanced MV. Each PU can use one of these methods and can have forward (one MV) or bi-directional prediction (two MVs). In Advanced MV mode a list of candidate MVs is created (spatial and temporal candidates picked with a complex, probabilistic logic); once the list is created, only the index of the best candidate is transmitted in the bitstream, plus the MV delta (the difference between the real MV and the prediction). On the other side, the decoder builds and continuously updates the same candidate list using the exact same rules used by the encoder, and picks the MV to use as estimator using the index sent by the encoder in the bitstream.
Merge mode is similar; the main differences are that the candidate list is calculated from neighboring MVs and that no delta MV is added. It is the equivalent of the “skip” mode in AVC.
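Conceptually, the signaling works like the following sketch (Python; heavily simplified: the real derivation and pruning rules are more elaborate, with the AMVP list capped at 2 candidates):

def amvp_encode(real_mv, spatial_mvs, temporal_mv):
    """Build the candidate list and return what goes in the bitstream:
    the index of the chosen predictor and the MV delta."""
    candidates = (spatial_mvs + [temporal_mv])[:2]   # list capped at 2 candidates
    best = min(range(len(candidates)),
               key=lambda i: abs(real_mv[0] - candidates[i][0]) +
                             abs(real_mv[1] - candidates[i][1]))
    delta = (real_mv[0] - candidates[best][0], real_mv[1] - candidates[best][1])
    return best, delta

def amvp_decode(index, delta, spatial_mvs, temporal_mv):
    """The decoder rebuilds the same list with the same rules,
    so the index plus the delta are enough to recover the MV."""
    px, py = (spatial_mvs + [temporal_mv])[:2][index]
    return (px + delta[0], py + delta[1])

neighbors = [(12, 4), (10, 6)]              # MVs of already-coded neighbor PUs
idx, delta = amvp_encode((11, 5), neighbors, (8, 8))
assert amvp_decode(idx, delta, neighbors, (8, 8)) == (11, 5)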

Similarly to AVC, HEVC specifies motion vectors in 1/4-pel, but it uses an 8-tap filter for luma and a 4-tap 1/8-pel filter for chroma. This is considerably better than the 6-tap filter for luma and the 2-tap (bilinear) filter for chroma used in AVC. Increased sub-pixel filtering accuracy improves the efficiency of estimation and picture “stability”, but requires many more memory accesses and thus processing power (with higher battery consumption); this is why H.265 does not allow inter-estimation on 4×4 regions, limits 4×8 and 8×4 estimation to be uni-directional (forward prediction), and sets an 8×8 lower limit for bi-directional estimation. HEVC supports weighted prediction for both uni- and bi-directional PUs (the weights are signaled explicitly).
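As an illustration, here is a 1-D sketch (Python/NumPy) of half-pel luma interpolation using the 8-tap coefficients of the HEVC half-pel filter; the real codec applies the filtering separably in two dimensions:

import numpy as np

# 8-tap luma filter for the half-pel position (coefficients sum to 64)
HALFPEL_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1])

def halfpel_interpolate(row):
    """Interpolate the half-pel samples of a 1-D row of integer luma
    samples (image borders ignored for brevity)."""
    out = np.convolve(row, HALFPEL_TAPS[::-1], mode='valid')
    return np.clip((out + 32) >> 6, 0, 255)     # divide by 64 with rounding

row = np.array([10, 20, 40, 80, 120, 160, 200, 220, 230, 235])
print(halfpel_interpolate(row))                 # 3 half-pel samples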

HEVC uses up to 16 bits per MV, so at quarter-pel accuracy this means a −8192 to 8191.75 range (for luma), compared to −2048 to 2047.75 horizontally and −512 to 511.75 vertically in AVC (increased motion compensation range for 4K/8K resolutions).


Deblocking

Unlike H.264, where deblocking was performed on 4×4 blocks, in HEVC deblocking is performed on the 8×8 grid only. This allows for parallel processing of deblocking (there’s no filter overlapping). All vertical edges in the picture are deblocked first, followed by all horizontal edges. The filter is similar to AVC’s.


SAO

After deblocking there’s a second, optional filter called Sample Adaptive Offset (SAO). Like the deblocking filter, it is applied in the prediction loop and the result is stored in the reference frame list. The objective of the filter is to fix mispredictions, encoding drift and banding over wide areas, subdividing the sample values into “bands” and applying adaptive offsets to them.
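A sketch of the band-offset flavor of SAO (Python/NumPy; the sample values and offsets are mine, and the real syntax and the edge-offset mode are omitted):

import numpy as np

def sao_band_offset(samples, start_band, offsets):
    """Band offset: the 8-bit range is split into 32 bands of 8 values;
    the encoder signals a start band and 4 offsets, applied to the 4
    consecutive bands where banding is visible."""
    out = samples.astype(int)
    bands = out >> 3                             # band index = sample // 8
    for i, off in enumerate(offsets):            # exactly 4 signaled offsets
        out[bands == start_band + i] += off
    return np.clip(out, 0, 255)

flat_sky = np.array([118, 119, 120, 120, 121, 122, 140, 141])
print(sao_band_offset(flat_sky, start_band=14, offsets=[1, 2, -1, 0]))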

Entropy coding

In HEVC there’s only CABAC for entropy coding. CABAC in HEVC is almost identical to CABAC in AVC, with minor changes and simplifications to allow parallel decoding.

Parallel Processing

Since HEVC decoding is much more complex than AVC decoding, several techniques have been implemented to allow parallel decoding. The most important are Tiles and Wavefront.
With Tiles, the picture is divided into a rectangular grid of CTBs; motion vector prediction and intra-prediction are not performed across tile boundaries.
With Wavefront, each CTB row can be encoded and decoded by its own thread. The encoding/decoding of multiple rows is synchronized (entropy coding state), guaranteeing that each “wavefront” CTB is surrounded by specific CTBs during encoding and decoding (see picture).
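The dependency rule can be simulated in a few lines (Python; a sketch of the scheduling constraint only, no real decoding):

def wavefront_schedule(rows, cols):
    """Earliest 'time step' at which each CTB can be processed under WPP:
    CTB (r, c) waits for its left neighbor (r, c-1) and for (r-1, c+1),
    since the CABAC state is inherited after the second CTB of the row above."""
    t = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(t[r][c - 1])
            if r > 0:
                deps.append(t[r - 1][min(c + 1, cols - 1)])
            t[r][c] = max(deps) + 1 if deps else 0
    for line in t:
        print(line)              # each row lags the previous one by 2 steps

wavefront_schedule(4, 8)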



The adaptive subdivision of the picture into prediction areas, the use of advanced intra-prediction and inter-prediction, and the bigger transform sizes can absolutely guarantee, in the long term, a considerably higher efficiency of HEVC compared to AVC. But the complexity of encoding is really much higher. For example, consider that in AVC a 16×16 macroblock could have only 2 possible sub-partitions: sixteen 4×4 sub-blocks, or four 8×8 sub-blocks. The number of possible sub-partitionings of a 64×64 CTU is exceptionally higher, in the tens of thousands (see the quick computation below). In AVC it was simple to test which of the two configurations was better for compression, but now? New techniques must be implemented to explore the quad-tree efficiently and avoid testing every possible configuration.
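A quick way to count the configurations (Python; assuming CUs can be recursively quartered from 64×64 down to 8×8, each node choosing split or no-split):

def quadtree_configs(size, min_size=8):
    """Count the split/no-split configurations of a block that can be
    recursively quartered down to min_size."""
    if size == min_size:
        return 1                                   # leaf: cannot split further
    return 1 + quadtree_configs(size // 2, min_size) ** 4   # whole, or split in 4

print(quadtree_configs(16))    # 2: the two options of an AVC-like 16x16 block
print(quadtree_configs(64))    # 83522 possible CU trees for a single 64x64 CTB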
Like AVC before it, HEVC is a big optimization challenge, but the potential is enormous. In the next blog post we will take a look at the state of the art in H.265 encoding at the beginning of 2015.

Categories: HEVC

Future of video: 4K, DASH, HEVC

31 January 2014 3 comments

I must admit, I’m feeling very guilty: this is the only new post in more than a year. 2013 was wonderful from a professional point of view and I had very few moments, if any, to dedicate to the blog. But 2014 brings too many interesting trends that I can’t neglect anymore, so I want to return to speaking about video encoding, streaming and OTT technologies.

In fact, you know that there are three magic “words” outlining the future of video: 4K, HEVC and DASH.

So, as a 2014 new year resolution, I’m planning to speak about ideas and optimizations related to the “magic trio”.

4K or not 4K ?

The first trend is rapidly gaining momentum. “4K” is on every insider’s lips, and the effort of YouTube, Netflix and others to quickly offer 4K content is also opening new opportunities for selling 4K TVs and monitors.
I’m focusing part of my research on finding specific optimizations for H.264 encoding of 4K content. In fact I think that, marketing buzz apart, 4K will be served first using the well known H.264.

There are several optimizations to explore for 4K: for example custom quantization matrices, a bias toward the use of the 8×8 transform, and changes in psycho-visual optimizations, to name a few. 4K also pushes the limits of H.264 for motion compensation and estimation (motion vectors become too long), creating several efficiency problems. But if it is useful to optimize an HD or Full HD stream, it is much more crucial to super-optimize a 4K stream, because the bitrates we are speaking about are difficult to sustain on the Internet, or at least to sustain consistently.

ABR streaming can help here, but not as usual. Who can accept watching a 2.5Mbit/s 720p rendition on an 80” 4K display because of low bandwidth at peak times? (It is the same experience as watching a 360p video on a 40” screen from 1.5 m away: try it and tell me.) Who buys 4K wants 4K, no compromise. Furthermore, as Dan Rayburn underlined, there are few economic reasons to offer 4K, because 4K delivery costs 3-4 times Full HD. This is why I think that optimization is now more important than ever.


HEVC

HEVC has finally been ratified. Like in 2003, when H.264 was ratified, the encoders are now very raw and inefficient and a lot of work remains to be done, but the potential is all there. Theoretically HEVC is said to be 30 to 50% more efficient than H.264 (with higher efficiency at higher resolutions). So it is no mystery that 4K and H.265 are seen as the winning couple. But the increase in pixels to be processed (8x, going from 1080p25/30 to 2160p50/60) and the complexity of the new codec (approx. 10x during encoding compared to H.264) do not draw a simple scenario, with increases in required processing power up to a factor of 80x. But hey…we are now like in 2003: we have maybe 10 years ahead to squeeze the max out of H.265, and this is very exciting. In the meanwhile, H.264 still has some room for improvement and for at least a couple of years will continue to be the king of the hill.

I have started to play with HEVC, and the amount of time I dedicate to experiments will probably increase steadily during 2014. So far I have collected interesting results. The bigger block transforms (not only 4×4 and 8×8 like in H.264, but also 16×16 and 32×32) plus some advanced deblocking and adaptive filtering are able to produce a much “smoother degradation” of quality when decreasing the bitrate, especially for high complexity scenes. On the other hand, the different handling of fine details currently produces less detail retention than H.264, and new approaches to psycho-visual optimization have yet to be invented.

And VP9? Interesting technology, good potential. Will it be successful? Hard to tell; until then I will continue to keep it under observation.


DASH

Last but not least there’s the new MPEG standard for ABR streaming: MPEG-DASH (Dynamic Adaptive Streaming over HTTP). HLS is spreading over various devices, but at the same time the implementations are frequently buggy and out of your control. DASH, on the other hand, provides plenty of control, and it is possible to change the heuristics. This is very important to achieve the highest possible QoE (or QoS), a key factor in a future where CDNs’ cost per GB is flattening while viewers’ numbers and stream size/quality are increasing.

So stay tuned.

Categories: Video

FFmpeg – the swiss army knife of Internet Streaming – part VI

19 October 2012 35 comments


PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

The fabulous world of FFmpeg  filtering

Transcoding is not a “static” matter, it is dynamic, because the input may span a very wide range of content types and you may have to set encoding parameters accordingly (this is particularly true for user generated content).

Not only that: the elaborations that you need to do in a video project may go beyond simple transcoding and involve a deeper capacity for analysis, handling and “filtering” of video files.

Let’s consider some examples:

1. you have input files of several resolutions and aspect ratios and you have to encode them to two target output formats (one for 16:9 and one for 4:3). In this case you need to analyze the input file and decide which profile to apply depending on the input aspect ratio.

2. now let’s suppose you also want to encode video at the target resolution only if the input has an equal or higher resolution, and keep the original otherwise. Again you’d need some external logic to read the metadata of the input and set up a dedicated encoding profile.

3. sometimes video needs to be filtered, scaled and filtered again; for instance, deinterlacing, watermarking and denoising. You need to be able to specify a sequence of filtering and/or manipulation tasks.

4. everybody needs thumbnail generation, but it’s difficult to find a shot really representative of the video content. Grabbing shots only on scene changes may be far more efficient.

FFmpeg can satisfy these kinds of complex analysis, handling and filtering tasks even without external logic, using the embedded filtering engine (-vf). For very complex workflows an external controller is still necessary, but filters come in handy when you need to do the job straight and simple.

FFmpeg filtering is a wide topic, because there are hundreds of filters and thousands of combinations. So, using the same “recipe” style of the previous articles of this series, I’ll try to solve some common problems with specific command line samples focused on filtering. Note that, to simplify the command lines, I’ll omit the parameters dedicated to H.264 and AAC encoding. Take a look at the previous articles for that information.

1. Adaptive Resize

In FFmpeg you can use the -s switch to set the resolution of the output, but this is not a flexible solution. Far more control is provided by the “scale” filter. The following command line scales the input to the desired resolution the same way as -s:

ffmpeg -i input.mp4 -vf  "scale=640:360" output.mp4

But scale also provides a way to specify only the vertical or horizontal resolution and calculate the other to keep the same aspect ratio as the input:

ffmpeg -i input.mp4 -vf  "scale=640:-1" output.mp4

With -1 as the vertical resolution you delegate to FFmpeg the calculation of the right value to keep the same aspect ratio as the input (default) or to obtain the aspect ratio specified with the -aspect switch (if present). Unfortunately, depending on the input resolution, this may end up with an odd value, or an even value which is not divisible by 2 as required by H.264. To enforce a “divisible by x” rule, you can simply use the embedded expression evaluation engine:

ffmpeg -i input.mp4 -vf  "scale=640:trunc(ow/a/2)*2" output.mp4

The expression trunc(ow/a/2)*2 used as the vertical resolution means: take the output width (ow, in this case 640), divide it by the input aspect ratio and round it down to the nearest multiple of 2. For example, with a 1.85:1 source: 640/1.85 ≈ 345.9, and trunc(345.9/2)*2 = 344.

2. Conditional resize

Let’s go further and find a solution to problem 2 mentioned above: how to skip resizing if the input resolution is lower than the target?

ffmpeg -i input.mp4 -vf  "scale=min(640,iw):trunc(ow/a/2)*2" output.mp4

This command line uses as width the minimum between 640 and the input width (iw), and then scales the height to maintain the original aspect ratio. Notice that “,” may need to be escaped to “\,” in some shells.

With this kind of filtering you can easily set up a command line for massive batch transcoding that smartly adapts the output resolution to the target. Why use the original resolution when it is lower than the target? Well, if you encode with -crf this may help you save a lot of bandwidth!

3. Deinterlace

SD content is always interlaced and Full HD is very often interlaced. If you encode for the web you need to deinterlace and produce progressive video, which is also easier to compress. FFmpeg has a good deinterlacing filter named yadif (yet another deinterlacing filter), which is more efficient than the standard -deinterlace switch.

ffmpeg -i input.mp4 -vf  "yadif=0:-1:0, scale=trunc(iw/2)*2:trunc(ih/2)*2" output.mp4

This command deinterlaces the source (only if it is interlaced) and then scales down to half the horizontal and vertical resolution. In this case the sequence is mandatory: always deinterlace prior to scaling!

4. Interlacing aware scaling

Sometimes, especially if you work on IPTV projects, you may need to encode interlaced content (this is because legacy STBs require interlaced content, and also because interlaced video may have a higher temporal resolution). This is simple: just add -tff or -bff (top field first or bottom field first) to the x264 parameters. But there’s a problem: when you start from 1080i and want to go down to an interlaced SD output (576i or 480i), you need interlacing-aware scaling, because standard scaling will break the interlacing. No fear, FFmpeg has recently introduced this option in the scale filter:

ffmpeg -i input.mp4 -vf  "scale=720:576:-1" output.mp4

The third, optional flag of the filter is dedicated to interlaced scaling: -1 means automatic detection; use 1 instead to force interlaced scaling.

5. Denoising

When seeking a high compression ratio it is very useful to reduce the video noise of the input. There are several possibilities; my favorite is the hqdn3d filter (high quality de-noising 3d filter):

ffmpeg -i input.mp4 -vf  "yadif,hqdn3d=1.5:1.5:6:6,scale=640:360" output.mp4

The filter can denoise video using a spatial function (the first two parameters set its strength) and a temporal function (the last two parameters). Depending on the type of source (level of motion), the spatial or the temporal function may be more useful. Pay attention also to the order of the filters: deinterlace -> denoise -> scale is usually the best.

6. Select only specific frames from input

Sometimes you need to control which frames are passed to the encoding stage, or more simply change the fps. Here are some useful usages of the select filter:

ffmpeg -i input.mp4 -vf  "select=eq(pict_type,I)" output.mp4

This sample command filters out every frame that is not an I-frame. This is useful when you know the GOP structure of the original and want to create a fast preview of the video in output. Specifying a frame rate for the output with -r accelerates the playback, while using -vsync 0 will copy the PTS from the input and keep the playback real-time.

Note: the previous command is similar to the input switch -skip_frame nokey (-skip_frame bidir drops B-frames instead during decoding, useful to speed up the decoding of big files in special cases).

ffmpeg -i input.mp4 -vf  "select=not(mod(n,3))" output.mp4

This command selects one frame out of every 3, so it is possible to decimate the original framerate by an integer factor N, useful for mobile low-bitrate encoding.

7. Speed-up or slow-down the video

It is also fun to play with PTS (presentation time stamps):

ffmpeg -i input.mp4 -vf  "setpts=0.5*PTS" output.mp4

Use this to speed up your video by a factor of 2 (frames are dropped accordingly), or the line below to slow it down:

ffmpeg -i input.mp4 -vf  "setpts=2.0*PTS" output.mp4

8. Generate thumbnails on scene changes

The thumbnail filter tries to find the most representative frames in the video. Good for generating thumbnails.

ffmpeg -i input.mp4 -vf  "thumbnail,scale=640:360" -frames:v 1 thumb.png

A different way to achieve this is to use the select filter again. The following command selects only frames that differ by more than 40% from the previous one (and so are probably scene changes) and generates a sequence of 5 PNGs.

ffmpeg -i input.mp4 -vf  "select=gt(scene,0.4),scale=640:360" -frames:v 5 thumb%03d.png


The world of FFmpeg filtering is very wide, and this is only a quick and “filtered” view of it. Let me know in the comments or on twitter (@sonnati) if you need more complex filters or have problems adventuring in this fabulous world ;-)



PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)


Categories: ffmpeg, Video

Netflix – meditations on a video streaming giant

18 July 2012 7 comments

Netflix, during June, reached the record level of 1 billion hours streamed in a month. It is an incredibly huge amount of bandwidth, an impetuous and growing stream of bits that makes Netflix one of the TOP10 Internet bandwidth “consumers”. But how much does this huge stream cost Netflix?

I remember an article from a couple of years ago by Dan Rayburn in which he estimated an average cost of 3c$ per GByte, a low rate usually applied by CDNs to very large clients. In an article of 2011, Dan corrected the estimate, discussing a more complex pricing model for such big players (a mix of per-GB and per-Gbit/s). The new estimate can, however, be approximated to 1.5c$/GB.

This level of pricing may seem very low and negligible in the overall Netflix business, but I think that the growing consumption, due to the relatively high average amount of content streamed per user per month, may become a problem for Netflix if not brought under control.

Let’s dig deeper in the numbers.

Let’s suppose that the average bitrate streamed to users is 2.4 Mbit/s (see this post on the Netflix blog); this means that every hour of content requires on average 2.4 Mbit/s × 3600 s ÷ 8 = 1080 MB, roughly 1 GB.

If you multiply this by 1 billion hours you get about 1 billion GB × 1.5c$ ≈ 15M$ per month, i.e. around 180M$ per year.

Compared to 2011, the 2012 CDN cost is around double. This is caused by an increase in the number of clients, but most of all by an increase in the average amount of data streamed per client: a whopping 90 minutes per day per user. I think this may be considered near the maximum possible, but a further increase to 120 minutes may be realistic in a worst-case simulation. This would mean around 240M$ per year.
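For convenience, the whole estimate in a back-of-the-envelope sketch (Python; all rates are the rough estimates discussed above, not real Netflix figures; the 1 GB/hour rounding used in the text gives slightly lower totals):

def monthly_cdn_cost(hours_streamed, avg_mbps=2.4, cost_per_gb=0.015):
    """Back-of-the-envelope CDN bill: GB per streamed hour at avg_mbps,
    times hours, times the estimated per-GB rate (1.5c$, see above)."""
    gb_per_hour = avg_mbps * 3600 / 8 / 1000     # 2.4 Mbit/s -> ~1.08 GB/hour
    return hours_streamed * gb_per_hour * cost_per_gb

print(monthly_cdn_cost(1e9) / 1e6)               # ~16.2 M$/month (90 min/day)
print(monthly_cdn_cost(1e9 * 120 / 90) / 1e6)    # ~21.6 M$/month (120 min/day)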

With these premises it is not a surprise that Netflix is seeking to control delivery costs by creating its own, single-purpose CDN and by optimizing encoding.

You know that I’m very sensitive to encoding optimization. I have always stated that for this kind of business encoding optimization is of fundamental importance. I have already demonstrated in the past that H.264 can be optimized much more than what players like YouTube, Netflix, Hulu and the BBC are doing today. Here I specifically addressed YouTube and Netflix.

Netflix could benefit from a 30% to 50% reduction in average bitrate consumption with a strong optimization of the entire encoding pipeline (plus possibly of the Silverlight player). This could mean savings of 55-90M$ per year, and at the same time an improvement in the average quality delivered to clients, a key feature in the increasingly competitive market of OTT video.

Categories: Video

FFmpeg – the swiss army knife of Internet Streaming – part V

2 July 2012 29 comments


PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)


After almost one year from the first post of this series dedicated to FFmpeg, I have found some time to catch up with the topic and revise/refresh the series. In this year a lot has happened on the FFmpeg side (and not only there), so I have corrected a lot of small errors and changes in the syntax of commonly used commands. So this is also a good opportunity for you to refresh your knowledge of FFmpeg and the current state of the art. Above you find the index of the articles.

The most important changes are around parameters like -vcodec, -b, -ab, -vframes, etc.: to avoid misunderstandings, a stream identifier has been added to specify whether the parameter is related to the audio or the video track. In case of multiple A/V tracks there is also an optional parameter to specify the track number. Take a look at the updates to PART II for more information about the new syntax and obsolete parameters.

Another important change is related to the libfaac library, which is now external. Read point 2 below to learn about the alternatives.

Last but not least, FFmpeg introduced the possibility to control the parameters of libx264 directly using the -x264opts command. Not for everyone, but very useful when you want the control and performance of x264 together with all the input and output options of FFmpeg.

Fifth Part – Advanced Usage

This fifth article wants to add more advanced use cases and usages to what was presented and discussed in the previous 4 parts. This article will be enriched in the next weeks and months to include even more advanced examples and use cases that can be solved with a smart use of FFmpeg. Good reading!

1. Optimize multi-pass multi-bitrate encoding

You know that encoding for dynamic streaming techniques (HDS, HLS, Silverlight) requires the renditions to have aligned keyframes and to be CBR or capped VBR.
A neat trick to avoid the limit of fixed-length GOPs, while assuring a consistent alignment of keyframes across renditions, is to reuse the same first-pass stat file for all renditions.

ffmpeg -i IN -pass 1 -an -vcodec libx264 -r 30 -b 1500k -bufsize 1500k -keyint_min 60 -g 120 -s 1280x720 -vpre slower_fastfirstpass OUT_1500.mp4

This command line is the first pass of the first rendition. The first pass generates a stat file for the second pass.

ffmpeg -i IN -pass 2 -an -vcodec libx264 -r 30 -b 1500k -bufsize 1500k -keyint_min 60 -g 120 -s 1280x720 -vpre slower OUT_1500.mp4

Instead of recreating a first-pass stat file for each additional rendition, you can reuse the previous one, simply launching the second passes of the other renditions:

ffmpeg -i IN -pass 2 -an -vcodec libx264 -r 30 -b 1000k -bufsize 1000k -keyint_min 60 -g 120 -s 854x480 -vpre slower O_1000.mp4
ffmpeg -i IN -pass 2 -an -vcodec libx264 -r 30 -b 500k -bufsize 500k -keyint_min 60 -g 120 -s 640x360 -vpre slower O_500.mp4

Since the second pass is less accurate if it uses a stat file generated with a too different resolution and bitrate, it may be better to generate the first pass from a rendition in the middle of the set, and not from the highest one.

2. AAC encoding

libfaac has been extracted from ffmpeg and is now an external library. There are two alternatives already embedded inside ffmpeg: libvo_aacenc and the standard aac library.

ffmpeg -i input.mp3 -c:a libvo_aacenc -b:a 96k -ac 2 -ar 44100 output.aac

ffmpeg -i input.mp3 -c:a aac -strict experimental -b:a 96k -ac 2 -ar 44100 output.aac

I have tested both and it seems to me that libvo is the better alternative: it produces a sufficiently good AAC-LC.
In a future article I’ll explore some alternatives, like encoding the audio track externally and remuxing it with ffmpeg or mp4box.
This is the way to go if you need the higher efficiency of HE-AAC or HE-AAC v2.

3. Joining video

Joining videos is strangely a complex task with FFmpeg. A reader suggested this solution (via Steven’s Blog):

ffmpeg -ss 100 -t 10 -i in.mp4 -c copy -bsf h264_mp4toannexb 100.h264
ffmpeg -ss 200 -t 10 -i in.mp4 -c copy -bsf h264_mp4toannexb 200.h264
ffmpeg -i concat:"100.h264|200.h264" -i in.mp3 -c copy out.mp4
The first two lines generate two H.264 elementary streams. The h264_mp4toannexb option is mandatory to be able to concatenate elementary streams efficiently at the binary level.
The third line uses the concat protocol to concatenate the ES segments to form a new input.
I usually use mp4box for this kind of purpose, and not FFmpeg.

4. Use an HLS stream as source

FFmpeg now also supports Apple HTTP Live Streaming as an input protocol. So it is really simple to acquire or repurpose an HLS stream: simply specify the path to the .m3u8 manifest.

Example: do you want to stream an existing .m3u8 stream to Flash on the desktop using FMS (now AMS)? Try this:
ffmpeg -re -i http://server/path/stream.m3u8 -c copy -f flv "rtmp://FMSserver/app/streamName live=1"

5. Record a stream endlessly rotating target file

The segmenting feature of FFmpeg can also be used to create an endless recorder with a rotating buffer. It can be done using the segment_wrap parameter, which wraps the segment index around once it reaches a limit.

ffmpeg -i rtmp://INPUT -codec copy -f segment -segment_list out.list -segment_time 3600 -segment_wrap 24 out%03d.mp4
The previous command line endlessly records the INPUT stream into a ring buffer formed by 24 chunks of 1 hour of video each.

Conclusion

Follow me on twitter to know more about FFmpeg and video related topics (@sonnati).


PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)


Categories: Video

Adobe Media Server 5 announced

23 May 2012 5 comments

Adobe has announced that the new Adobe Media Server 5 (formerly Flash Media Server) will be available soon. The change in name reflects the recent shift in strategy at Adobe [sarcastic mode on], which is running away from Flash as fast as possible to embrace cooler technologies like, in this case, HLS [sarcastic mode off].

Irony apart, I’m very happy with this announcement, because I have always said that supporting streaming to iOS with content protection could open interesting new possibilities for FMS. FMS has supported HLS streaming since release 4.5, but this is not sufficient to keep the leadership.

I work for large media clients that need content protection, and in the last 2 years I have seen such clients choose Microsoft’s Smooth Streaming and PlayReady too many times, because some independent vendors have been able to offer them native iOS clients supporting PlayReady DRM inside the HLS protocol.
So it’s impossible to remain on the technology and adoption edge without supporting all the business needs of the most prominent mobile platform.

Now with AMS 5 we are able to use our favorite streaming server to deliver protected content to iOS devices too, both in applications created with AIR and in native ObjC apps.

We have two protection techniques: full blown DRM protection using Adobe Access 4 (again, “Flash” is flashed away), or PHLS (Protected HLS), the HLS version of PHDS (Protected HTTP Dynamic Streaming).

This last feature is indeed very interesting, because it offers stronger protection than the very simple encryption possible with standard HLS, without requiring the costs and worries of DRM servers.

More info on Kevin Towes’ blog.

Categories: AMS, FMS
