Archive for the ‘Video’ Category

Future of video: 4K, DASH, HEVC

31 January 2014 3 comments

I must admit, I'm feeling very guilty: this is the only new post in more than a year. 2013 was wonderful from a professional point of view and I had very few moments, if any, to dedicate to the blog. But 2014 brings too many interesting trends that I can't neglect any longer, so I want to get back to writing about video encoding, streaming and OTT technologies.

In fact, as you know, there are three magic “words” outlining the future of video: 4K, HEVC and DASH.

So, as a 2014 new year's resolution, I'm planning to write about ideas and optimizations related to this “magic trio”.

4K or not 4K ?

The first trend is rapidly gaining momentum. “4K” is on every insider's lips, and the effort by YouTube, Netflix and others to quickly offer 4K content is also opening new opportunities for selling 4K TVs and monitors.
I'm focusing part of my research on finding specific optimizations for H.264 encoding of 4K content. In fact, I think that, marketing buzz aside, 4K will first be served using the well-known H.264.

There are several optimizations to explore for 4K: for example custom quantization matrices, a bias toward the 8×8 transform and changes in psycho-visual optimizations, to name a few. 4K also pushes H.264 to its limits in motion estimation and compensation (very long motion vectors), creating several efficiency problems. But if it is useful to optimize an HD or Full HD stream, it is even more crucial to heavily optimize a 4K stream, because the bitrates involved are difficult to obtain over the Internet, or at least to obtain consistently.
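
Just to make the idea concrete, here is a purely illustrative sketch of this kind of tuning with libx264 (file names and values are hypothetical, and the right trade-offs depend heavily on the content): it enables the 8×8 transform, a non-flat quantization matrix and a wider motion-estimation range for the larger frames.

ffmpeg -i input_2160p.mov -c:v libx264 -preset slow -crf 22 -x264opts 8x8dct=1:cqm=jvt:merange=48 -c:a copy output_2160p.mp4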

ABR streaming can help here, but not in the usual way. Who can accept watching a 2.5 Mbit/s 720p rendition on an 80” 4K display because of low bandwidth at peak times? (It is the same experience as watching a 360p video on a 40” screen from 1.5 m away; try it and tell me.) Whoever buys a 4K display wants 4K, no compromise. Furthermore, as Dan Rayburn underlined, there are few economic reasons to offer 4K, because 4K delivery costs 3-4 times as much as Full HD. This is why I think that optimization is now more important than ever.

HEVC

HEVC has finally been ratified. As in 2003, when H.264 was ratified, the encoders are still very raw and inefficient and a lot of work remains to be done, but the potential is all there. Theoretically HEVC is said to be 30 to 50% more efficient than H.264 (with higher efficiency at higher resolutions), so it is no mystery that 4K and H.265 are seen as the winning couple. But the increase in pixels to be processed (8x going from 1080p25/30 to 2160p50/60) and the complexity of the new codec (approx. 10x during encoding compared to H.264) do not paint a simple scenario, with increases in required processing power of up to a factor of 80. But hey, we are now where we were in 2003: we have maybe 10 years ahead of us to squeeze the most out of H.265, and this is very exciting. In the meanwhile, H.264 still has some room for improvement and for at least a couple of years will continue to be the king of the hill.

I have started to play with HEVC, and the amount of time I dedicate to experimenting will probably increase steadily during 2014. So far I have collected some interesting results. The bigger block transforms (not only 4×4 and 8×8 as in H.264, but also 16×16 and 32×32), plus advanced deblocking and adaptive filtering, are able to produce a much “smoother degradation” of quality when decreasing the bitrate, especially for high-complexity scenes. On the other hand, the different handling of fine detail currently produces less detail retention than H.264, and new approaches to psycho-visual optimization have yet to be invented.
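
If you want to start experimenting too, a minimal sketch looks like the line below (assuming an FFmpeg build compiled with libx265 enabled; file names, preset and quality value are only placeholders, and option names may vary slightly between builds). Expect encoding to be roughly an order of magnitude slower than an equivalent x264 run, as noted above.

ffmpeg -i input.mp4 -c:v libx265 -preset medium -crf 26 -c:a copy output_hevc.mp4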

And VP9? Interesting technology with good potential. Will it be successful? Hard to tell; in the meantime I will continue to keep it under observation.

DASH

Last but not least there's the new MPEG standard for ABR streaming: MPEG-DASH (Dynamic Adaptive Streaming over HTTP). HLS is spreading across various devices, but at the same time its implementations are frequently buggy and offer little control. DASH, on the other hand, provides plenty of control and makes it possible to change the switching heuristic. This is very important for achieving the highest possible QoE (or QoS), a key factor in a future where CDNs' cost per GB is flattening while the number of viewers and the stream size/quality keep increasing.
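
Just to give an idea of how approachable the format already is, a hypothetical packaging sketch with GPAC's MP4Box (segment duration and file names are arbitrary) could look like the line below; the resulting .mpd manifest and segments can then be served by any plain HTTP server, with the switching logic living entirely in the player.

MP4Box -dash 4000 -rap -profile live -out manifest.mpd video_720p.mp4 video_480p.mp4 audio.mp4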

So stay tuned.

Categories: Video

FFmpeg – the swiss army knife of Internet Streaming – part VI

19 October 2012 33 comments

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

The fabulous world of FFmpeg  filtering

Transcoding is not a “static” matter, it is dynamic: the input may span a very wide range of content types, and you may have to set the encoding parameters accordingly (this is particularly true for user-generated content).

Moreover, the processing you need to do in a video project may go beyond simple transcoding and involve deeper analysis, handling and “filtering” of video files.

Let’s consider some examples:

1. you have input files of several resolutions and aspect ratios and you have to encode them to two target output formats (one for 16:9 and one for 4:3). In this case you need to analyze the input file and decide which profile to apply depending on the input aspect ratio.

2. now let’s suppose you want also to encode video at the target resolution only if the input has an equal or higher resolution and keep the original otherwise. Again you’d need some external logic to read the metadata of the input and setup a dedicated encoding profile.

3. sometimes video needs to be filtered, scaled and filtered again: for instance deinterlacing, watermarking and denoising. You need to be able to specify a sequence of filtering and/or manipulation tasks.

4. everybody needs thumbnail generation, but it's difficult to find a shot that is really representative of the video content. Grabbing shots only on scene changes may be far more effective.

FFmpeg can handle these kinds of complex analysis, handling and filtering tasks even without external logic, using the embedded filtering engine (-vf). For very complex workflows an external controller is still necessary, but filters come in handy when you need to do the job straight and simple.

FFmpeg filtering is a wide topic because there are hundreds of filters and thousands of combinations. So, using the same “recipe” style as the previous articles of this series, I'll try to solve some common problems with specific command-line samples focused on filtering. Note that to simplify the command lines I'll omit the parameters dedicated to H.264 and AAC encoding; take a look at the previous articles for that information.

1. Adaptive Resize

In FFmpeg you can use the -s switch to set the resolution of the output, but this is not a flexible solution. Far more control is provided by the “scale” filter. The following command line scales the input to the desired resolution the same way as -s:

ffmpeg -i input.mp4 -vf  "scale=640:360" output.mp4

But scale also provides a way to specify only the vertical or horizontal resolution and calculate the other so as to keep the same aspect ratio as the input:

ffmpeg -i input.mp4 -vf  "scale=640:-1" output.mp4

With -1 as the vertical resolution you delegate to FFmpeg the calculation of the right value to keep the same aspect ratio as the input (default) or to obtain the aspect ratio specified with the -aspect switch (if present). Unfortunately, depending on the input resolution, this may produce an odd value, which is not divisible by 2 as required by H.264. To enforce a “divisible by x” rule, you can simply use the embedded expression evaluation engine:

ffmpeg -i input.mp4 -vf  "scale=640:trunc(ow/a/2)*2" output.mp4

The expression trunc(ow/a/2)*2 as the vertical resolution means: use as output height the output width (ow, in this case 640) divided by the input aspect ratio and rounded down to the nearest multiple of 2 (I'm sure most of you are familiar with this kind of calculation).

2. Conditional resize

Let’s go further and find a solution to the problem 2 mentioned above: how to skip resize if the input resolution is lower than the target ?

ffmpeg -i input.mp4 -vf  "scale=min(640,iw):trunc(ow/a/2)*2" output.mp4

This command line uses as width the minimum between 640 and the input width (iw), and then scales the height to maintain the original aspect ratio. Notice that the “,” inside min() may need to be escaped as “\,” so that it is not interpreted as a filter separator.

With this kind of filtering you can easily set up a command line for massive batch transcoding that smartly adapts the output resolution to the target. Why keep the original resolution when it is lower than the target? Well, if you encode with -crf, avoiding upscaling can save a lot of bandwidth!

3. Deinterlace

SD content is very often interlaced, and Full HD content frequently is too. If you encode for the web you need to deinterlace and produce a progressive video, which is also easier to compress. FFmpeg has a good deinterlacer filter named yadif (yet another deinterlacing filter) which is more efficient than the standard -deinterlace switch.

ffmpeg -i input.mp4 -vf  "yadif=0:-1:0, scale=trunc(iw/2)*2:trunc(ih/2)*2" output.mp4

This command deinterlaces the source (only if it is detected as interlaced, thanks to the -1 flag) and then scales it, rounding width and height down to even values. In any case the sequence is mandatory: always deinterlace before scaling!

4. Interlacing aware scaling

Sometimes, especially if you work on IPTV projects, you may need to encode interlaced content (because legacy STBs require it, and also because interlaced video can have a higher temporal resolution). This is simple: just add -tff or -bff (top field first or bottom field first) in the x264 parameters. But there's a catch: when you start from 1080i and want to go down to an interlaced SD output (576i or 480i) you need interlacing-aware scaling, because standard scaling will break the interlacing. No fear, FFmpeg has recently introduced this option in the scale filter:

ffmpeg -i input.mp4 -vf  "scale=720:576:-1" output.mp4

The third optional parameter of the scale filter controls interlacing-aware scaling: -1 means automatic detection (based on the source flags), while 1 forces interlaced scaling.

5. Denoising

When seeking a high compression ratio it is very useful to reduce the noise of the input video. There are several possibilities; my favorite is the hqdn3d filter (high quality denoise 3D filter):

ffmpeg -i input.mp4 -vf  "yadif,hqdn3d=1.5:1.5:6:6,scale=640:360" output.mp4

The filter can denoise video using a spatial function (the first two parameters set its strength) and a temporal function (the last two parameters). Depending on the type of source (level of motion), the spatial or the temporal part may be more useful. Pay attention also to the order of the filters: deinterlace -> denoise -> scale is usually the best.

6. Select only specific frames from input

Sometimes you need to control which frames are passed to the encoding stage, or more simply to change the fps. Here are some useful usages of the select filter:

ffmpeg -i input.mp4 -vf  "select=eq(pict_type,I)" output.mp4

This sample command filters out every frame that is not an I-frame. This is useful when you know the GOP structure of the original and want to create a fast preview of the video. Specifying a frame rate for the output with -r accelerates the playback, while using -vsync 0 copies the PTS from the input and keeps the playback real-time.

Note: the previous command is similar to the input switch -skip_frame nokey (-skip_frame bidir instead drops B-frames during decoding, useful to speed up decoding of big files in special cases).
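
For example, a hypothetical sketch that decodes only the keyframes and dumps them as images (file names are placeholders):

ffmpeg -skip_frame nokey -i input.mp4 -vsync 0 -f image2 preview_%03d.png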

ffmpeg -i input.mp4 -vf  "select=not(mod(n,3))" output.mp4

This command selects one frame out of every 3, so it is possible to decimate the original frame rate by an integer factor N, which is useful for mobile low-bitrate encoding.

7. Speed-up or slow-down the video

It is also fun to play with PTS (presentation timestamps).

ffmpeg -i input.mp4 -vf  "setpts=0.5*PTS" output.mp4

Use this to speed up your video by a factor of 2 (frames are dropped accordingly), or the one below to slow it down:

ffmpeg -i input.mp4 -vf  "setpts=2.0*PTS" output.mp4

8. Generate thumbnails on scene changes

The thumbnail filter tries to find the most representative frames in the video, which is good for generating thumbnails.

ffmpeg -i input.mp4 -vf  "thumbnail,scale=640:360" -frames:v 1 thumb.png

A different way to achieve this is to use the select filter again. The following command selects only frames that differ by more than 40% from the previous frame (and so are probably scene changes) and generates a sequence of 5 PNGs.

ffmpeg -i input.mp4 -vf  "select=gt(scene,0.4),scale=640x:360" -frames:v 5 thumb%03d.png

Conclusions

The world of FFmpeg filtering is very wide and this is only a quick and “filtered” view of it. Let me know in the comments or on Twitter (@sonnati) if you need more complex filters or have problems venturing into this fabulous world ;-)

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

 

Categories: ffmpeg, Video

Netflix – meditations on a video streaming giant

18 July 2012 7 comments

Netflix, during June, reached the record level of 1 billion hours streamed in a month. It is an incredibly huge amount of bandwidth, an impetuous and growing stream of bits that makes Netflix one of the top 10 Internet bandwidth “consumers”. But how much does this huge stream cost Netflix?

I remember an article from a couple of years ago by Dan Rayburn in which he estimated an average cost of 3c$ per GByte, a low rate usually applied by CDNs to very large clients. In a 2011 article, Dan corrected the estimate, discussing a more complex pricing model for such big players (a mix of per GB and per Gbit/s). The new estimate can, however, be approximated to 1.5c$/GB.

This level of pricing may seem very low and negligible in Netflix's overall business, but I think that the growing consumption, driven by the relatively high average amount of content streamed per user per month, may become a problem for Netflix if not brought under control.

Let’s dig deeper in the numbers.

Let’s suppose that the average bitrate streamed to users is 2.4 Mbit/s (see this post in the netflix blog), this means that every hour of content requires in average 1080 MB (1GB).

If you multiply this by 1 billion hours you get 1 billion GB × 1.5c$ ≈ 15M$ per month, or 180M$ per year.

Compared to the CDN cost of 2011, 2012 is around double. This is caused by an increase in the number of clients, but most of all by an increase in the average amount of data streamed per client: a whopping 90 minutes per day per user. I think this may be considered close to the maximum possible, but a further increase to 120 minutes may be realistic in a worst-case simulation. That would mean roughly 240M$ per year.

With these premises it is not a surprise that Netflix is seeking to control delivery costs by creating its own, single-purpose CDN and by optimizing encoding.

You know that I’m very sensible to encoding optimization. I have always stated that for this kind of business encoding optimization is of fundamental importance. I have already demostrated in the past that H.264 can be optimized much more then what players like Youtube, Netflix, Hulu, BBC  are doing today. Here I specifically addressed Youtube and Netflix.

Netflix could benefit from a 30% to 50% reduction in average bitrate with a strong optimization of the entire encoding pipeline (plus, eventually, of the Silverlight player). This could mean savings of 60-80M$ per year and, at the same time, an improvement in the average quality delivered to clients, a key feature in the increasingly competitive market of OTT video.

Categories: Video

FFmpeg – the swiss army knife of Internet Streaming – part V

2 July 2012 29 comments

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

Introduction

Almost one year after the first post of this series dedicated to FFmpeg, I have found some time to catch up with the topic and revise/refresh the series. During this year a lot has happened on the FFmpeg side (and not only there), so I have corrected many small errors and updated the syntax of commonly used commands. This is also a good opportunity for you to refresh your knowledge of FFmpeg and the current state of the art. Above you find the index of the articles.

The most important changes are around parameters like -vcodec, -b, -ab, -vframes, etc.: to avoid ambiguity, a stream identifier has been added to specify whether a parameter applies to the audio or the video track. In the case of multiple A/V tracks there is also an optional parameter to specify the track number. Take a look at the updates to PART II for more information about the new syntax and the obsolete parameters.
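
For instance, a command in the new style (a purely illustrative example with placeholder file names) looks like this, with the :v options applying to the video stream and the :a options to the audio stream:

ffmpeg -i input.mov -c:v libx264 -b:v 1500k -c:a libvo_aacenc -b:a 96k output.mp4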

Another important change is related to the libfaac library, which is now external. Read point 2 below to learn about the alternatives.

Last but not least, FFmpeg introduced the possibility to control the parameters of libx264 directly using the -x264opts option. Not for everyone, but very useful when you want the control and performance of x264 together with all the input and output options of FFmpeg.
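
A hypothetical example (values are arbitrary) that sets bitrate, VBV and GOP length directly at the x264 level:

ffmpeg -i input.mp4 -c:v libx264 -x264opts bitrate=1500:vbv-maxrate=1500:vbv-bufsize=3000:keyint=60 -c:a copy output.mp4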

Fifth Part – Advanced Usage

This fifth article adds more advanced use cases to what was presented and discussed in the previous four parts. It will be enriched over the coming weeks and months with even more advanced examples and use cases that can be solved with a smart use of FFmpeg. Good reading!

1. Optimize multi-pass multi-bitrates encoding

You know that encoding for dynamic streaming techniques (HDS, HLS, Silverlight) requires the renditions to have aligned keyframes and to be CBR or capped VBR.
A neat trick to avoid the limits of fixed-length GOPs, while still ensuring consistent keyframe alignment across renditions, is to reuse the same first-pass stat file for all renditions.

ffmpeg -i IN -pass 1 -an -vcodec libx264 -r 30 -b 1500k -bufsize 1500k -keyint_min 60 -g 120 -s 1280x720 -vpre slower_fastfirstpass OUT_1500.mp4

This command line is the first pass of the first rendition. The first pass generates a stat file for the second pass.

ffmpeg -i IN -pass 2 -an -vcodec libx264 -r 30 -b 1500k -bufsize 1500k -keyint_min 60 -g 120 -s 1280x720 -vpre slower OUT_1500.mp4

Instead of recreating a first-pass stat file for each of the other renditions, you can reuse the previous one, simply launching the second passes of the remaining renditions:

ffmpeg -i IN -pass 2 -an -vcodec libx264 -r 30 -b 1000k -bufsize 1000k -keyint_min 60 -g 120 -s 854x480 -vpre slower O_1000.mp4
ffmpeg -i IN -pass 2 -an -vcodec libx264 -r 30 -b 500k -bufsize 500k -keyint_min 60 -g 120 -s 640x360 -vpre slower O_500.mp4

Since the second pass is less accurate if it uses a stat file generated at a very different resolution and bitrate, it may be better to generate the first pass from a middle rendition rather than from the highest one.

2. AAC encoding

libfaac has been extracted from FFmpeg and is now an external library. There are two alternatives still embedded inside FFmpeg: libvo_aacenc and the native aac encoder.

ffmpeg -i input.mp3 -c:a libvo_aacenc -b:a 96k -ac 2 -ar 44100 output.aac

ffmpeg -i input.mp3 -c:a aac -strict experimental -b:a 96k -ac 2 -ar 44100 output.aac

I have tested both and it seems to me that libvo_aacenc is the better alternative: it produces a sufficiently good AAC-LC.
In a future article I'll explore some alternatives, like encoding the audio track externally and then remuxing it with FFmpeg or MP4Box.
This is the way to go if you need the higher efficiency of HE-AAC or HE-AAC v2.
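
Purely as a sketch of what such an external workflow might look like (assuming the Nero AAC encoder is available as neroAacEnc; tool names and parameters are illustrative, not a tested recipe):

ffmpeg -i input.mp4 -vn -f wav - | neroAacEnc -hev2 -br 48000 -if - -of audio_he.mp4
MP4Box -add video_only.mp4#video -add audio_he.mp4#audio -new output.mp4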

3. Joining video

Joining video is strangely a complex task with FFmpeg. A reader suggested this solution (via Steven’s Blog):

ffmpeg -ss 100 -t 10 -i in.mp4 -c copy -bsf h264_mp4toannexb 100.h264
ffmpeg -ss 200 -t 10 -i in.mp4 -c copy -bsf h264_mp4toannexb 200.h264
ffmpeg -i concat:"100.h264|200.h264" -i in.mp3 -c copy out.mp4

The first two lines generate two H.264 elementary streams. The h264_mp4toannexb bitstream filter is mandatory to be able to concatenate the elementary streams efficiently at the binary level.
The third line uses the concat protocol to concatenate the ES segments and form a new input.
I usually use MP4Box for this kind of task rather than FFmpeg.

4. Use an HLS stream as source
 

FFmpeg now also supports Apple HTTP Live Streaming as an input protocol, so it is really simple to acquire or repurpose an HLS stream: simply specify the path to the .m3u8 manifest.

For example: do you want to stream an existing .m3u8 stream to Flash on the desktop using FMS (now AMS)? Try this:
ffmpeg -re -i http://server/path/stream.m3u8 -c copy -f flv "rtmp://FMSserver/app/streamName live=1"

5. Record a stream endlessly rotating target file
 

The segmenting feature of FFmpeg can also be used to create an endless recorder with a rotating buffer. It can be done using the segment_wrap parameter, which wraps the segment index around once it reaches a limit.

ffmpeg -i rtmp://INPUT -codec copy -f segment -segment_list out.list -segment_time 3600 -segment_wrap 24 out%03d.mp4
The previous command line endlessly records the INPUT stream into a ring buffer formed by 24 chunks of one hour of video each.

Conclusion

Follow me on Twitter to learn more about FFmpeg and video-related topics (@sonnati).

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

 

Categories: Video

Market repositioning of Flash begins (updated)

1 March 2012 3 comments

I have already talked (perhaps too much) about the future of Flash in this post. There I didn't hide my perplexity about the market position of Flash compared to alternative technologies. After the dropping of Flash Player for mobile there was a strong decline in confidence in the Flash platform. But now the scenario is beginning to emerge more clearly and I am starting to understand the purpose of Adobe's strategy.

Yesterday Adobe released a public beta of AIR 3.2 for mobile application development. This version implements the promised support for Stage3D on mobile platforms like iOS and Android. A number of demo videos have appeared on the web showing excellent 3D performance and a lot of renewed interest in mobile game development using AIR:


Square Enix’s [Barts] running on Android

Time will tell, but AIR has the potential to become a leading platform for 2D/3D game development. A single code base is sufficient to create a game for the desktop (AIR's captive runtime), the browser (someone mentioned Facebook?) and now iOS and Android. With connected TV and STB support to come (already shown during MAX), the dream of the Open Screen project is becoming reality, at least in the game development area (but graphics- and media-intensive applications may also leverage 2D/3D acceleration).

Therefore Adobe has concentrated its resources on a promising field where Flash could easily become the leader. In 2D/3D browser gaming it already is the leader (500 million players on Facebook may be a sufficient business card?). Search YouTube for Stage3D demos to see the huge amount of interest in this technology from game developers big and small.

The second strong commitment of the platform is video delivery, where Flash has been the leader for the past 5 years and still is today. The performance of video decoding in the browser has been greatly improved with a completely redesigned pipeline that now exploits multi-threading heavily. Most importantly, support for accelerated H.264 streaming has been added to AIR for iOS using the standard Apple HLS (already supported by FMS 4.5 and Wowza Server).

During the spring Adobe will release the new version of Flash Access (now Adobe Access 4), which will include content protection for iOS devices (both in AIR and in native applications) in the form of DRM over HLS. This move has the potential to help Adobe regain the favor of the majors and big content providers, who would gain the ability to use a uniform DRM across Android, iOS, desktop apps, browsers, Google TV and some STBs.

The support for HW-accelerated 2D, 3D and video playback on mobile, plus an improvement in performance for Flex applications, plus the possibility to integrate HTML5 content with StageWebView, plus the DRM, plus native extensions, **finally** makes AIR (for Mobile) an interesting, efficient, effective and valuable solution for cross-platform application development.

(Updated 03 March 2012)

I think the platform is 99% complete now, which is very good, but I would like to see the following issues addressed ASAP to complete the feature list of AIR for Mobile:

  • H.264/AAC over RTMP: necessary for efficient real-time video applications, especially now that FP supports H.264 encoding.
  • Echo cancellation: see the previous point.
  • Effective and robust support for key native features like in-app purchase and notifications. I like the idea of Native Extensions, but I'd prefer an official API for critical features like these.
  • Better integration/communication between AS3 and JS in StageWebView. No more hacks, please.

Leave a comment if you think there is something else important to add to AIR for Mobile / AIR for iOS.

Categories: Flash, Mobile, Video

What about the future of Flash ?

16 January 2012 8 comments

A long time has passed since my last post on this blog. I have been very busy with an important video streaming project, but this is not the only reason for my absence. I also wanted to wait and take all the time necessary to analyze, ponder and “digest” the infamous Flash affair.
I will not hide my bitterness about it, but I'm also more optimistic now, after having seen the real consequences and having had the time to think about the future scenario. It's not all a bed of roses, but I'm somewhat optimistic.

First of all, fortunately, I'm not limited to Flash technology in my consultancy work. I have worked with .NET technologies for many years and I have designed and deployed successful HLS streaming services with both Wowza Server and FMS 4.5.

You also know that I’m an encoding expert with important success cases and a deep knowledge of commercial and open source encoders like Ffmpeg, x264, Flip Factory, Telestream Vantage, Atheme KFE, Rozhet CarbonCoder, Digital Rapids to name a few.

I have created encoding pipelines, and optimized existing ones, for delivery platforms based on HLS, Flash HDS, MS Silverlight and IPTV, and designed decoding and delivery optimizations for Flash and Silverlight.

So when I talk about my bitterness, it is not driven by fear for the future but by awareness of the big mistake Adobe has made in stabbing Flash in the back. I want to focus this post on the future prospects for Flash and not on Adobe's disastrous announcement (a masterpiece of masochism, at least from a PR point of view); however, a brief summary of my thoughts on the topic is worthwhile. Two short considerations:

1. Adobe may well have had good, long-term strategic reasons for dropping Flash for the mobile browser, but they could have chosen modes and terms with far less collateral damage. Why not progressively reduce the commitments and investments across the lifespan of FP11, to avoid harming the Flash community? After all, FP11 has been released for Android and QNX and it has brought important improvements in performance and stability. I know that Flash for mobile browsing has a lot of problems, and those problems are due to the excessive amount of bad Flash coding that has been done over time, especially for advertising. Obviously, if a page with 5-6 Flash banners can kill an old desktop computer, how could a tablet possibly handle it?
A simple solution could be to put every swf on a page in an idle mode, with a clickable poster image that activates the swf only when touched. Simple, clear and always better than having no Flash support in mobile browsing at all.

2. Adobe just does not realize that it is killing the goose that lays the golden eggs. Have you ever thought about the fact that Flash is used every day by 2 billion people? It's probably the most pervasive piece of software after MS Windows. A giant like Steve Jobs would have exploited such a competitive advantage in ways that the current Adobe management is not even able to imagine. Yet it is not difficult to imagine, for example, a marketplace of Flash and AIR apps on the model of the Mac App Store (but with 20 times more potential customers). What is this kind of power worth? Evidently close to zero for Adobe.

But now the damage is done and complaining is worth nothing, so there will be short-, medium- and long-term consequences. The short-term consequences are paradoxically positive for experienced Flash developers. This is because new developers, creative shops and consultancy firms are shifting their interest to HTML5, both because of the poor medium- and long-term outlook for Flash technology and for marketing reasons. But the demand for Flash technology is not decreasing as fast as the supply, so there is a burst in the amount of work available for skilled developers.

In the medium term I see a closer convergence between demand and supply for Flash-based projects in general. Flash will maintain or increase its penetration in web gaming thanks to 3D (remember that the casual-game market on the Internet is completely Flash-centric today; how can we forget that every day 200+ million people play Flash games on Facebook?) and will probably remain the reference for video streaming, but in the RIA and creative markets HTML5 will definitely gain momentum (in real terms, not like now, where only a few important creative, video or gaming projects have migrated from Flash to HTML5).

Flash in the mobile market, as a cross-platform mobile development technology, does not, in my opinion, have a clear outlook for the future. The sudden dropping of Flash for the mobile browser and the drastic reduction of commitment to Flex have been perceived as a betrayal by Adobe from the point of view of its loyal base of supporters and developers, and as a definitive change in the wind from the point of view of customers and stakeholders. How can we blame them? The lack of support from its own creator is a mortal stab for a technology, and the message from Adobe is clear: in the long term we'll substitute Flash with HTML5. Not only that: we will focus more on tools than on technologies (Flex docet).

No place for developers in the future of Adobe? I don't know, but the long-term perspective of Flash, Flex and other Flash-related technologies (FMS?) has been heavily perturbed by the infamous move. Flex is now an Apache-backed project, but is that a guarantee of evolution and support? Who will invest time, and credibility with customers, in a technology for mobile development that has no clear commitment from its creator and controller?

In conclusion, what do I intend to do as a Flash developer? In the short term I have a lot of Flash-related projects to work on, so no problem. In the medium term I plan to continue using Flash/AIR for mobile development. This is a clear path for me: I can capitalize on my AS3, Flash and Flex platform skills to develop desktop, browser and mobile apps. The level of features for Android and iOS has now become good enough to develop almost any kind of app without adding Java and Objective-C to your skill portfolio (in my opinion, the recent support for notifications, in-app purchase and HLS has cleared the top three entries of the most-wanted-features list).

And in the long term? I don't have an answer; I think I'll simply wait and see.

PS: Very interesting article about “migrating” from Flex to JS (Thanks to Anna Karim) – https://plus.google.com/109047477151984864676/posts/CVGJKLMMehs

Categories: Flash, Mobile, Video

My presentation at MAX2011 is available on Adobe TV

10 October 2011 5 comments

Finally the recording of my presentation at MAX2011 (Encoding for performance on multiple devices) is available on Adobe TV.

You can also download the PDF version here. My use of FFmpeg for repurposing FMS streams has attracted quite a lot of interest and attention. I'm planning to extend the series of articles dedicated to FFmpeg and also to transform it into a permanent knowledge base on FFmpeg and related best practices.

Categories: Flash, FMS, Video

Bandwidth is running out. Let’s save the bandwidth

15 September 2011 19 comments

Global bandwidth consumption is growing every day, and one of the main causes is the explosion of bandwidth-hungry Internet services based on video on demand (VOD) or live streaming. YouTube accounts for a considerable portion of overall Internet bandwidth usage, but Hulu and Netflix are also first-class consumers.

One of the causes of this abnormal consumption (apart from the high popularity) is the low level of optimization used in video encoding: for example, YouTube encodes 480p video @1Mbit/s, 720p @2.1Mbit/s and 1080p @3.5Mbit/s, which are rather high values. Netflix, the BBC and Hulu also use conservative settings. You may observe that Netflix and Hulu use adaptive streaming to offer different quality levels depending on network conditions, but such techniques are aimed at improving QoS, not at reducing bandwidth consumption. So it is very important to offer a quality/bitrate ratio as high as possible and not to underestimate the consequences of un-optimized encoding.

The main consequence of an un-optimized video is high overall bandwidth consumption and therefore a high CDN bill. For these giants this is not always a problem because, thanks to very high volumes, they can negotiate a very low cost per GByte.

However, it is not only a matter of pure bandwidth cost. There are many other hidden “costs”. For example, at peak hours it may be difficult to stream HD video from YouTube without frequent, and annoying, rebuffering. Furthermore, a lot of users nowadays use mobile connections for their laptop/tablet, and such connections rarely offer more than 1-2 Mbit/s of real average bandwidth. If the video streaming service, unlike YouTube, uses dynamic streaming (like Hulu, Netflix, EpicHD, etc.), the user is still able to watch the video without rebuffering, but in these bandwidth-constrained scenarios it is very likely that they will get one of the lower-quality versions of the stream and not the high-quality one.

In fact, dynamic streaming is today very often used as an alibi for poorly optimized encoding workflows…

This state of insufficient bandwidth is more frequent in less developed countries. But even highly developed countries can have problems, if we consider the recent data transfer caps introduced in Canada and in the USA by some network providers (AT&T: 150 GB/month, for example).

These limits are established especially because, at peak hours, heavy video streaming consumption can saturate the infrastructure, even that of an entire nation, as happened in the UK in 2008-2009 after the launch and the consequent extraordinary success of the BBC's iPlayer.

So dynamic streaming can help, but it must not be used as an excuse for poorly optimized encodings, and it is absurd to advertise a streaming service as HD when it requires 3-4+ Mbit/s of average bandwidth to stream the highest-quality bitrate while in the USA the average is around 2.9 Mbit/s (meaning that more than 50% of users will receive a lower-quality stream and not the HD one).

How many customers are really able to watch an HD stream from start to finish in a real-world scenario with these kinds of bitrates?

The solution is : invest in video optimization

Fortunately, today every first-class video provider uses H.264 for their video, and H.264 still offers much room for improvement.
In the past I have shown several examples of optimized encodings. They were often experiments to explore the limits of H.264, or the possibilities for further quality improvements that the Flash Player can provide to a video streaming service (take a look at my “best articles” area).

In those experiments I usually tried to encode a 720p video at a very low bitrate, like 500 Kbit/s. 500 Kbit/s is more than a psychological threshold, because at this bitrate it is really complex to achieve a satisfactory level of quality in 720p. Therefore my first experiments were usually performed on not-too-complex content.

But in the last 3 years I have considerably improved my skills and my knowledge of the inner principles of H.264. I have worked for first-class media companies and contributed to the creation of advanced video platforms capable of offering excellent video quality for desktop (Flash, Silverlight, Widevine), mobile (Flash, HLS, native) and STB (VBR or CBR .ts).

So now I’m able to show you some examples of complex content encoded with very good quality/bitrate ratios in a real world scenario.

I’m not afraid

To show you this new level of H.264 optimization I have chosen one of the most watched videos on YouTube: “Not Afraid” by Eminem.
This is a complex clip with a lot of movement, dark scenes, some transparencies, lens flares and a lot of fine detail on the artist's face.

YouTube offers the video in these four versions (plus a 240p one):

1080p @ 3.5Mbit/s
720p @ 2.1Mbit/s
480p @ 1Mbit/s
360p @ 0.5Mbit/s

Starting from this “state of the art”, I have tried to show what can be obtained with a little bit of optimization.
Why not try to offer the quality of the first three stream options at half the bitrate? Let's say:

1080p @ 1.7Mbit/s
720p @ 1Mbit/s
576p @ 0.5Mbit/s

Such a replacement would lead to two consequences:

A. Total bandwidth consumption approximately halved.
B. Many more users able to watch high-quality video, even in low-speed scenarios (mobile, capped connections, peak hours and developing countries).

But first of all, let’s take a look at the final result. Here you find a comparison page. On the left you have the YouTube video, on the right the optimized set of encodings. It is not simple to compare two 1080p or 720p videos (follow the instructions in the comparison page), so I have extracted some screenshots to compare the original Youtube version with the optimized encoding.

1. Youtube 1080p @ 3.5Mbit/s vs Optimized 1080p @ 1.7Mbit/s

Notice the skin details and imperfections. The optimized encoding offers virtually the same quality at half the bitrate. Consequently, you get 1080p quality at 15% less bitrate than YouTube's 720p version.

2. Youtube 720p @ 2Mbit/s vs optimized 720p @ 1Mbit/s

Again, virtually the same quality at half the bitrate. Consequently, 720p video can be offered instead of the 480p version, which has the same bitrate:

3. Youtube 480p @ 1Mbit/s vs 720p @ 1Mbit/s

Optimized 720p offers higher quality (details, grain, spatial resolution) at the same bitrate.

4. Youtube 480p @ 1Mbit/s vs optimized 576p@ 500Kbit/s

Instead of using 854×480 @ 500 Kbit/s I preferred 1024×576 (576p). I also tried encoding 720p @ 600-700 Kbit/s with very good results, but I liked the factor-of-2 reduction in bitrate, so in the end I opted for 576p, which offered more stable results across the whole video. In this case the quality, level of detail and spatial resolution are higher than the original, but at half the bitrate.

5. Youtube 360p @ 500Kbit/s vs optimized 576 @ 500Kbit/s

Again, much higher spatial resolution, level of detail and overall quality at the same bitrate.

For the sake of optimization

How did I obtain a bitrate/quality ratio like this? Well, it is not simple, but I will try to explain the basic principle.

Modern encoders do a lot of work to optimize the encoding from a mathematical/machine point of view. For example, a metric such as PSNR or SSIM is used for rate-distortion optimization. But this kind of approach is not always useful at low bitrates, or when a high quality/bitrate ratio is required. In this scenario the standard approach may not lead to the best encoding, because it is not capable of forecasting which pictures are more important for the quality perceived by the average user. Not every keyframe or portion of video is equally important.

These examples of optimized encodings were obtained with a mix of automated video analysis tools (for dynamic filtering, for instance) and a human-guided fitting approach (for keyframe placement and quality bursts). I'm currently developing a fully automated pipeline, but for now the process produces better results when guided by an expert eye.

Unfortunately there is a downside to ultra-optimized encoding: the encoding time rises considerably, so it is not realistic to think that YouTube could re-encode every single video with new optimized profiles.

But, you know, when we talk about big numbers there's an empirical law which may help us in a real-world scenario: the Pareto principle. Let's apply the Pareto principle to YouTube…

The Pareto principle

The Pareto principle (aka the 80-20 law) states that, for many events, roughly 80% of the effects come from 20% of the causes. Applying this rule to YouTube, it's very likely that 80% of the traffic comes from 20% of the videos. A derivation of the Pareto law known as the 64-4 rule states that 64% of the effects come from 4% of the causes (and so on). So optimizing a reduced set of the most popular videos would lead to huge savings and an optimal user experience with only a limited amount of extra effort (the 4%).

And “Not Afraid” belongs to the top 10 most popular videos on YouTube, so it's a perfect candidate for an extreme application of the Pareto law.

Let’s do some calculation. My samples reduce the bandwith of a factor 2 at every versions. So if we suppose that the most preferite version of the video is 720p and consider that the video has been watched more than 250 M times in the last 12 months, YouTube has consumed : 64MB * 250 M views = 16 PBytes, only to stream Not Afraid for 1 year.

Supposing an “equivalent” cost of 2c$/GByte*, this means 320,000$ (*it's the lowest cost in the CDN industry for huge volumes; YouTube probably uses different billing models, so consider it a rough estimate).

So a hand-tuned encoding of just one video could generate a saving of 160,000$. Wow… Encoding even only the top 10 YouTube videos probably means at least 1M$ of savings… multiply this for the top 1000 videos and we are probably talking about tens of millions per year… what can I say… YouTube, you know where to find me ;-)

Moral of the story

The proposed application of the Pareto rule is an example of an adaptive strategy. Instead of encoding every video with a complex process that may not be affordable, why not encode only a limited subset of very popular videos? Why not encode them with the standard settings first and then re-process them, without hurry, only if their popularity rises above an interesting threshold?

Adaptive strategies are always the most productive. If you apply this to the YouTube model, you get huge bandwidth (money) savings; if you apply it to a Netflix-like model (dynamic streaming), you get a sudden increase in the average quality delivered to clients, and so on.

In conclusion, the moral of the story is that every investment in encoding optimization and adaptive encoding workflows can have very positive effects on user experience and/or the business balance.

PS: I’ll speak about encoding and adaptive strategies during Adobe MAX 2011 (2-5 October) – If you are there and interested in encoding join my presentation : http://bit.ly/qvKjP0

Categories: Video

FFmpeg – the swiss army knife of Internet Streaming – part IV

30 August 2011 60 comments

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

Fourth Part

In this article I will focus on the support for RTMP that makes FFmpeg an excellent tool for enhancing the capabilities of the Adobe Flash Streaming Ecosystem.

FFmpeg introduced strong support for RTMP streaming with release 0.5, through the inclusion of librtmp (from the rtmpdump project). An RTMP stream can be used as an input and/or as an output in a command line.

The required syntax is:

rtmp_proto://server[:port][/application][/stream] options

where rtmp_proto can be: “rtmp”, “rtmpt”, “rtmpe”, “rtmpte”, “rtmps” or “rtmpts”, and options contains a list of space-separated options in the form key=val (more info here).

Using some of the parameters that we have seen in the first three parts of the series, it's possible to do a lot of things that the standard Flash streaming ecosystem cannot offer. Sometimes there are minor bugs, but generally speaking librtmp works well and helps FMS fill the gap with some advanced features of Wowza Server (like repurposing of RTP/RTSP streams, TS streams and so on). FFmpeg works with FMS as well as with Wowza Server and Red5, so in this article I will use FMS as a generic term to mean any “RTMP server”.

1. STREAM A FILE TO FMS AS IF IT WERE LIVE

With the help of FFmpeg it is possible, for example, to stream a pre-encoded file to FMS as if it were a live source. This can be very useful for test purposes, but also to create pseudo-live channels.

 ffmpeg -re -i localFile.mp4 -c copy -f flv rtmp://server/live/streamName 


The -re option tells FFmpeg to read the input file in realtime and not in the standard as-fast-as-possible manner. With -c copy (alias -acodec copy -vcodec copy ) I’m telling FFmpeg to copy the essences of the input file without transcoding, then to package them in an FLV container (-f flv) and send the final bitstream to an rtmp destination (rtmp://server/live/streamName).

The input file must have audio and video codecs compatible with FMS, for example H.264 for video and AAC for audio, but any supported codec combination should work.
Obviously it would also be possible to encode the input video on the fly. In that case, remember that the CPU power required for live encoding can be high and can cause frame-rate loss or stuttering playback on the subscribers' side.

In which scenarios can a command like this be useful?

For example, suppose you have created a communication or conferencing tool in AIR. One of the participants in the conference could fetch a local file and stream it to the conference FMS to show the same file, in real time, to the other participants. Leveraging the “native process” feature of AIR it is simple to launch a command line like the one above and do the job. In this scenario you will probably have to transcode the input, or check the codec compatibility by analyzing the input up front (remember the ffmpeg -i INPUT trick we discussed in the second article).

2. GRAB AN RTMP SOURCE

Using a command like this:

 ffmpeg -i rtmp://server/live/streamName -c copy dump.flv 

It’s possible to dump locally the content of a remote RTMP stream. This can be useful for test/audit/validation purpose. It works for both live and on-demand content.

3. TRANSCODE LIVE RTMP TO LIVE RTMP

One of the more interesting scenarios is when you want to convert one format into another for compatibility's sake, or to change the characteristics of the original stream.

Let’s suppose to have a Flash Player based app that do a live broadcast. You know that until FP11, Flash can only encode using the old Sorenson spark for video and NellyMoser ASAO or Speex for audio. You may use a live transcoding command to enhance the compression of the video transcoding from Sorenson to H.264:

 ffmpeg -i rtmp://server/live/originalStream -c:a copy -c:v libx264 -vpre slow -f flv rtmp://server/live/h264Stream 

This could be useful to reduce bandwidth usage, especially in live broadcasting where latency is not a problem.
The next release of FMS will also offer support for Apple HTTP Live Streaming (as Wowza already does), so it will be possible to use FMS to stream live to iOS devices. But FMS does not transcode the stream essences: it performs only a repackaging, or repurposing, of the original essences. FFmpeg, however, can help us convert the non-compliant Sorenson/Speex stream into an H.264/AAC stream in this way:

 ffmpeg -i rtmp://server/live/originalStream -c:a libfaac -ar 44100 -ab 48k -c:v libx264 -vpre slow -vpre baseline -f flv rtmp://server/live/h264Stream 

(UPDATE: libfaac is now an external library and maybe you can have problem encoding in AAC – Read part V of the series to know more about this topic.)

See also points 4 and 5 to learn how to generate a multi-bitrate stream compliant with Apple's requirements for HLS. This approach will also be useful with FP11, which encodes in H.264 but generates only one stream.

Another common scenario is when you are using FMLE for a live broadcast. The standard Windows version of FMLE supports only MP3, not AAC, for audio encoding (a plug-in is required). This may be a problem when you also want to use your stream to reach iOS devices with FMS or Wowza (iOS requires AAC for HLS streams). Again, FFmpeg can help us:

 ffmpeg -i rtmp://server/live/originalStream -acodec libfaac -ar 44100 -ab 48k -vcodec copy -f flv rtmp://server/live/h264_AAC_Stream 

On the other hand, I recently had the opposite problem with an AIR 2.7+ app for iOS. AIR for iOS does not currently support H.264 or AAC streaming with the classic NetStream object, but I needed to subscribe to AAC streams generated for the desktop. FFmpeg helped me transcode the AAC streams to MP3 for the AIR-on-iOS app.

Again, you probably know that Apple HLS requires an audio-only AAC stream with a bitrate lower than 64 Kbit/s for video streaming apps to be compliant, but at the same time you probably want to offer higher audio quality for your live streaming (on the desktop, for instance). Unfortunately FMLE encodes only the video track at multiple bitrates, while using a single audio preset for all of them. With FFmpeg it is possible to generate a dedicated audio-only AAC stream with a bitrate lower than 64 Kbit/s.
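
A minimal sketch of such a dedicated audio-only output (stream names are placeholders; the same output also appears as the last leg of the multi-bitrate command in point 4 below):

ffmpeg -i rtmp://server/live/high_FMLE_stream -vn -acodec libfaac -ar 44100 -ab 48k -f flv rtmp://server/live/audio_only_AAC_48k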

4. GENERATE BASELINE FOR LOW-END DEVICES

Very similarly, if you want to be compliant with older iOS versions or other mobile devices (older BlackBerry devices, for instance) you need to encode in Baseline profile, but at the same time you may want to leverage High profile for desktop HDS. So you could use FMLE to generate High-profile streams with high-quality AAC, and then generate server-side a Baseline set of multi-bitrate streams for HLS and/or low-end device compatibility.

This command reads from FMS the highest quality of a multi-bitrate set generated by FMLE and, starting from that, generates 3 scaled-down versions in Baseline profile for HLS or mobile. The last output is an audio-only AAC bitstream at 48 Kbit/s.

 ffmpeg -re -i rtmp://server/live/high_FMLE_stream -acodec copy -vcodec libx264 -s 640x360 -b 500k -vpre medium -vpre baseline rtmp://server/live/baseline_500k -acodec copy -vcodec libx264 -s 480x272 -b 300k -vpre medium -vpre baseline rtmp://server/live/baseline_300k -acodec copy -vcodec libx264 -s 320x200 -b 150k -vpre medium -vpre baseline rtmp://server/live/baseline_150k -acodec libfaac -vn -ab 48k rtmp://server/live/audio_only_AAC_48k 

UPDATE: using the -x264opts parameter you may rewrite the command like this:

 ffmpeg -re -i rtmp://server/live/high_FMLE_stream -c:a copy -c:v libx264 -s 640x360 -x264opts bitrate=500:profile=baseline:preset=slow rtmp://server/live/baseline_500k -c:a copy -c:v libx264 -s 480x272 -x264opts bitrate=300:profile=baseline:preset=slow rtmp://server/live/baseline_300k -c:a copy -c:v libx264 -s 320x200 -x264opts bitrate=150:profile=baseline:preset=slow rtmp://server/live/baseline_150k -c:a libfaac -vn -b:a 48k rtmp://server/live/audio_only_AAC_48k 



(UPDATE: libfaac is now an external library and maybe you can have problem encoding in AAC – Read part V of the series to know more about this topic.)

5. ENCODE LIVE FROM LOCAL GRABBING DEVICES

FFmpeg can also use a local A/V source, so it's possible to encode live directly from FFmpeg and bypass FMLE completely. I suggest doing that only in very controlled scenarios, because FMLE offers precious additional functions, like automatic encoding adjustment, to keep latency as low as possible when the bandwidth between the acquisition point and the server is not perfect.

This is an example of single bitrate:

 ffmpeg -r 25 -f dshow -s 640x480 -i video="video source name":audio="audio source name" -vcodec libx264 -b 600k -vpre slow -acodec libfaac -ab 128k rtmp://server/application/stream_name 

Combine this command line with the previous one and you have a multi-bitrate live encoding configuration for desktop and mobile.

6. ENCODE SINGLE PICTURES WITH H.264 INTRA COMPRESSION

H.264 has a very efficient intra compression mode, so it is possible to leverage it for picture compression. I have estimated an improvement of around 50% in compression compared to JPG. Last year I discussed extensively the possibility of using this kind of image compression to protect professional footage with FMS and RTMPE. Here you find the article, and this is the command line:

 ffmpeg.exe -i INPUT.jpg -an -vcodec libx264 -coder 1 -flags +loop -cmp +chroma -subq 10 -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -flags2 +dct8x8 -trellis 2 -partitions +parti8x8+parti4x4 -crf 24 -threads 0 -r 25 -g 25 -y OUTPUT.mp4 

Change -crf to modulate encoding quality (and compression rate).

UPDATES

Sometimes when connecting to FMS you may receive cryptic errors. It may help to enclose the destination RTMP address in double quotes and add the option live=1. For example:

 ffmpeg -i rtmp://server/live/originalStream -c:a copy -c:v libx264 -vpre slow -f flv "rtmp://server/live/h264Stream live=1" 

Other info on the librtmp protocol support: http://ffmpeg.org/ffmpeg.html#toc-rtmp

CONCLUSIONS

There are a lot of other scenarios where using FFmpeg with FMS (or Wowza) can help you create new, exciting services for your projects and overcome the limitations of the current Flash video ecosystem, so now it's up to you. Try mixing my examples and post comments about new ways you have found to customize your RTMP delivery system.
Remember also to follow the discussion on my Twitter account (@sonnati).

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

Categories: Video

FFmpeg – the swiss army knife of Internet Streaming – part III

19 August 2011 9 comments

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

 


Third part

In this third part we will look more closely at the parameters you need to know to encode to H.264.

FFmpeg uses the x264 library to encode to H.264. x264 offers a very wide set of parameters and therefore accurate control over compression. However, you have to know that FFmpeg applies a parameter-name remapping and doesn't expose the whole set of x264 options.

UPDATE: FFmpeg allows you to pass parameters directly to the underlying x264 library using the option -x264opts, which accepts parameters as key=value pairs separated by “:”. For example: -x264opts bitrate=1000:profile=baseline:level=4.1, etc.

Explaining the meaning of all the parameters is a long task and is not the aim of this article, so I'll describe only the most important ones and provide some useful samples. If you want to go deeper into the parameterization of FFmpeg, I suggest reading this article to learn the meaning of each x264 parameter and the mapping between FFmpeg and x264. To learn more about the technical principles of H.264 encoding, I also suggest taking a look at the first part of my presentations at MAX2008, MAX2009 and MAX2010.

ENCODING IN H.264 WITH FFMPEG

Let's start by analyzing a sample command line to encode in H.264:

ffmpeg -i INPUT -r 25 -b 1000k -s 640x360 -c:v libx264 -flags +loop -me_method hex -g 250 -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -bf 3 -b_strategy 1 -i_qfactor 0.71 -cmp +chroma -subq 8 -me_range 16 -coder 1 -sc_threshold 40 -flags2 +bpyramid+wpred+mixed_refs+dct8x8+fastpskip -keyint_min 25 -refs 3 -trellis 1 -level 30 -directpred 1 -partitions +parti8x8+parti4x4+partp8x8+partb8x8 -threads 0 -acodec libfaac -ar 44100 -ab 96k -y OUTPUT.mp4

(UPDATE: libfaac is now an external library and you may have problems encoding in AAC – read part V of the series to know more about this topic.)

This command line encodes the INPUT file using a framerate of 25 fps (-r), a target bitrate of 1000Kbit/s (-b), a GOP max-size of 250 frames (-g), 3 b-frames (-bf) and resizes the input to 640x360 (-s). The level is set to 3.0 (-level), the entropy coder to CABAC (-coder 1) and the number of reference frames to 3 (-refs). The profile is determined by the presence of b-frames, dct8x8 and CABAC, so this is a High Profile stream.

Notice the syntax used to enable/disable options in multi-option parameters like -partitions, -flags2 and -cmp. The string "-flags2 +bpyramid+wpred+mixed_refs+dct8x8" means that you are enabling b-pyramid, weighted prediction, mixed reference frames and the use of the 8x8 DCT. So, for example, if you want to disable dct8x8 to generate an output compliant with Main Profile, you can do that by changing the previous string to "-flags2 +bpyramid+wpred+mixed_refs-dct8x8" (notice the "-" character in front of dct8x8 instead of "+"). Disabling dct8x8 you obtain Main Profile; disabling also b-frames and CABAC (setting "-bf 0" and "-coder 0") you obtain Baseline Profile.

Profiles and Levels are very important for device compatibility, so it is important to know how to produce a specific profile and level pair. You find a short primer on profiles and levels here and generic recommendations for multi-device encoding here.
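
If you have a recent FFmpeg build, a quicker route is to let the libx264 wrapper enforce the constraints with -profile:v and -level. A minimal sketch (the bitrate and frame size are only placeholders):

 ffmpeg -i INPUT -c:v libx264 -profile:v main -level 31 -b:v 1000k -s 640x360 -c:a libfaac -b:a 96k OUTPUT.mp4 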

MAIN PARAMETERS

Here you find a short explanation of the most significant parameters.

-me_method

Sets the motion estimation search method, and therefore the accuracy/speed tradeoff of motion estimation. Allowed values: dia (fastest), hex, umh, full (slowest). Dia is usually used for first-pass encoding only and full is too slow and not significantly better than umh. For single-pass encoding, or the second pass in multi-pass encoding, use umh or hex depending on your encoding speed requirements or constraints.

-subq

Sets the subpixel refinement accuracy of motion vectors. Accepts values in the range 1-10. Use lower values like 1-3 for the first pass and higher values like 7-10 for the second pass. Again, the effective value depends on your quality/speed tradeoff.

-g, -keyint_min, -sc_threshold

x264 uses by default a dynamic GOP size. -g selects the max GOP size, -keyint_min the min size. -sc_threshold is the scene change sensitivity (0-100). At every scene change a new i-frame (intra compressed frame) is inserted; depending on -g and -keyint_min, an IDR frame (a real keyframe) is inserted instead. The GOP can be long (e.g. -g 300) for compression efficiency's sake, or short (e.g. 25/50) for accessibility's sake. This depends on what you need to achieve and on the delivery technique used (when using RTMP streaming you can seek to every frame, with progressive download only to IDRs). Sometimes you may need a consistent, constant GOP size across multiple bitrates (e.g. for HTTP Dynamic Streaming or HLS). To do that, set min and max GOP size to the same value and disable scene change detection completely (e.g. -g 100 -keyint_min 100 -sc_threshold 0). See the sketch below.
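
For example, here is a minimal sketch of a segmenter-friendly encode with a fixed 2-second GOP at 25 fps (the bitrates are only placeholders):

 ffmpeg -i INPUT -r 25 -c:v libx264 -b:v 1000k -g 50 -keyint_min 50 -sc_threshold 0 -c:a libfaac -b:a 96k OUTPUT.mp4 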

-bf, b-strategy

-bf sets the max number of consecutive b-frames (H.264 supports up to 16 b-frames). Remember that b-frames are not allowed in Baseline Profile. -b_strategy defines the technique used for b-frame placement.

Use 0 to disable dynamic placement.
Use 1 to enable a fast-choice technique for dynamic placement. Fast but less accurate.
Use 2 to enable a slow-and-accurate mode. Can be really slow if used with a high number of b-frames.

-refs

Sets the number of reference frames (H.264 supports up to 16 reference frames). Influences the encoding time. Using more than 4-5 refs commonly gives very little or no gain.

 -partitions

H.264 supports several partition modes for MB estimation and compensation. P-macroblocks can be subdivided into 16x8, 8x16, 8x8, 4x8, 8x4, and 4x4 partitions. B-macroblocks can be divided into 16x8, 8x16, and 8x8 partitions. I-macroblocks can be divided into 4x4 or 8x8 partitions. Analyzing more partition options improves quality at the cost of speed. The default in FFmpeg is to analyze all partitions except p4x4 (p8x8, i8x8, i4x4, b8x8). Note that i8x8 requires 8x8 DCT and is the only High Profile-specific partition. p4x4 is rarely useful (e.g. for small frame sizes).

-b, -pass, -crf, -maxrate, -bufsize

-b sets the desired bitrate, which is achieved in a single-pass or multi-pass process controlled by the -pass parameter. -crf defines a desired average quality instead of a target bitrate.
These are all options related to bitrate allocation and rate control. Rate control is a key area of video encoding and deserves a wider description.

RATE CONTROL OPTIONS

Particular attention must be paid to the rate control mode used. x264 supports different rate control techniques: Average Bit Rate (ABR), Constant Bit Rate (CBR), Variable Bit Rate (VBR at constant quality or constant quantization). Furthermore, it is possible to use 1, 2 or more passes.

MultiPass encoding

FFmpeg supports multi-pass encoding. The most common is 2-pass encoding. In the first pass the encoder collects information about the video's complexity and creates a stats file. In the second pass the stats file is used for the final encoding and a better bit allocation. This is the generic syntax:

ffmpeg -i input -pass 1 [parameters] output.mp4
ffmpeg -i input -pass 2 [parameters] output.mp4

-pass 1 tells FFmpeg to analyze the video and write a stats file. -pass 2 tells it to read the stats file and encode accordingly. There is also a -pass 3 option that reads and updates the stats file. So if you want to do a 3-pass encoding the correct sequence is:

ffmpeg -i input -pass 1 [parameters] output.mp4
ffmpeg -i input -pass 3 [parameters] output.mp4
ffmpeg -i input -pass 2 [parameters] output.mp4

3-pass encoding is rarely useful.

ABR

Average Bitrate is the default rate control mode. Simply set the desired target average bitrate using -b. Remember that the bitrate can fluctuate freely locally and only the average value over the whole video duration is controlled. ABR can be performed with 1 or 2 passes, but I suggest always using 2 passes for a better bit allocation.
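
Putting it together, a typical 2-pass ABR encode looks like this (a sketch: the bitrates are placeholders and on Windows you should write NUL instead of /dev/null for the discarded first-pass output):

ffmpeg -i INPUT -pass 1 -an -c:v libx264 -b:v 1000k -f null /dev/null
ffmpeg -i INPUT -pass 2 -c:v libx264 -b:v 1000k -c:a libfaac -b:a 96k OUTPUT.mp4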

CBR

Using the VBV model (Video Buffering Verifier) it's possible to obtain CBR encoding with custom buffer control. For example, to encode in canonical CBR mode use:

ffmpeg -i input -b 1000k -maxrate 1000k -bufsize 1000k [parameters] output.mp4

CBR encoding can be performed in single pass or multi pass. Single pass CBR is sufficiently efficient.

VBR

libx264 supports two unconstrained VBR modes. In pure VBR you don't know the final average bitrate of your video, but you set a target quality (or quantization) that the encoder applies across the whole video.

-cqp sets a constant quantizer for each frame. It is rarely useful.
-crf (Constant Rate Factor) sets a target quality factor and lets the encoder change the quantization depending on frame type and sequence complexity. Adaptive Quantization and MB-Tree techniques change the quantization at macroblock level according to each macroblock's importance. The -crf factor can usually be chosen in the range 18 (transparent quality) to 30-35 (low quality, but the perceived quality depends on frame resolution and device dpi).

Usually VBR encoding is performed in a single pass.
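
A minimal single-pass CRF sketch (the -crf value is just a starting point to tune against your content):

ffmpeg -i INPUT -c:v libx264 -crf 22 -c:a libfaac -b:a 96k OUTPUT.mp4

If you need to cap the local bitrate peaks, you can also combine -crf with the VBV options -maxrate and -bufsize seen above.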

SIMPLIFY YOUR LIFE USING PRESETS

Fortunately it is possible to avoid long command lines by using pre-defined or custom encoding settings. Personally I do not like this approach very much, because there are a lot of cases where you need accurate control over the parameters, as with HLS or HDS. But I recognise that the use of presets can save a lot of time in everyday work.

Presets are simply sets of parameters enclosed in a preset file, which you find in the ffpresets folder after unzipping the FFmpeg build package. Presets can change depending on the version of FFmpeg you have, so it is best to take a look at the content of the preset file. Commonly you will find quality presets like libx264-hq.ffpreset or libx264-slow.ffpreset, first-pass presets like libx264-hq_firstpass.ffpreset and constraint presets like libx264-main.ffpreset or libx264-baseline.ffpreset.

So, to make a 2-pass encoding in baseline profile with the HQ preset you can use a command like this:

ffmpeg -i INPUT -pass 1 -an -vcodec libx264 -vpre hq_firstpass -vpre baseline -b 1000k -s 640x360 OUTPUT.mp4
ffmpeg -i INPUT -pass 2 -acodec libfaac -ab 96k -ar 44100 -vcodec libx264 -vpre hq -b 1000k -vpre baseline -s 640x360 OUTPUT.mp4

(UPDATE: libfaac is now an external library and you may have problems encoding in AAC – read part V of the series to know more about this topic.)

Notice that the constraints preset is applied with a second -vpre and that the first pass has audio encoding disabled.
Sometimes I have had problems with presets on Windows. You can bypass problems locating the presets by simply using -fpre instead of -vpre. When using -fpre you must specify the absolute path to the preset file and not only the short name as with -vpre.
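
For example (a sketch: the path to the preset file is hypothetical, point it to the ffpresets folder of your own build):

ffmpeg -i INPUT -vcodec libx264 -fpre "C:\ffmpeg\presets\libx264-hq.ffpreset" -b 1000k -acodec libfaac -ab 96k OUTPUT.mp4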

UPDATE:

Since FFmpeg introduced direct access to x264 parameters it is also possible to use the native x264 presets. E.g.:

ffmpeg -i INPUT -an -c:v libx264 -s 960x540 -x264opts preset=slow:tune=ssim:bitrate=1000 OUTPUT.mp4

ENCODING FOR DIFFERENT DEVICES

Using the constraint presets it is possible to encode for mobile devices, which usually require Baseline Profile to enable hardware acceleration. This limit is rapidly being surpassed by current hardware and operating systems, but if you need to target older devices (for example iOS 3 devices) together with newer ones using the same video, it's still necessary to be able to easily generate video compliant with Baseline Profile. You find other generic recommendations for multi-device encoding here.
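
As a sketch, with a recent build you can obtain such a stream either with the baseline constraint preset shown above or by forcing the profile directly (frame size and bitrate are only placeholders):

ffmpeg -i INPUT -c:v libx264 -profile:v baseline -level 30 -s 480x270 -b:v 500k -c:a libfaac -b:a 64k OUTPUT.mp4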

THE NEXT PART

In this part we have seen how to encode to H.264 using FFmpeg, as well as the richness of the encoding parameters. In part IV of this series we will see how to leverage the FFmpeg support for RTMP streaming to enhance the Flash Video Ecosystem's capabilities.

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

 

Categories: Video