FFmpeg – the swiss army knife of Internet Streaming – part III

[Index]

PART I – Introduction (revised 02-jul-2012)
PART II – Parameters and recipes (revised 02-jul-2012)
PART III – Encoding in H.264 (revised 02-jul-2012)
PART IV – FFmpeg for streaming (revised 02-jul-2012)
PART V – Advanced usage (revised, 19-oct-2012)
PART VI – Filtering (new, 19-oct-2012)

 


Third part

In this third part we will look more closely at the parameters you need to know to encode to H.264.

FFmpeg uses the x264 library to encode to H.264. x264 offers a very wide set of parameters and therefore fine control over compression. However, be aware that FFmpeg remaps the parameter names and doesn’t expose the whole set of x264 options.

UPDATE: FFmpeg lets you pass parameters directly to the underlying x264 library with the -x264opts option. -x264opts accepts parameters as key=value pairs separated by “:”. Example: -x264opts bitrate=1000:profile=baseline:level=4.1…etc.
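
As a minimal sketch of the key=value syntax, the shell function below joins its arguments with “:” and prints the resulting command line instead of running it. x264opts_join is a hypothetical helper name, not part of FFmpeg. The keys keyint, min-keyint and scenecut are x264’s own names for what FFmpeg calls -g, -keyint_min and -sc_threshold. INPUT and OUTPUT.mp4 are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: join key=value arguments into one ":"-separated string.
x264opts_join() {
    out=""
    for kv in "$@"; do
        # Add a ":" separator only between pairs, not before the first one.
        out="${out:+$out:}$kv"
    done
    printf '%s\n' "$out"
}

opts=$(x264opts_join keyint=250 min-keyint=25 scenecut=40)

# Dry run: print the command instead of executing it.
echo "ffmpeg -i INPUT -c:v libx264 -x264opts $opts OUTPUT.mp4"
```

Building the string from separate arguments makes it easier to add or drop a single x264 option without retyping the whole list.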

Explaining the meaning of every parameter is a long task and is not the aim of this article, so I’ll describe only the most important ones and provide some useful samples. If you want to go deeper into FFmpeg parameterization, I suggest reading this article to learn the meaning of each x264 parameter and the mapping between FFmpeg and x264. To learn more about the technical principles of H.264 encoding, I also suggest taking a look at the first part of my presentations at MAX2008, MAX2009 and MAX2010.

ENCODING IN H.264 WITH FFMPEG

Let’s start by analyzing a sample command line that encodes to H.264:

ffmpeg -i INPUT -r 25 -b 1000k -s 640x360 -c:v libx264 -flags +loop -me_method hex -g 250 -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -bf 3 -b_strategy 1 -i_qfactor 0.71 -cmp +chroma -subq 8 -me_range 16 -coder 1 -sc_threshold 40 -flags2 +bpyramid+wpred+mixed_refs+dct8x8+fastpskip -keyint_min 25 -refs 3 -trellis 1 -level 30 -directpred 1 -partitions -parti8x8-parti4x4-partp8x8-partp4x4-partb8x8 -threads 0 -acodec libfaac -ar 44100 -ab 96k -y OUTPUT.mp4

(UPDATE: libfaac is now an external library, so you may have problems encoding in AAC. Read part V of the series to learn more about this topic.)

This command line encodes the INPUT file at a frame rate of 25 fps (-r), with a target bitrate of 1000 kbit/s (-b), a maximum GOP size of 250 frames (-g) and 3 B-frames (-bf), resizing the input to 640×360 (-s). The level is set to 3.0 (-level), the entropy coder to CABAC (-coder 1) and the number of reference frames to 3 (-refs). The profile is determined by the presence of B-frames, dct8x8 and CABAC, so this is a High profile encode.

Notice the syntax used to enable or disable options in multi-option parameters like -partitions, -flags2 and -cmp. The string “-flags2 +bpyramid+wpred+mixed_refs+dct8x8” means that you are enabling B-pyramid, weighted prediction, mixed reference frames and the 8×8 DCT. So, for example, if you want to disable dct8x8 to generate an output compliant with the Main profile, change the previous string to “-flags2 +bpyramid+wpred+mixed_refs-dct8x8” (notice the “-” character in front of dct8x8 instead of “+”). Disabling dct8x8 gives you a Main profile stream; also disabling B-frames and CABAC (setting “-bf 0” and “-coder 0”) gives you a Baseline profile stream.

Profiles and levels are very important for device compatibility, so it is important to know how to produce a specific profile and level pair. You can find a short primer on profiles and levels here and generic recommendations for multi-device encoding here.

MAIN PARAMETERS

Here is a short explanation of the most significant parameters.

-me_method

Sets the search method used in motion estimation. Allowed values: dia (fastest), hex, umh, full (slowest). dia is usually used for first pass encoding only, while full is too slow and not significantly better than umh. For single pass encoding, or for the second pass in multi-pass encoding, use umh or hex depending on your encoding speed requirements or constraints.

-subq

Sets the accuracy of motion vectors. Accepts values in the range 1-10. Use lower values like 1-3 for the first pass and higher values like 7-10 for the second pass. Again, the best value depends on a quality/speed tradeoff.

-g, -keyint_min, -sc_threshold

By default x264 uses a dynamic GOP size. -g sets the maximum GOP size and -keyint_min the minimum. -sc_threshold is the scene change sensitivity (0-100). At every scene change a new I-frame (intra-coded frame) is inserted; depending on -g and -keyint_min, an IDR frame (a keyframe) is inserted instead. The GOP can be long (e.g. -g 300) for the sake of compression efficiency, or short (e.g. 25 or 50) for the sake of accessibility. The choice depends on what you need to achieve and on the delivery technique used (with RTMP streaming you can seek to every frame, with progressive download only to IDR frames). Sometimes you need a consistent, constant GOP size across multiple bitrates (e.g. for HTTP Dynamic Streaming or HLS). To get that, set the minimum and maximum GOP size to the same value and disable scene change detection completely (e.g. -g 100 -keyint_min 100 -sc_threshold 0).
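
The constant GOP case can be sketched as follows. This is only an illustration under stated assumptions: gop_size is a hypothetical helper, the 25 fps frame rate, the 4 second keyframe interval and the three bitrates are example values, and the commands are printed rather than executed. -b:v is the stream-specific spelling of the -b option used elsewhere in this article.

```shell
#!/bin/sh
# Hypothetical helper: GOP length in frames for a given frame rate and
# keyframe interval in seconds (integer values only).
gop_size() {
    echo $(( $1 * $2 ))
}

fps=25          # assumed output frame rate (-r)
interval=4      # assumed keyframe interval in seconds

gop=$(gop_size "$fps" "$interval")

# Dry run: print one command per bitrate; every rendition shares the same GOP.
for rate in 400k 800k 1200k; do
    echo "ffmpeg -i INPUT -r $fps -c:v libx264 -b:v $rate -g $gop -keyint_min $gop -sc_threshold 0 -an OUT_$rate.mp4"
done
```

With these values the GOP is 100 frames, the same figure used in the example above, so keyframes fall at the same positions in every rendition.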

-bf, -b_strategy

-bf sets the maximum number of consecutive B-frames (H.264 supports up to 16 B-frames). Remember that B-frames are not allowed in the Baseline profile. -b_strategy defines the technique used for B-frame placement.

Use 0 to disable dynamic placement.
Use 1 to enable a fast-choice technique for dynamic placement. Fast but less accurate.
Use 2 to enable a slow-and-accurate mode. It can be really slow when used with a high number of B-frames.

-refs

Sets the number of reference frames (H.264 supports up to 16 reference frames). It influences the encoding time. Using more than 4-5 refs usually gives little or no gain.

-partitions

H.264 supports several partition modes for macroblock motion estimation and compensation. P-macroblocks can be subdivided into 16×8, 8×16, 8×8, 4×8, 8×4, and 4×4 partitions. B-macroblocks can be divided into 16×8, 8×16, and 8×8 partitions. I-macroblocks can be divided into 4×4 or 8×8 partitions. Analyzing more partition options improves quality at the cost of speed. The default in FFmpeg is to analyze all partitions except p4x4 (that is p8x8, i8x8, i4x4, b8x8). Note that i8x8 requires 8x8dct and is the only High Profile-specific partition. p4x4 is rarely useful (e.g. only for small frame sizes).

-b, -pass, -crf, -maxrate, -bufsize

-b sets the desired bitrate, which is achieved with a single pass or with a multi-pass process using the -pass parameter. -crf defines a desired average quality instead of a target bitrate.
These are all options related to bitrate allocation and rate control. Rate control is a key area of video encoding and deserves a wider description.

RATE CONTROL OPTIONS

Particular attention must be paid to the rate control mode used. x264 supports different rate control techniques: Average Bit Rate (ABR), Constant Bit Rate (CBR) and Variable Bit Rate (VBR at constant quality or constant quantization). Furthermore, it is possible to use 1, 2 or more passes.

MultiPass encoding

FFmpeg supports multi-pass encoding. The most common form is 2-pass encoding. In the first pass the encoder collects information about the video’s complexity and creates a stat file. In the second pass the stat file is used for the final encoding and for better bit allocation. This is the generic syntax:

ffmpeg -i input -pass 1 [parameters] output.mp4
ffmpeg -i input -pass 2 [parameters] output.mp4

-pass 1 tells FFmpeg to analyze the video and write a stat file. -pass 2 tells it to read the stat file and encode accordingly. There is also a -pass 3 option that reads and updates the stat file. So if you want to do a 3-pass encoding, the correct sequence is:

ffmpeg -i input -pass 1 [parameters] output.mp4
ffmpeg -i input -pass 3 [parameters] output.mp4
ffmpeg -i input -pass 2 [parameters] output.mp4

3-pass encoding is rarely useful.
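
For the common 2-pass case, a possible concrete form of the generic syntax is sketched below. The assumptions are mine: two_pass_cmd is a hypothetical helper that only prints the commands, the first pass discards its video through the null muxer on a Unix-like system (/dev/null) instead of writing output.mp4, and -b:v is the stream-specific spelling of -b.

```shell
#!/bin/sh
# Hypothetical helper: print the FFmpeg command for pass 1 or pass 2.
# $1 = pass number, $2 = target video bitrate (for example 1000k)
two_pass_cmd() {
    if [ "$1" -eq 1 ]; then
        # First pass: only the stat file matters, so audio is disabled (-an)
        # and the encoded video is discarded through the null muxer.
        echo "ffmpeg -y -i INPUT -c:v libx264 -b:v $2 -pass 1 -an -f null /dev/null"
    else
        # Second pass: reads the stat file and writes the real output.
        echo "ffmpeg -y -i INPUT -c:v libx264 -b:v $2 -pass 2 OUTPUT.mp4"
    fi
}

two_pass_cmd 1 1000k
two_pass_cmd 2 1000k
```

Both passes should use the same bitrate and video parameters, otherwise the statistics collected in the first pass no longer describe the encode done in the second.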

ABR

Average Bitrate is the default rate control mode. Simply set the desired target average bitrate using -b. Remember that the bitrate can fluctuate freely locally and only the average value over the whole video duration is controlled. ABR can be performed in 1 or 2 passes, but I suggest always using 2 passes for better data allocation.
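
Because ABR controls the average over the whole duration, the output size can be estimated in advance. The snippet below is a rough sketch: estimate_size_kb is a hypothetical helper, the bitrates and the 10 minute duration are example values, and container overhead is ignored.

```shell
#!/bin/sh
# Hypothetical helper: rough size in kilobytes (1 kB = 1000 bytes) of an
# ABR encode. $1 = video kbit/s, $2 = audio kbit/s, $3 = duration in seconds.
# Container overhead is ignored, so this is only an estimate.
estimate_size_kb() {
    echo $(( ($1 + $2) * $3 / 8 ))
}

# 1000 kbit/s video + 96 kbit/s audio for a 10 minute clip.
estimate_size_kb 1000 96 600    # prints 82200 (about 82 MB)
```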

CBR

Using the VBV (Video Buffering Verifier) model it’s possible to obtain CBR encoding with custom buffer control. For example, to encode in canonical CBR mode use:

ffmpeg -i input -b 1000k -maxrate 1000k -bufsize 1000k [parameters] output.mp4

CBR encoding can be performed in single pass or multi pass. Single pass CBR is sufficiently efficient.
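
In the example above -bufsize equals -maxrate, which corresponds to a buffer of one second. As a sketch of how to derive other buffer sizes, the hypothetical helper vbv_bufsize_k below multiplies the peak rate by a buffer duration; the values are examples and the command is only printed.

```shell
#!/bin/sh
# Hypothetical helper: VBV buffer size in kbit for a given peak rate and
# buffer duration. $1 = maxrate in kbit/s, $2 = buffer duration in ms.
vbv_bufsize_k() {
    echo $(( $1 * $2 / 1000 ))
}

rate=1000                               # target and peak bitrate, kbit/s
buf=$(vbv_bufsize_k "$rate" 1000)       # 1 second of buffer -> 1000 kbit

# Dry run: print the CBR command instead of executing it.
echo "ffmpeg -i INPUT -c:v libx264 -b:v ${rate}k -maxrate ${rate}k -bufsize ${buf}k OUTPUT.mp4"
```

A shorter buffer keeps the bitrate closer to the target at every instant but leaves the encoder less room to spend bits on complex scenes.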

VBR

libx264 supports two unconstrained VBR modes. In pure VBR you don’t know the final average bitrate of your video but you set a target quality (or quantization) that is applied by the encoder across the whole video.

-cqp sets a constant quantization for each frame. It is rarely useful.
-crf (Constant Rate Factor) sets a target quality factor and lets the encoder change the quantization depending on frame type and sequence complexity. Adaptive Quantization and MB-Tree techniques change quantization at the macroblock level according to macroblock importance. The -crf factor can usually be chosen in the range from 18 (transparent quality) to 30-35 (low quality, although the perceived quality depends on frame resolution and device dpi).

Usually VBR encoding is performed in single pass.
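
A single pass CRF encode can be sketched like this. crf_cmd is a hypothetical helper that prints the command rather than running it, and the 0-51 limit it checks is an assumption based on the range x264 accepts for 8-bit encoding.

```shell
#!/bin/sh
# Hypothetical helper: print a single pass CRF command after a range check.
# $1 = CRF value; 0-51 is assumed to be the valid range (8-bit x264).
crf_cmd() {
    if [ "$1" -ge 0 ] 2>/dev/null && [ "$1" -le 51 ] 2>/dev/null; then
        # No -b and no -pass: quality is the target, the bitrate is a result.
        echo "ffmpeg -i INPUT -c:v libx264 -crf $1 -an OUTPUT.mp4"
    else
        echo "crf must be an integer between 0 and 51" >&2
        return 1
    fi
}

crf_cmd 18    # near-transparent quality, larger file
crf_cmd 30    # lower quality, smaller file
```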

SIMPLIFY YOUR LIFE USING PRESETS

Fortunately it is possible to avoid long command lines by using pre-defined or custom encoding settings. I am not very fond of this approach, because there are many cases, such as HLS or HDS, where you need accurate control over the parameters. But I recognise that presets can save a lot of time in everyday work.

Presets are simply sets of parameters stored in a preset file, which you find in the ffpresets folder after unzipping the FFmpeg build package. Presets can change depending on the version of FFmpeg you have, so it is best to take a look at the content of the preset file. Commonly you will find a set of quality presets like libx264-hq.ffpreset or libx264-slow.ffpreset, first pass presets like libx264-hq_firstpass.ffpreset and constraint presets like libx264-main.ffpreset or libx264-baseline.ffpreset.

So, to make a 2-pass encoding in baseline profile with the HQ preset you can use a command like this:

ffmpeg -i INPUT -pass 1 -an -vcodec libx264 -vpre hq_firstpass -vpre baseline -b 1000k -s 640x360 OUTPUT.mp4
ffmpeg -i INPUT -pass 2 -acodec libfaac -ab 96k -ar 44100 -vcodec libx264 -vpre hq -b 1000k -vpre baseline -s 640x360 OUTPUT.mp4

(UPDATE: libfaac is now an external library, so you may have problems encoding in AAC. Read part V of the series to learn more about this topic.)

Notice that the constraints preset is applied with a second -vpre and that the first pass has audio encoding disabled.
I have sometimes had problems with presets on Windows. You can work around problems locating the presets simply by using -fpre instead of -vpre. When using -fpre you must specify the absolute path to the preset file, not just the short name as with -vpre.
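
A custom preset loaded with -fpre can be sketched as follows. A preset file is plain text with one option=value pair per line. The file name my-gop.ffpreset and the values in it are hypothetical, the file is written to a temporary directory, and the command is only printed.

```shell
#!/bin/sh
# Write a small custom preset file (hypothetical name and values) and print
# the command that would load it by absolute path with -fpre.
dir=$(mktemp -d)
preset="$dir/my-gop.ffpreset"

cat > "$preset" <<'EOF'
# one option=value pair per line
g=100
keyint_min=100
sc_threshold=0
bf=3
refs=3
EOF

# Dry run: print the command instead of executing it.
echo "ffmpeg -i INPUT -c:v libx264 -fpre $preset -b:v 1000k OUTPUT.mp4"
```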

UPDATE:

Since FFmpeg introduced direct access to x264 parameters, it is also possible to use native x264 presets. Example:

ffmpeg -i INPUT -an -c:v libx264 -s 960x540 -x264opts preset=slow:tune=ssim:bitrate=1000 OUTPUT.mp4

ENCODING FOR DIFFERENT DEVICES

Using the constraints presets it is possible to encode for mobile devices, which usually require the Baseline profile to enable hardware acceleration. This limit is rapidly being overcome by current hardware and operating systems. But if you need to target both older devices (for example iOS 3 devices) and newer ones with the same video, you still need to be able to easily generate video compliant with the Baseline profile. You can find other generic recommendations for multi-device encoding here.

THE NEXT PART

In this part we have seen how to encode to H.264 using FFmpeg, as well as the richness of its encoding parameters. In part IV of this series we will see how to leverage FFmpeg’s support for RTMP streaming to enhance the capabilities of the Flash Video ecosystem.


 

9 thoughts on “FFmpeg – the swiss army knife of Internet Streaming – part III

  1. Sonnati, thanks for this great post, only one question..

    Where you say:
    disabling also b-frames and CABAC (setting “-b 0” and “-codec 0“)

    Don’t need be say -bf in place to -b ?

    Thanks,
    Ale

  2. It’s not working, Basing my cmd lines on this article gets me: [libx264 @ 0x233c780] constant rate-factor is incompatible with 2pass.

    ffmpeg -i both_video_and_sound.666.MTS -vcodec libx264 -acodec copy -deinterlace -r 25 -s 1440×1080 -ss 438 -t 3721 -pass 1 -an -vpre lossless_ultrafast -vpre baseline -b 10000k -f rawvideo -y /dev/null

    ffmpeg -i both_video_and_sound.666.MTS -vcodec libx264 -acodec copy -deinterlace -r 25 -s 1440×1080 -ss 438 -t 3721 -pass 2 -vpre lossless_max -b 10000k -vpre baseline -f rawvideo -y both_video_and_sound.666.MTS.ss438.t3721.mp4

    I based it on this one ffmpeg -i INPUT -pass 1 -an -vcodec libx264 -vpre hq_firstpass -vpre baseline -b 1000k -s 640×360 OUTPUT.mp4
    ffmpeg -i INPUT -pass 2 -acodec libfaac -ab 96k -ar 44100 -vcodec libx264 -vpre hq -b 1000k -vpre baseline -s 640×360 OUTPUT.mp4

    1. you are mixing 2pass, bitrate and lossless encoding which are in conflict
      lossless encoding is best effort in term of bandwidth so you can’t set bitrate
      and 2pass

  3. Hi,
    I am using panda board and Ubuntu Os .
    how to stream a raw video from a cam to an rtmp server of mp4 format with increase of fps . I am using the following command

    ./ffmpeg -framerate 30 -f video4linux2 -i /dev/video0 -f alsa -async 1 -ac 2 -i hw:2,0 -acodec aac -b:a 40k -s vga -vcodec libx264 -strict -2 -crf 25 -preset fast -b:v 320K -pass 1 -r 25 -f flv rtmp://…./mp4:demo101

    error is:
    [video4linux2,v4l2 @ 0x19d14e0] ioctl set time per frame(1/30) failed
    /dev/video0: Input/output error

    can you suggest what should i do ,i Google’d but i’m not getting a way ,

    Thanks in advance,
    Ameeth

  4. Hi,
    I am using ffmpeg to encode for streaming using vlc. Its a udp stream. File size does not matter. Only the video quality matters. Currently I am using the following encoding for HD video
    ffmpeg -i INPUT -q:v 1 -q:a 1 -y -c:v libx264 -vprofile high -preset slow -s 1280×720 -threads 0 -c:a libvo_aacenc -ac 2 -ar 44100 -ab 128k OUTPUT.mp4

    and for SD video
    ffmpeg -i INPUT -q:v 1 -q:a 1 -y -c:v libx264 -vprofile high -preset slow -s 720×480 -threads 0 -c:a libvo_aacenc -ac 2 -ar 44100 -ab 128k OUTPUT.mp4

    can you please tell me what other settings do I need. These encoded files will also need to be previewed on the web for which I am using the jwPlayer but the streaming quality is more important.

    I did try to find bit rates for these HD and SD resolutions but could not find a consensus among netizens.

    Moreover is there a way to detect volume levels and fix the levels using just ffmpeg? I know it can be done with SOX but I just wanted to know if there is one using just ffmpeg.

    Your blog was very helpful.
    Thanks
