Friday, October 30, 2009

x264 encoding options

One of H.264's most useful features is the ability to choose among many combinations of inter and intra partitions. Pictures are typically processed in units called macroblocks. A macroblock is typically 16x16 pixels and can be of different types. If the macroblock type is Intra (I), that part of the decoded image is completely replaced by a new texture, while if the macroblock type is Inter (P), the decoded macroblock data is added to what was previously decoded in that macroblock area.
P-macroblocks can be subdivided into 16x8, 8x16, 8x8, 4x8, 8x4, and 4x4 partitions. B-macroblocks can be divided into 16x8, 8x16, and 8x8 partitions. I-macroblocks can be divided into 4x4 or 8x8 partitions.
Analyzing more partition options improves quality at the cost of speed. The default is to analyze all partitions except p4x4 (p8x8, i8x8, i4x4, b8x8), since p4x4 is not particularly useful except at high bitrates and lower resolutions. Note that i8x8 requires 8x8dct, and is therefore a High Profile-only partition. p8x8 is the most costly, speed-wise, of the partitions, but also gives the most benefit. Generally, whenever possible, all partition types except p4x4 should be used.
When encoding with H.264, the screen is divided into 16x16 macroblocks (MBs). These blocks can be subdivided further into partitions of various sizes (16x8, 8x16, 8x8, 8x4, 4x8, 4x4). Motion estimation is better with smaller subpartitions, but the bit overhead for signaling them increases as well.
DCT block size is separate from motion block ("partition") size. The DCT has a choice of 8x8 or 4x4. Motion partitions have a choice of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, or 4x4. Each (inter-) macroblock has both a DCT block size and some motion partition size(s). The only restriction on combining the two is that partitions can't be smaller than their DCT blocks.
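The restriction on combining the two can be written down in a few lines. This is only a sketch of the rule as stated above, not how x264 implements it; the function names are made up for illustration.

```python
# Motion partition sizes (width, height) and DCT block sizes named in the text.
MOTION_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
DCT_SIZES = [4, 8]


def is_valid_combination(partition, dct_size):
    """True if neither side of the motion partition is smaller than the DCT block."""
    width, height = partition
    return width >= dct_size and height >= dct_size


def allowed_dct_sizes(partition):
    """List the DCT block sizes that may be combined with this partition."""
    return [d for d in DCT_SIZES if is_valid_combination(partition, d)]


if __name__ == "__main__":
    for part in MOTION_PARTITIONS:
        print(f"{part[0]}x{part[1]}: DCT sizes {allowed_dct_sizes(part)}")
```

In other words, the 8x8 transform is only a candidate when the macroblock uses no partition smaller than 8x8.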
Small partitions provide better prediction but cost more bits. There is no overhead one way or the other for DCT block size, but neither size always compresses better than the other, so that decision is based just on which one is more appropriate to the given image region.
That said, High Profile also introduced a new intra partition, "i8x8". Intra partition sizes now have a choice between 16x16, 8x8, and 4x4, which is decided based on prediction quality vs. bit cost, just like motion partitions are.
In x264, --8x8dct enables High Profile, and x264 allows i8x8 partitions if High Profile is enabled, but the two should not be confused as being the same feature.
Here's where I put most of my emphasis in the tests for mobile:
    -refs (FFmpeg)
    One of H.264's most useful features is the ability to reference frames other than the one immediately prior to the current frame. This parameter lets one specify how many references can be used, up to a maximum of 16. Increasing the number of refs increases the DPB (Decoded Picture Buffer) requirement, which means hardware playback devices will often have strict limits on the number of refs they can handle. In live-action sources, more references have limited use beyond 4-8, but in cartoon sources values up to the maximum of 16 are often useful. More reference frames require more processing power because every reference frame is searched by the motion search (except when an early skip decision is made). The slowdown is especially apparent with slower motion estimation methods. Recommended default: -refs 6
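To make the DPB limit concrete, here is a rough sketch of how a level's buffer size caps the reference count at a given resolution. The MaxDpbMbs figures are not from this post: they are what I believe the H.264 level table (Annex A) specifies, so check them against the spec before relying on them. The function name is made up for illustration.

```python
import math

# ASSUMPTION: MaxDpbMbs (DPB size in macroblocks) per level, as I recall them
# from the H.264 level limits table. Verify against the spec.
MAX_DPB_MBS = {"3.0": 8100, "3.1": 18000, "4.0": 32768, "4.1": 32768, "5.1": 184320}


def max_ref_frames(width, height, level):
    """Largest ref count that fits the level's DPB, capped at H.264's maximum of 16."""
    # Frame size in 16x16 macroblocks, rounding partial macroblocks up.
    frame_mbs = math.ceil(width / 16) * math.ceil(height / 16)
    return min(16, MAX_DPB_MBS[level] // frame_mbs)


if __name__ == "__main__":
    for (w, h, level) in [(640, 480, "3.0"), (1280, 720, "3.1"), (1920, 1080, "4.1")]:
        print(f"{w}x{h} @ level {level}: up to {max_ref_frames(w, h, level)} refs")
```

Under those assumed numbers, the recommended -refs 6 fits 640x480 at level 3.0 but not 720p at level 3.1, which is the kind of limit a mobile device may enforce.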
      -deblockalpha (FFmpeg)
      -deblockbeta (FFmpeg)
      One of H.264's main features is the in-loop deblocker, which avoids the problem of blocking artifacts disrupting motion estimation. This requires a small amount of decoding CPU, but considerably increases quality in nearly all cases. Its strength may be raised or lowered in order to avoid more artifacts or keep more detail, respectively. Deblock has two parameters: alpha (strength) and beta (threshold). Recommended defaults: -deblockalpha 0 -deblockbeta 0 (requires '-flags +loop')

      -me_method (FFmpeg)

      dia (x264) / epzs (FFmpeg) is the simplest search, consisting of starting at the best predictor, checking the motion vectors at one pixel upwards, left, down, and to the right, picking the best, and repeating the process until it no longer finds any better motion vector.

      hex (x264) / hex (FFmpeg) consists of a similar strategy, except it uses a range-2 search of 6 surrounding points, thus the name. It is considerably more efficient than DIA and hardly any slower, and therefore makes a good choice for general-use encoding.

      umh (x264) / umh (FFmpeg) is considerably slower than HEX, but searches a complex multi-hexagon pattern in order to avoid missing harder-to-find motion vectors. Unlike HEX and DIA, the merange parameter directly controls UMH's search radius, allowing one to increase or decrease the size of the wide search.

      esa (x264) / full (FFmpeg) is a highly optimized intelligent search of the entire motion search space within merange of the best predictor. It is mathematically equivalent to the bruteforce method of searching every single motion vector in that area, though faster. However, it is still considerably slower than UMH, with not too much benefit, so is not particularly useful for everyday encoding.

      This is one of the most important settings for x264, both speed- and quality-wise. I'm looking at full vs. hex vs. umh.
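To illustrate the trade-off between the simplest and the most thorough search, here is a toy sketch of a diamond search and an exhaustive search over an abstract cost function. It is not x264's code: a real encoder measures block-matching cost (such as SAD) plus motion vector bits, while `bumpy_cost` below is a made-up surface with one local and one global minimum.

```python
def diamond_search(cost, start=(0, 0)):
    """Step to the best of the four 1-pixel neighbours until none improves the cost."""
    best, best_cost = start, cost(*start)
    while True:
        x, y = best
        neighbours = [(x, y - 1), (x - 1, y), (x, y + 1), (x + 1, y)]
        cand = min(neighbours, key=lambda mv: cost(*mv))
        cand_cost = cost(*cand)
        if cand_cost >= best_cost:
            return best, best_cost
        best, best_cost = cand, cand_cost


def exhaustive_search(cost, center=(0, 0), merange=16):
    """Check every vector within merange of the predictor: (2*merange + 1)**2 candidates."""
    cx, cy = center
    candidates = [(cx + dx, cy + dy)
                  for dy in range(-merange, merange + 1)
                  for dx in range(-merange, merange + 1)]
    best = min(candidates, key=lambda mv: cost(*mv))
    return best, cost(*best)


def bumpy_cost(x, y):
    """Hypothetical cost: a local minimum at (0, 0) and the true minimum at (6, 6)."""
    return min(abs(x) + abs(y) + 4, abs(x - 6) + abs(y - 6))


if __name__ == "__main__":
    print("diamond:   ", diamond_search(bumpy_cost))
    print("exhaustive:", exhaustive_search(bumpy_cost, merange=16))
```

The diamond search stops at the local minimum it starts in, while the exhaustive search finds the better vector at the price of 1089 cost evaluations for merange 16. Hex and UMH sit between these two extremes.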

      -subq 6 (FFmpeg)

      1: Fastest, but extremely low quality. Should be avoided except on first pass encoding.

      2-5: Progressively better and slower, 5 serves as a good medium for higher speed encoding.

      6-7: 6 is the default. These levels activate rate-distortion optimization for partition decision. This can considerably improve efficiency, though it has a notable speed cost. 6 activates it in I/P frames, and 7 activates it in B frames.

      8-9: Activates rate-distortion refinement, which uses RDO to refine both motion vectors and intra prediction modes. Slower than subme 6, but again, more efficient.

      This is an extremely important encoding parameter which determines what algorithms are used for both subpixel motion searching and partition decision. I'm checking 6 or 8.
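For repeatable test runs, the settings above can be collected into one argument list. This is a sketch only: the option spellings are the ones used in this post and may not exist in other FFmpeg builds, the file names are placeholders, the lower bound of 1 for refs is my assumption, and the helper only builds the list without running ffmpeg.

```python
# x264 motion-estimation names and their FFmpeg -me_method equivalents, as listed above.
X264_TO_FFMPEG_ME = {"dia": "epzs", "hex": "hex", "umh": "umh", "esa": "full"}


def build_ffmpeg_args(src, dst, refs=6, deblock_alpha=0, deblock_beta=0,
                      me="hex", subq=6):
    """Build (but do not run) an ffmpeg argument list from the settings in this post."""
    if not 1 <= refs <= 16:  # 16 is the maximum from the text; 1 is an assumed minimum
        raise ValueError("refs must be between 1 and 16")
    if not 1 <= subq <= 9:  # levels 1-9 are the ones described above
        raise ValueError("subq must be between 1 and 9")
    return [
        "ffmpeg", "-i", src, "-vcodec", "libx264",
        "-flags", "+loop",  # the deblock options need the loop filter enabled
        "-deblockalpha", str(deblock_alpha),
        "-deblockbeta", str(deblock_beta),
        "-refs", str(refs),
        "-me_method", X264_TO_FFMPEG_ME[me],
        "-subq", str(subq),
        dst,
    ]


if __name__ == "__main__":
    # Placeholder file names; compare the three search methods under test.
    for me in ("hex", "umh", "esa"):
        print(" ".join(build_ffmpeg_args("input.avi", f"out_{me}.mp4", me=me)))
```

Keeping the x264-to-FFmpeg name mapping in one place avoids mixing up esa and full when switching between the two tools.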

See also: (Android notes) (and see the Wikipedia reference for more math)
