Tuesday, July 20, 2010

Inside WebM Technology: VP8 Intra and Inter Prediction

Continuing our series on WebM technology, I will discuss the use of prediction methods in the VP8 video codec, with special attention to the TM_PRED and SPLITMV modes, which are unique to VP8.

First, some background. To encode a video frame, block-based codecs such as VP8 first divide the frame into smaller segments called macroblocks. Within each macroblock, the encoder can predict redundant motion and color information based on previously processed blocks. The redundant data can be subtracted from the block, resulting in more efficient compression.

Image by Fido Factor, licensed under Creative Commons Attribution License.
Based on a work at www.flickr.com

A VP8 encoder uses two classes of prediction:
  • Intra prediction uses data within a single video frame
  • Inter prediction uses data from previously encoded frames
The residual signal data is then encoded using other techniques, such as transform coding.

VP8 Intra Prediction Modes
VP8 intra prediction modes are used with three types of macroblocks:
  • 4x4 luma
  • 16x16 luma
  • 8x8 chroma
Four common intra prediction modes are shared by these macroblocks:
  • H_PRED (horizontal prediction). Fills each column of the block with a copy of the left column, L.
  • V_PRED (vertical prediction). Fills each row of the block with a copy of the above row, A.
  • DC_PRED (DC prediction). Fills the block with a single value using the average of the pixels in the row above A and the column to the left of L.
  • TM_PRED (TrueMotion prediction). A mode that gets its name from a compression technique developed by On2 Technologies. In addition to the row A and column L, TM_PRED uses the pixel P above and to the left of the block. Horizontal differences between pixels in A (starting from P) are propagated using the pixels from L to start each row.
For 4x4 luma blocks, there are six additional intra modes similar to V_PRED and H_PRED, but correspond to predicting pixels in different directions. These modes are outside the scope of this post, but if you want to learn more see the VP8 Bitstream Guide.

As mentioned above, the TM_PRED mode is unique to VP8. The following figure uses an example 4x4 block of pixels to illustrate how the TM_PRED mode works:
Where C, As and Ls represent reconstructed pixel values from previously coded blocks, and X00 through X33 represent predicted values for the current block. TM_PRED uses the following equation to calculate Xij:

Xij = Li + Aj - C (i, j=0, 1, 2, 3)

Although the above example uses a 4x4 block, the TM_PRED mode for 8x8 and 16x16 blocks works in the same fashion.
TM_PRED is one of the more frequently used intra prediction modes in VP8, and for common video sequences it is typically used by 20% to 45% of all blocks that are intra coded. Overall, together with other intra prediction modes, TM_PRED helps VP8 to achieve very good compression efficiency, especially for key frames, which can only use intra modes (key frames by their very nature cannot refer to previously encoded frames).

VP8 Inter Prediction Modes

In VP8, inter prediction modes are used only on inter frames (non-key frames). For any VP8 inter frame, there are typically three previously coded reference frames that can be used for prediction. A typical inter prediction block is constructed using a motion vector to copy a block from one of the three frames. The motion vector points to the location of a pixel block to be copied. In most video compression schemes, a good portion of the bits are spent on encoding motion vectors; the portion can be especially large for video encoded at lower datarates.

Like previous VPx codecs, VP8 encodes motion vectors very efficiently by reusing vectors from neighboring macroblocks (a macroblock includes one 16x16 luma block and two 8x8 chroma blocks). VP8 uses a similar strategy in the overall design of inter prediction modes. For example, the prediction modes "NEAREST" and "NEAR" make use of last and second-to-last, non-zero motion vectors from neighboring macroblocks. These inter prediction modes can be used in combination with any of the three different reference frames.

In addition, VP8 has a very sophisticated, flexible inter prediction mode called SPLITMV. This mode was designed to enable flexible partitioning of a macroblock into sub-blocks to achieve better inter prediction. SPLITMV is very useful when objects within a macroblock have different motion characteristics. Within a macroblock coded using SPLITMV mode, each sub-block can have its own motion vector. Similar to the strategy of reusing motion vectors at the macroblock level, a sub-block can also use motion vectors from neighboring sub-blocks above or left to the current block. This strategy is very flexible and can effectively encode any shape of sub-macroblock partitioning, and does so efficiently. Here is an example of a macroblock with 16x16 luma pixels that is partitioned to 16 4x4 blocks:

where New represents a 4x4 bock coded with a new motion vector, and Left and Above represent a 4x4 block coded using the motion vector from the left and above, respectively. This example effectively partitions the 16x16 macroblock into 3 different segments with 3 different motion vectors (represented below by 1, 2 and 3):

Through effective use of intra and inter prediction modes, WebM encoder implementations can achieve great compression quality on a wide range of source material. If you want to delve further into VP8 prediction modes, read the VP8 Bitstream Guide or examine the reconintra.c and rdopt.c files in the VP8 source tree.

Yaowu Xu, Ph.D. is a codec engineer at Google.


Polite, on-topic comments are welcomed on the webm-discuss mailing list. Please link to this post when commenting.