Thursday, May 27, 2010

Inside WebM Technology: The VP8 Alternate Reference Frame

Since the WebM project was open-sourced just a week ago, we've seen blog posts and articles about its capabilities. As an open project, we welcome technical scrutiny and contributions that improve the codec. We know from our extensive testing that VP8 can match or exceed other leading codecs, but to get the best results, it helps to understand more about how the codec works. In this first of a series of blog posts, I'll explain some of the fundamental techniques in VP8, along with examples and metrics.

The alternative reference frame is one of the most exciting quality innovations in VP8. Let’s delve into how VP8 uses these frames to improve prediction and thereby overall video quality.

Alternate Reference Frames in VP8

VP8 uses three types of reference frames for inter prediction: the last frame, a "golden" frame (one frame worth of decompressed data from the arbitrarily distant past) and an alternate reference frame. Overall, this design has a much smaller memory footprint on both encoders and decoders than designs with many more reference frames. In video compression, it is very rare for more than three reference frames to provide significant quality benefit, but the undesirable increase in memory footprint from the extra frames is substantial.

Unlike other types of reference frames used in video compression, which are displayed to the user by the decoder, the VP8 alternate reference frame is decoded normally but is never shown to the user. It is used solely as a reference to improve inter prediction for other coded frames. Because alternate reference frames are not displayed, VP8 encoders can use them to transmit any data that are helpful to compression. For example, a VP8 encoder can construct one alternate reference frame from multiple source frames, or it can create an alternate reference frame using different macroblocks from hundreds of different video frames.

The current VP8 implementation enables two different types of usage for the alternate reference frame: noise-reduced prediction and past/future directional prediction.

Noise-Reduced Prediction

The alternate reference frame is transmitted and decoded similar to other frames, hence its usage does not add extra computation in decoding. The VP8 encoder however is free to use more sophisticated processing to create them in off-line encoding. One application of the alternate reference frame is for noise-reduced prediction. In this application, the VP8 encoder uses multiple input source frames to construct one reference frame through temporal or spatial noise filtering. This "noise-free" alternate reference frame is then used to improve prediction for encoding subsequent frames.

You can make use of this feature by setting ARNR parameters in VP8 encoding, where ARNR stands for "Alternate Reference Noise Reduction." A sample two-pass encoding setting with the parameters:

--arnr-maxframes=5 --arnr-strength=3

enables the encoder to use "5" consecutive input source frames to produce one alternate reference frame using a filtering strength of "3". Here is an example showing the quality benefit of using this experimental "ARNR" feature on the standard test clip "Hall Monitor." (Each line on the graph represents the quality of an encoded stream on a given clip at multiple datarates. The higher points on the Y axis (PSNR) indicates the stream with the better quality.)

The only difference between the two curves in the graph is that VP8_ARNR was produced by encodings with ARNR parameters and VP8_NO_ARNR was not. As we can see from the graph, noise reduced prediction is very helpful to compression quality when encoding noisy sources. We've just started to explore this idea but have already seen strong improvements on noisy input clips similar to this "Hall Monitor." We feel there's a lot more we can do in this area.

Improving Prediction without B Frames

The lack of B frames in VP8 has sparked some discussion about its ability to achieve competitive compression efficiency. VP8 encoders, however, can make intelligent use of the golden reference and the alternate reference frames to compensate for this. The VP8 encoder can choose to transmit an alternate reference frame similar to a "future" frame, and encoding of subsequent frames can make use of information from the past (last frame and golden frame) and from the future (alternate reference frame). Effectively, this helps the encoder to achieve results similar to bidirectional (B frame) prediction without requiring frame reordering in the decoder. Running in two-pass encoding mode, compression can be improved in the VP8 encoder by using encoding parameters that enable lagged encoding and automatic placement of alternate reference frames:

--auto-alt-ref=1 --lag-in-frames=16

Used this way, the VP8 encoder can achieve improved prediction and compression efficiency without increasing the decoder’s complexity:

In the video compression community, "Mobile and calendar" is known as a clip that benefits significantly from the usage of B frames. The graph above illustrates that the use of alternate reference frame benefits VP8 significantly without using B frames.

Keep an eye on this blog for more posts about VP8 encoding. You can find more information on above encoding parameters or other detailed instructions to use with our VP8 encoders on our site, or join our discussion list.

Yaowu Xu, Ph.D. is a codec engineer at Google.

Labels: ,

Polite, on-topic comments are welcomed on the webm-discuss mailing list. Please link to this post when commenting.