Thursday, May 12, 2011

Technical Details of the Blueberry Release

This post assumes prior knowledge of video codecs and hardware design.

We reached the aforementioned +0.82 dB PSNR gains by adding the following features to the encoder:
  • Improved encoding decisions and added more coding options at macroblock level
  • Enabled multiple motion vectors per macroblock (Split MV mode)
  • Added a preference for “nearest”, “near” and “zero” type macroblocks, which are less expensive to code than others
  • Added support for up to two reference frames in motion search (immediately previous and Golden frame)
  • Added support for macroblock mode adaptivity in the deblocking filter
  • Added ¼ pixel precision motion estimation at 1080p resolution (previously supported only up to 720p)
  • Increased the number of token probability tracking counters (enables more efficient entropy coding)
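One way to read the preference for cheap macroblock types in the list above is as a rate-distortion trade-off. The following is a minimal sketch of a generic Lagrangian mode decision, not the H1 cost function, which this post does not describe. The `Candidate` class, the `pick_mode` helper, the weight `lam` and all numbers are hypothetical; only the mode names follow VP8 terminology.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mode: str          # VP8-style mode name, e.g. "NEARESTMV" or "SPLITMV"
    distortion: float  # e.g. sum of squared errors for the macroblock
    bits: float        # estimated bits for mode, motion vectors and residual


def pick_mode(candidates, lam):
    """Return the candidate with the lowest cost D + lam * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.bits)


# Hypothetical numbers, for illustration only (not measured on any encoder).
candidates = [
    Candidate("NEARESTMV", distortion=1500.0, bits=40.0),
    Candidate("NEWMV", distortion=1400.0, bits=95.0),
    Candidate("SPLITMV", distortion=1250.0, bits=210.0),
]

# A small lam favors low distortion, a large lam favors cheap modes.
print(pick_mode(candidates, lam=0.5).mode)
print(pick_mode(candidates, lam=4.0).mode)
```

With these made-up inputs, the expensive split mode wins only when bits are cheap relative to distortion. This is the kind of balance a tunable cost function has to strike.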
In addition, we added support for a programmable segment map, which enables psychovisual quality optimizations and the definition of regions of interest. For example, we can code foreground objects (e.g. people) at a better quality (smaller quantizer) than the static background. We also added new hooks to the hardware that allow us to improve the quality of the encoder through later firmware upgrades that optimize our cost function algorithms, even after the chip has been manufactured.
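To make the region-of-interest idea concrete, here is a minimal sketch of how a per-macroblock segment map could be built from a rectangle. It is an illustration only and does not show the H1 programming interface: the function name `build_segment_map`, the rectangle format and the quantizer values are assumptions. The 16x16 macroblock size and the limit of four segments come from the VP8 format.

```python
MB_SIZE = 16      # VP8 macroblocks cover 16x16 luma pixels
MAX_SEGMENTS = 4  # VP8 allows up to four segments per frame


def build_segment_map(width, height, roi, roi_segment=1, background_segment=0):
    """Return (segment ids in row-major order, macroblock columns, macroblock rows).

    roi is (x, y, w, h) in pixels. Every macroblock that overlaps the
    rectangle is assigned roi_segment, all others background_segment.
    """
    assert 0 <= roi_segment < MAX_SEGMENTS
    assert 0 <= background_segment < MAX_SEGMENTS
    mb_cols = (width + MB_SIZE - 1) // MB_SIZE
    mb_rows = (height + MB_SIZE - 1) // MB_SIZE
    x, y, w, h = roi
    first_col, last_col = x // MB_SIZE, (x + w - 1) // MB_SIZE
    first_row, last_row = y // MB_SIZE, (y + h - 1) // MB_SIZE
    seg_map = []
    for row in range(mb_rows):
        for col in range(mb_cols):
            inside = first_row <= row <= last_row and first_col <= col <= last_col
            seg_map.append(roi_segment if inside else background_segment)
    return seg_map, mb_cols, mb_rows


# Hypothetical quantizer indices per segment: the foreground gets the smaller one.
segment_quantizer = {0: 50, 1: 30}

# A 720p frame with a hypothetical foreground rectangle around the speaker.
seg_map, mb_cols, mb_rows = build_segment_map(1280, 720, roi=(480, 160, 320, 400))
roi_share = sum(1 for s in seg_map if s == 1) / len(seg_map)
print(f"{mb_cols}x{mb_rows} macroblocks, {roi_share:.1%} in the region of interest")
```

A real application would derive the rectangle (or an arbitrary mask) from face detection or motion analysis and update the map as the scene changes.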

In terms of silicon usage, Blueberry uses 13% more logic gates than Anthill, while the internal memory requirement remains unchanged. We raised the maximum attainable clock frequency from Anthill’s 376 MHz to 392 MHz (TSMC 65nm, LP). This lets the chip manufacturer reach a somewhat higher frame rate, which can be useful for multiple simultaneous encodes or for a slow-motion mode (e.g. VGA at 200 fps).
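The clock figures above translate into a rough throughput estimate. The sketch below assumes that encoder throughput scales linearly with clock frequency, which the post does not state, and the helper `macroblocks_per_second` is hypothetical. The 1080p at 30 fps workload is included only as a familiar reference point.

```python
def macroblocks_per_second(width, height, fps, mb_size=16):
    """Macroblock rate of a stream, rounding partial macroblocks up."""
    mb_cols = (width + mb_size - 1) // mb_size
    mb_rows = (height + mb_size - 1) // mb_size
    return mb_cols * mb_rows * fps


# Relative clock increase from Anthill (376 MHz) to Blueberry (392 MHz).
clock_gain = 392 / 376 - 1
print(f"clock gain: {clock_gain:.1%}")

# Slow-motion VGA capture versus a 1080p reference workload.
vga_200 = macroblocks_per_second(640, 480, 200)
full_hd_30 = macroblocks_per_second(1920, 1080, 30)
print(f"VGA at 200 fps:   {vga_200} macroblocks/s")
print(f"1080p at 30 fps:  {full_hd_30} macroblocks/s")
```

Under that assumption the higher clock is worth roughly 4% more frames per second, and VGA at 200 fps turns out to be a macroblock load close to that of 1080p at 30 fps.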

To compare the quality of Anthill and Blueberry, we measured their average PSNR and SSIM over 46 test sequences and across a wide quantizer range. A few example results are shown below (positive numbers mean Blueberry was better):

Sequence            Resolution  PSNR [dB]  SSIM
city                qcif        +0.80      +0.033
table               qcif        +0.86      +0.009
ice                 qcif        +0.89      +0.005
suzie               qcif        +0.82      +0.013
crew                cif         +0.46      +0.012
ice                 cif         +1.21      +0.006
crew                4cif        +0.48      +0.010
soccer              4cif        +0.70      +0.022
video_conferencing  720p        +1.14      +0.006
rush_hour           1080p       +0.92      +0.004
pedestrian_area     1080p       +1.09      +0.013
whale_show          1080p       +0.21      +0.006
sunflower           1080p       +1.68      +0.007
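As a quick sanity check, the example rows above can be summarized in a few lines. The numbers are copied from the table. Note that this is the mean of the 13 listed examples only, not of the full 46-sequence test set.

```python
from statistics import mean

# (sequence, resolution, PSNR gain in dB, SSIM gain), copied from the table above.
rows = [
    ("city", "qcif", 0.80, 0.033),
    ("table", "qcif", 0.86, 0.009),
    ("ice", "qcif", 0.89, 0.005),
    ("suzie", "qcif", 0.82, 0.013),
    ("crew", "cif", 0.46, 0.012),
    ("ice", "cif", 1.21, 0.006),
    ("crew", "4cif", 0.48, 0.010),
    ("soccer", "4cif", 0.70, 0.022),
    ("video_conferencing", "720p", 1.14, 0.006),
    ("rush_hour", "1080p", 0.92, 0.004),
    ("pedestrian_area", "1080p", 1.09, 0.013),
    ("whale_show", "1080p", 0.21, 0.006),
    ("sunflower", "1080p", 1.68, 0.007),
]

mean_psnr = mean(r[2] for r in rows)
mean_ssim = mean(r[3] for r in rows)
best = max(rows, key=lambda r: r[2])
worst = min(rows, key=lambda r: r[2])

print(f"mean PSNR gain over {len(rows)} examples: {mean_psnr:+.2f} dB")
print(f"mean SSIM gain: {mean_ssim:+.3f}")
print(f"largest gain:  {best[0]} ({best[1]}) {best[2]:+.2f} dB")
print(f"smallest gain: {worst[0]} ({worst[1]}) {worst[2]:+.2f} dB")
```

The sample mean comes out near +0.87 dB, in line with the +0.82 dB figure quoted for the release.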

As our focus in the improvement work has been on the video conferencing use case, let’s dig a bit deeper there. The following graph shows PSNR quality metrics for a 720p video call, comparing the H1 Blueberry release to Anthill and the libvpx Bali release in different complexity modes (higher is better).



The graph shows that the Blueberry release encodes the video conference content at the same quality using up to 30% fewer bits than Anthill. It also beats libvpx’s simplest real-time setting at a much lower bitrate than before.
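A statement such as "the same quality using fewer bits" is typically read off two rate-quality curves at a fixed PSNR. The sketch below shows one common way to do that, interpolating the logarithm of the bitrate between measured points. The functions `bitrate_at_psnr` and `bitrate_saving` are hypothetical helpers, and the curve points are made up so that the result is exactly 30%. They are not the measurements behind the graph.

```python
import math


def bitrate_at_psnr(curve, target_psnr):
    """Estimate the bitrate needed to reach target_psnr.

    curve is a list of (bitrate_kbps, psnr_db) points sorted by bitrate,
    with PSNR strictly increasing. log(bitrate) is interpolated linearly
    in PSNR between the two points that bracket the target.
    """
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= target_psnr <= q1:
            t = (target_psnr - q0) / (q1 - q0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target PSNR is outside the measured range")


def bitrate_saving(reference, improved, target_psnr):
    """Fraction of bits saved by `improved` at equal PSNR."""
    r_ref = bitrate_at_psnr(reference, target_psnr)
    r_new = bitrate_at_psnr(improved, target_psnr)
    return 1.0 - r_new / r_ref


# Hypothetical curves: the improved encoder needs 0.7x the bits at each PSNR.
reference = [(500.0, 36.0), (1000.0, 38.5), (2000.0, 40.5)]
improved = [(350.0, 36.0), (700.0, 38.5), (1400.0, 40.5)]

saving = bitrate_saving(reference, improved, target_psnr=39.0)
print(f"bitrate saving at 39.0 dB: {saving:.0%}")
```

With real measurements the saving varies along the curve, which is why the post says "up to" 30% and does not quote a single figure.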

While more improvements are on the way for the third release of the H1 encoder, the current performance is already very competitive, and the hardware now comes with hooks for further software-based optimizations.

Aki Kuusela is Engineering Manager of the WebM Project hardware team in Oulu, Finland.

Polite, on-topic comments are welcomed on the webm-discuss mailing list. Please link to this post when commenting.