Technical Details of the Blueberry Release
The following text assumes the reader is familiar with video codecs and hardware design.
We achieved the aforementioned +0.82 dB PSNR gain by adding the following features to the encoder:
- Improved encoding decisions and added more coding options at the macroblock level
- Enabled multiple motion vectors per macroblock (Split MV mode)
- Added a preference for “nearest”, “near” and “zero” macroblock types, which are cheaper to code than the others
- Added support for up to two reference frames in motion search (the immediately previous frame and the Golden frame)
- Added support for adapting the deblocking filter to the macroblock mode
- Added ¼-pixel-precision motion estimation at 1080p resolution (previously supported only up to 720p)
- Increased the number of token probability tracking counters (enables more efficient entropy coding)
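The last item concerns entropy coding. VP8 codes each binary decision of its token tree with an 8-bit probability that the decision is 0, so an encoder that counts how often each branch was taken can derive better-fitting probabilities from those counts. The sketch below shows one such conversion and the resulting ideal bit cost. The function names, the round-to-nearest scaling and the 1..255 clamp are assumptions for illustration (roughly what libvpx does in software), not a description of the H1 hardware.

```python
import math


def branch_probability(zero_count: int, one_count: int) -> int:
    """Convert branch counts into an 8-bit probability (out of 256) of the 0 branch.

    Assumption: round to nearest and clamp to 1..255 so both branches stay codable.
    """
    total = zero_count + one_count
    if total == 0:
        return 128  # no statistics: hypothetical fallback to an even split
    p = (zero_count * 256 + total // 2) // total
    return max(1, min(255, p))


def cost_in_bits(prob_zero: int, zero_count: int, one_count: int) -> float:
    """Ideal (entropy) cost of the observed decisions under an 8-bit probability."""
    p0 = prob_zero / 256
    return -(zero_count * math.log2(p0) + one_count * math.log2(1 - p0))


counts = (90, 10)  # hypothetical: 90 zeros and 10 ones observed for one branch
prob = branch_probability(*counts)
print(prob)                                 # 230
print(round(cost_in_bits(128, *counts)))    # 100
print(round(cost_in_bits(prob, *counts)))   # 47
```

With these counts the derived probability is 230/256, and the ideal cost of the 100 decisions drops from 100 bits under an even split to roughly 47 bits. Tracking more counters presumably lets more contexts get probabilities tailored this way.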
In terms of silicon usage, Blueberry costs 13% more logic gates than Anthill, while the internal memory requirement is unchanged. We also raised the maximum attainable clock frequency from Anthill’s 376 MHz to 392 MHz (TSMC 65nm, LP). This gives chip manufacturers a few more frames per second, which is useful when running multiple simultaneous encodes or a slow-motion mode (e.g. VGA at 200 fps).
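The effect of the clock increase can be estimated with simple arithmetic. If the number of clock cycles the encoder spends per macroblock stays the same, the achievable frame rate scales linearly with the clock, so 392 MHz versus 376 MHz gives about 4.3% more frames per second. The sketch below computes that ratio and the per-macroblock cycle budget implied by the VGA 200 fps example. The linear-scaling assumption and the function names are ours; the post does not state H1’s actual cycles per macroblock.

```python
import math

MB_SIZE = 16  # VP8 macroblocks cover 16x16 luma pixels


def macroblocks_per_frame(width: int, height: int) -> int:
    # Partial macroblocks at the right and bottom edges still have to be coded.
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def cycle_budget_per_mb(clock_hz: float, width: int, height: int, fps: float) -> float:
    """Clock cycles available per macroblock if `fps` must be sustained at `clock_hz`."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


ANTHILL_HZ = 376e6
BLUEBERRY_HZ = 392e6

# Assumption: cycles per macroblock are the same in both releases,
# so the achievable frame rate scales linearly with the clock.
fps_gain = BLUEBERRY_HZ / ANTHILL_HZ - 1
vga_budget = cycle_budget_per_mb(BLUEBERRY_HZ, 640, 480, 200)

print(f"fps gain from the clock increase: {fps_gain:.1%}")        # 4.3%
print(f"VGA at 200 fps: {vga_budget:.0f} cycles per macroblock")  # 1633
```

Under the same assumption, the budget for VGA at 200 fps would be about 1567 cycles per macroblock at Anthill’s 376 MHz.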
To compare the quality of Anthill and Blueberry, we measured their average PSNR and SSIM over 46 test sequences across a wide quantizer range. A few example results are shown below (positive numbers mean Blueberry was better):
Sequence | Resolution | ΔPSNR [dB] | ΔSSIM |
city | qcif | +0.80 | +0.033 |
table | qcif | +0.86 | +0.009 |
ice | qcif | +0.89 | +0.005 |
suzie | qcif | +0.82 | +0.013 |
crew | cif | +0.46 | +0.012 |
ice | cif | +1.21 | +0.006 |
crew | 4cif | +0.48 | +0.010 |
soccer | 4cif | +0.70 | +0.022 |
video_conferencing | 720p | +1.14 | +0.006 |
rush_hour | 1080p | +0.92 | +0.004 |
pedestrian_area | 1080p | +1.09 | +0.013 |
whale_show | 1080p | +0.21 | +0.006 |
sunflower | 1080p | +1.68 | +0.007 |
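One plausible way to produce per-sequence numbers like these: compute PSNR against the source for each encoder, average it for each tested quantizer, and take the mean difference over the quantizers both encoders were run at. The sketch below assumes 8-bit samples and a plain, unweighted mean; the post does not say how frames, planes or quantizers were weighted, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean


def psnr(reference: list[int], decoded: list[int], peak: int = 255) -> float:
    """PSNR in dB between two equally sized sequences of 8-bit samples."""
    if len(reference) != len(decoded):
        raise ValueError("sample counts differ")
    mse = mean((r - d) ** 2 for r, d in zip(reference, decoded))
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak * peak / mse)


def average_psnr_delta(baseline_db: dict[int, float], candidate_db: dict[int, float]) -> float:
    """Mean PSNR difference (candidate minus baseline) over shared quantizers.

    Each dict maps a quantizer index to the sequence's average PSNR at that
    quantizer; a positive result means the candidate encoder is better.
    """
    shared = sorted(baseline_db.keys() & candidate_db.keys())
    if not shared:
        raise ValueError("no quantizers in common")
    return mean(candidate_db[q] - baseline_db[q] for q in shared)


# Hypothetical per-quantizer averages for one sequence (not measured data).
anthill = {10: 40.1, 20: 36.4, 30: 33.0}
blueberry = {10: 40.8, 20: 37.3, 30: 33.9}
print(f"{average_psnr_delta(anthill, blueberry):+.2f} dB")  # +0.83 dB
```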
Since our improvement work has focused on the video conferencing use case, let’s look at it more closely. The following graph shows PSNR for a 720p video call, comparing the H1 Blueberry release with Anthill and with the libvpx Bali release in different complexity modes (higher is better).
The graph shows that the Blueberry release encodes the video conference content at the same quality as Anthill while using up to 30% fewer bits. It also surpasses libvpx’s simplest real-time setting at much lower bitrates than before.
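A bitrate saving at equal quality is a horizontal distance between two rate-distortion curves: at a fixed PSNR, compare the bitrate each encoder needs. The sketch below interpolates each curve linearly in log-bitrate and reports the saving at a single target PSNR. It is a simplified point comparison rather than the Bjøntegaard (BD-rate) metric, and the sample points are made up for illustration, not read from the graph.

```python
import math


def rate_at_quality(curve: list[tuple[float, float]], target_db: float) -> float:
    """Bitrate needed to reach target_db on an RD curve of (kbps, psnr_db) points.

    Interpolates linearly in log(bitrate) between the two points that bracket
    the target; assumes PSNR increases with bitrate.
    """
    pts = sorted(curve)
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= target_db <= q1:
            if q1 == q0:
                return r0
            t = (target_db - q0) / (q1 - q0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target quality outside the measured range")


def bitrate_saving(baseline, candidate, target_db: float) -> float:
    """Fraction of bits the candidate saves versus the baseline at equal PSNR."""
    return 1 - rate_at_quality(candidate, target_db) / rate_at_quality(baseline, target_db)


# Hypothetical RD points (kbps, PSNR dB), not taken from the graph.
anthill = [(1000, 36.0), (2000, 39.0)]
blueberry = [(800, 36.0), (1600, 39.0)]
print(f"saving at 37.5 dB: {bitrate_saving(anthill, blueberry, 37.5):.0%}")  # 20%
```

Interpolating in log-bitrate rather than linear bitrate follows the usual practice of plotting RD curves on a logarithmic rate axis, where they tend to be closer to straight lines.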
While more improvements are on the way for the third release of the H1 encoder, its current performance is already very competitive, and the hardware now provides hooks for further software-based optimizations.
Aki Kuusela is Engineering Manager of the WebM Project hardware team in Oulu, Finland.
Polite, on-topic comments are welcome on the webm-discuss mailing list. Please link to this post when commenting.