Monday, March 14, 2011

Introducing "Anthill," the First VP8 Hardware Encoder IP Release

Last week the WebM Finland team finalized our H1 hardware RTL design. The H1 is the world’s first VP8 hardware encoder. This initial release, which we're calling "Anthill," is now available through the WebM Project hardware page. Google does not require payment of any license fee or royalty in connection with use of the H1 encoder RTL.

Why "Anthill"? 77% of Finland is covered by forests, and the Finns are very fond of them. Finland's freedom-to-roam rights allow anyone to wander in the woods and pick wild berries, flowers and mushrooms. We thought it would be fitting to name each VP8 hardware release, in alphabetical order, after things that can be found amidst our Finnish evergreens.

The H1 encoder offloads the entire VP8 video encoding process from the host CPU to a separate accelerator block on the SoC. It significantly reduces power consumption and enables encoding of 1080p video at a full 30 FPS, or 720p at 60 FPS. Without a hardware accelerator like the H1, modern multi-core mobile devices can encode video only at around VGA resolution and 25 FPS, and are not able to do much else while encoding.

To provide an idea of our hardware's capabilities, we compared it to the WebM Project's VP8 software encoder* (libvpx). The figures below show the processor cycles required to encode VGA resolution video at 30 frames per second; the software figures are scaled from the frame rate reached when running the Tegra2 at 1 GHz#.

Note: Power consumption measurements are for the ARM core vs. the H1 encoder core in TSMC 65nm technology. ARM power consumption is estimated using the 65nm figure given at http://www.arm.com/products/processors/cortex-a/cortex-a9.php. H1 encoder core power is measured using the RTL netlist and Synopsys Power Compiler.

In terms of quality, hardware implementations of real-time encoders typically lag behind software implementations, as adaptive algorithms related to motion search and mode selection (or exact rate-distortion optimizations) are often not feasible in hardware. The following graph shows PSNR quality metrics for a 720p video conferencing use case, comparing the H1 Anthill release to the libvpx Bali release in different complexity modes (higher PSNR is better).



These graphs show that the H1 hardware encoder can produce good quality with very low power consumption using almost no clock cycles from the CPU. In the next release, we are planning to narrow the quality gap between the libvpx "Best" mode and the hardware implementation, while cutting down the required power even further. The next release is planned to be out in early Q2.

Several top-tier semiconductor partners have already started to integrate the H1 IP into their next chipsets, and we’re eager to share the technology with new partners.

For technical and licensing details about the H1, see our hardware page.

Aki Kuusela is Engineering Manager of the WebM Project hardware team in Oulu, Finland.

*libvpx Aylesbury and Bali software encoder releases running on an NVIDIA Tegra2 development board with a dual-core ARM Cortex-A9 processor. In the test, libvpx used both cores with the slowest and fastest real-time settings (-cpu-used=-5 and -cpu-used=-16).

#For example, if the Tegra2 ran at 1000 MHz to achieve 6 FPS, 30 FPS requires 5000 MHz (30/6=5).
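
The scaling in the footnote above can be written as a small calculation. This is only an illustrative sketch: the function name and parameters are hypothetical, and it assumes, as the footnote does, that the achieved frame rate scales linearly with clock speed.

```python
def required_clock_mhz(measured_clock_mhz, achieved_fps, target_fps):
    """Scale a measured clock speed to the clock needed for a target frame rate.

    Assumes frame rate grows linearly with clock speed, which is the
    simplification used in the footnote (not a measured property).
    """
    if achieved_fps <= 0:
        raise ValueError("achieved_fps must be positive")
    return measured_clock_mhz * target_fps / achieved_fps


# The footnote's example: 6 FPS reached at 1000 MHz, target of 30 FPS.
print(required_clock_mhz(1000, 6, 30))
```

With the footnote's numbers this gives 5000 MHz, which is more than a single 1 GHz core can supply; that is how the software figures can exceed the clock rate of the device they were measured on.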

Polite, on-topic comments are welcomed on the webm-discuss mailing list. Please link to this post when commenting.