Rounding Average instructions #126

Maratyszcza · 2019-10-28T19:19:28Z

Introduction

Rounding Average of two integer inputs, defined as avg(a, b) := (a + b + 1) >> 1, is a common operation in fixed-point numerical algorithms such as video and audio codecs and image filtering. Implementing Rounding Average directly in SIMD instruction sets following the formula (a + b + 1) >> 1 is tricky: the intermediate sum a + b + 1 can overflow the datatype of the inputs, even though the final result always fits into that datatype. To avoid the expensive work-around of computing a + b + 1 in higher precision (e.g. extending inputs from 8-bit elements to 16-bit elements for the computation), all common SIMD instruction sets provide some form of Rounding Average instruction.
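To make the overflow concrete, here is a minimal scalar sketch in C for a single unsigned 8-bit lane. The function names are hypothetical and are not part of this proposal. The naive variant truncates the sum to 8 bits before shifting, which models what happens when the formula is evaluated in the lane's own width; the widening variant is the higher-precision work-around described above; the third variant stays in 8 bits by using the identity a + b = 2 * (a | b) - (a ^ b).

```c
#include <assert.h>
#include <stdint.h>

/* Naive form: the sum is truncated to 8 bits before the shift,
   so the carry out of bit 7 is lost. */
static uint8_t avgr_u8_naive(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b + 1); /* wraps modulo 256 */
    return (uint8_t)(sum >> 1);
}

/* Work-around 1: compute a + b + 1 in 16 bits, then narrow. */
static uint8_t avgr_u8_widen(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint16_t)a + (uint16_t)b + 1u) >> 1);
}

/* Work-around 2: no intermediate value exceeds 8 bits.
   Since a + b = 2 * (a | b) - (a ^ b), the rounded-up half of the sum
   is (a | b) - ((a ^ b) >> 1). */
static uint8_t avgr_u8_narrow(uint8_t a, uint8_t b) {
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* Exhaustively confirm that both work-arounds agree on all inputs. */
static void avgr_u8_check_all(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            assert(avgr_u8_widen((uint8_t)a, (uint8_t)b) ==
                   avgr_u8_narrow((uint8_t)a, (uint8_t)b));
        }
    }
}
```

For a = b = 255 the naive variant yields 127, while both work-arounds yield the correct 255. Either work-around costs several vector instructions per average, which is the overhead a native Rounding Average instruction avoids.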

This PR introduces two new WebAssembly instructions for Rounding Average operations, i8x16.avgr_u and i16x8.avgr_u, which operate on vectors of unsigned 8-bit and unsigned 16-bit integers, respectively. These instructions match the forms of the Rounding Average operation that are universally supported across x86, ARM, and POWER.
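As an illustration of the intended lane-wise behavior, the following is a scalar reference sketch in C. It is not taken from the PR: the `v128_t` struct and the function names are hypothetical stand-ins, and the code assumes the usual WebAssembly little-endian lane layout, with each lane computed as (a + b + 1) >> 1 in a wider type so the intermediate sum cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit vector type, stored as 16 bytes in
   little-endian lane order. */
typedef struct { uint8_t bytes[16]; } v128_t;

/* Sketch of i8x16.avgr_u: 16 unsigned 8-bit lanes. */
static v128_t ref_i8x16_avgr_u(v128_t a, v128_t b) {
    v128_t y;
    for (int i = 0; i < 16; i++) {
        uint16_t sum = (uint16_t)((uint16_t)a.bytes[i] + b.bytes[i] + 1u);
        y.bytes[i] = (uint8_t)(sum >> 1);
    }
    return y;
}

/* Sketch of i16x8.avgr_u: 8 unsigned 16-bit lanes. */
static v128_t ref_i16x8_avgr_u(v128_t a, v128_t b) {
    v128_t y;
    for (int i = 0; i < 8; i++) {
        /* Assemble each 16-bit lane from its low and high bytes. */
        uint32_t la = (uint32_t)a.bytes[2 * i] | ((uint32_t)a.bytes[2 * i + 1] << 8);
        uint32_t lb = (uint32_t)b.bytes[2 * i] | ((uint32_t)b.bytes[2 * i + 1] << 8);
        uint32_t r = (la + lb + 1u) >> 1; /* always fits in 16 bits */
        y.bytes[2 * i] = (uint8_t)(r & 0xFFu);
        y.bytes[2 * i + 1] = (uint8_t)(r >> 8);
    }
    return y;
}

/* Small self-check: averaging a vector with itself returns it unchanged. */
static void ref_avgr_u_self_check(v128_t v) {
    v128_t y8 = ref_i8x16_avgr_u(v, v);
    v128_t y16 = ref_i16x8_avgr_u(v, v);
    for (int i = 0; i < 16; i++) {
        assert(y8.bytes[i] == v.bytes[i]);
        assert(y16.bytes[i] == v.bytes[i]);
    }
}
```

A model like this can serve as a test oracle when checking a lowering against the native instructions listed in the mapping section.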

[October 31 update] Applications

Below are examples of optimized libraries using close equivalents of the proposed i8x16.avgr_u and i16x8.avgr_u instructions:

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

i8x16.avgr_u
- y = i8x16.avgr_u(a, b) is lowered to VPAVGB xmm_y, xmm_a, xmm_b
i16x8.avgr_u
- y = i16x8.avgr_u(a, b) is lowered to VPAVGW xmm_y, xmm_a, xmm_b

x86/x86-64 processors with SSE2 instruction set

i8x16.avgr_u
- a = i8x16.avgr_u(a, b) is lowered to PAVGB xmm_a, xmm_b
- y = i8x16.avgr_u(a, b) is lowered to MOVDQA xmm_y, xmm_a + PAVGB xmm_y, xmm_b
i16x8.avgr_u
- a = i16x8.avgr_u(a, b) is lowered to PAVGW xmm_a, xmm_b
- y = i16x8.avgr_u(a, b) is lowered to MOVDQA xmm_y, xmm_a + PAVGW xmm_y, xmm_b

ARM64 processors

i8x16.avgr_u
- y = i8x16.avgr_u(a, b) is lowered to URHADD Vy.16B, Va.16B, Vb.16B
i16x8.avgr_u
- y = i16x8.avgr_u(a, b) is lowered to URHADD Vy.8H, Va.8H, Vb.8H

ARMv7 processors with NEON instruction set

i8x16.avgr_u
- y = i8x16.avgr_u(a, b) is lowered to VRHADD.U8 Qy, Qa, Qb
i16x8.avgr_u
- y = i16x8.avgr_u(a, b) is lowered to VRHADD.U16 Qy, Qa, Qb

POWER processors with VMX (Altivec) instruction set

i8x16.avgr_u
- y = i8x16.avgr_u(a, b) is lowered to VAVGUB VRy, VRa, VRb
i16x8.avgr_u
- y = i16x8.avgr_u(a, b) is lowered to VAVGUH VRy, VRa, VRb

MIPS processors with MSA instruction set

i8x16.avgr_u
- y = i8x16.avgr_u(a, b) is lowered to AVER_U.B Wy, Wa, Wb
i16x8.avgr_u
- y = i16x8.avgr_u(a, b) is lowered to AVER_U.H Wy, Wa, Wb

Maratyszcza · 2019-11-01T01:37:21Z

Added examples of applications. @arunetm, @AndrewScheidecker, @sunfishcode, @dtig, @penzn, @tlively, @abrown PTAL

abrown · 2019-11-01T17:29:39Z

I don't have much experience with codecs but the example applications make a good case for including this instruction. Thanks for making the x86 encodings explicit--they make sense to me.

dtig · 2019-11-18T11:32:58Z

I'm in favor of adding these operations to the proposal. These are well supported on multiple relevant platforms, and emulating them is expensive. The examples listed by @Maratyszcza are compelling use cases. Are there any objections to adding these to the proposal?

Adding a poll here for responses.

In favor of including Rounding Average operations in the current proposal, please respond with 👍
Against including Rounding Average operations in the current proposal, please respond with 👎

Summary: These instructions were added to the spec proposal in WebAssembly/simd#126. Their semantics are equivalent to `(a + b + 1) / 2`. The opcode for the experimental i32x4.dot_i16x8_s is also bumped due to a collision with the i8x16.avgr_u opcode.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71628

As specified in WebAssembly/simd#126.

Maratyszcza mentioned this pull request Oct 28, 2019

SIMD Sync meeting 10/22/2019 Agenda #121

Closed

Rounding Average instructions

1106957

Maratyszcza force-pushed the avg branch from cc67e7c to 1106957 Compare November 24, 2019 08:10

dtig approved these changes Dec 5, 2019

dtig merged commit d254eea into WebAssembly:master Dec 5, 2019

tlively added a commit to tlively/binaryen that referenced this pull request Dec 18, 2019

SIMD {i8x16,i16x8}.avgr_u instructions

d18a957

As specified in WebAssembly/simd#126.

tlively mentioned this pull request Dec 18, 2019

SIMD {i8x16,i16x8}.avgr_u instructions WebAssembly/binaryen#2539

Merged

tlively added a commit to WebAssembly/binaryen that referenced this pull request Dec 18, 2019

SIMD {i8x16,i16x8}.avgr_u instructions (#2539)

8b15cee

As specified in WebAssembly/simd#126.
