This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Rounding Average instructions #126

Merged
merged 1 commit into from
Dec 5, 2019

Conversation

Maratyszcza
Contributor

@Maratyszcza Maratyszcza commented Oct 28, 2019

Introduction

Rounding Average of two integer inputs, defined as avg(a, b) := (a + b + 1) >> 1, is a common operation in fixed-point numerical algorithms, such as video and audio codecs and image filtering. Implementing Rounding Average directly in SIMD instruction sets by following the formula (a + b + 1) >> 1 is tricky: the sum a + b + 1 can overflow the datatype of the inputs, even though the final result always fits into that datatype. To avoid the expensive work-around of computing a + b + 1 in higher precision (e.g. extending inputs from 8-bit elements to 16-bit elements for the computation), all common SIMD instruction sets provide some form of Rounding Average instruction.
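
To make the overflow problem concrete, here is a minimal scalar sketch in C. The function names are hypothetical and used only for illustration. It contrasts the widening work-around with arithmetic that wraps at 8 bits, and adds a well-known overflow-free identity, (a | b) - ((a ^ b) >> 1), which is not part of this PR but gives the same result without leaving 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Work-around from the text: widen to 16 bits so a + b + 1 cannot overflow. */
static inline uint8_t avgr_u8_widening(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint16_t)a + (uint16_t)b + 1u) >> 1);
}

/* What 8-bit lanes would compute: the sum wraps modulo 256 before the shift. */
static inline uint8_t avgr_u8_wrapping(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b + 1);
    return (uint8_t)(sum >> 1);
}

/* Overflow-free identity: a + b = 2 * (a | b) - (a ^ b). */
static inline uint8_t avgr_u8_narrow(uint8_t a, uint8_t b) {
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* Exhaustively checks the identity against the widening reference. */
static inline void avgr_u8_self_check(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            assert(avgr_u8_narrow((uint8_t)a, (uint8_t)b) ==
                   avgr_u8_widening((uint8_t)a, (uint8_t)b));
        }
    }
}
```

For example, with a = b = 255 the widening version returns 255, while the wrapping version returns 127 because the intermediate sum 511 is truncated to 255 before the shift.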

This PR introduces two new WebAssembly instructions for Rounding Average operations, i8x16.avgr_u and i16x8.avgr_u, which operate on vectors of unsigned 8-bit and unsigned 16-bit integers, respectively. These instructions match the forms of the Rounding Average operation that are universally supported across x86, ARM, and POWER.
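
The lane-wise behaviour of the two instructions can be sketched as a scalar reference model in C. This is an illustration under assumptions, not the normative specification text: the struct types and function names below are hypothetical stand-ins for a 128-bit vector value, and each lane is computed in a wider type so the intermediate sum cannot overflow.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 128-bit vector viewed as unsigned lanes. */
typedef struct { uint8_t lane[16]; } ref_u8x16;
typedef struct { uint16_t lane[8]; } ref_u16x8;

/* Reference model of i8x16.avgr_u: 16 independent unsigned 8-bit lanes. */
static inline ref_u8x16 ref_i8x16_avgr_u(ref_u8x16 a, ref_u8x16 b) {
    ref_u8x16 y;
    for (int i = 0; i < 16; i++) {
        y.lane[i] = (uint8_t)(((uint16_t)a.lane[i] + b.lane[i] + 1u) >> 1);
    }
    return y;
}

/* Reference model of i16x8.avgr_u: 8 independent unsigned 16-bit lanes. */
static inline ref_u16x8 ref_i16x8_avgr_u(ref_u16x8 a, ref_u16x8 b) {
    ref_u16x8 y;
    for (int i = 0; i < 8; i++) {
        y.lane[i] = (uint16_t)(((uint32_t)a.lane[i] + b.lane[i] + 1u) >> 1);
    }
    return y;
}
```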

[October 31 update] Applications

Below are examples of optimized libraries using close equivalents of the proposed i8x16.avgr_u and i16x8.avgr_u instructions:

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

  • i8x16.avgr_u
    • y = i8x16.avgr_u(a, b) is lowered to VPAVGB xmm_y, xmm_a, xmm_b
  • i16x8.avgr_u
    • y = i16x8.avgr_u(a, b) is lowered to VPAVGW xmm_y, xmm_a, xmm_b

x86/x86-64 processors with SSE2 instruction set

  • i8x16.avgr_u
    • a = i8x16.avgr_u(a, b) is lowered to PAVGB xmm_a, xmm_b
    • y = i8x16.avgr_u(a, b) is lowered to MOVDQA xmm_y, xmm_a + PAVGB xmm_y, xmm_b
  • i16x8.avgr_u
    • a = i16x8.avgr_u(a, b) is lowered to PAVGW xmm_a, xmm_b
    • y = i16x8.avgr_u(a, b) is lowered to MOVDQA xmm_y, xmm_a + PAVGW xmm_y, xmm_b
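
For readers more familiar with C intrinsics than with assembly, the SSE2 lowering above corresponds to `_mm_avg_epu8` (PAVGB) and `_mm_avg_epu16` (PAVGW) from `<emmintrin.h>`. The sketch below uses hypothetical helper names and falls back to a scalar loop when SSE2 is not available at compile time, so it is an illustration rather than what any WebAssembly engine emits.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: y[i] = (a[i] + b[i] + 1) >> 1 over 16 unsigned bytes. */
static inline void avgr_u8x16_sse2_or_scalar(const uint8_t a[16],
                                             const uint8_t b[16],
                                             uint8_t y[16]) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)y, _mm_avg_epu8(va, vb));   /* PAVGB */
#else
    for (int i = 0; i < 16; i++) {
        y[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
#endif
}

/* Hypothetical helper: the same operation over 8 unsigned 16-bit values. */
static inline void avgr_u16x8_sse2_or_scalar(const uint16_t a[8],
                                             const uint16_t b[8],
                                             uint16_t y[8]) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)y, _mm_avg_epu16(va, vb));  /* PAVGW */
#else
    for (int i = 0; i < 8; i++) {
        y[i] = (uint16_t)(((uint32_t)a[i] + b[i] + 1u) >> 1);
    }
#endif
}
```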

ARM64 processors

  • i8x16.avgr_u
    • y = i8x16.avgr_u(a, b) is lowered to URHADD Vy.16B, Va.16B, Vb.16B
  • i16x8.avgr_u
    • y = i16x8.avgr_u(a, b) is lowered to URHADD Vy.8H, Va.8H, Vb.8H

ARMv7 processors with NEON instruction set

  • i8x16.avgr_u
    • y = i8x16.avgr_u(a, b) is lowered to VRHADD.U8 Qy, Qa, Qb
  • i16x8.avgr_u
    • y = i16x8.avgr_u(a, b) is lowered to VRHADD.U16 Qy, Qa, Qb
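
On both ARM64 and ARMv7 with NEON, the lowerings above are reachable from C through the `vrhaddq_u8` and `vrhaddq_u16` intrinsics in `<arm_neon.h>`. The following sketch, again with hypothetical helper names, uses them when NEON is available at compile time and otherwise falls back to a scalar loop.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>  /* NEON intrinsics */
#endif

/* Hypothetical helper: rounding average over 16 unsigned bytes. */
static inline void avgr_u8x16_neon_or_scalar(const uint8_t a[16],
                                             const uint8_t b[16],
                                             uint8_t y[16]) {
#if defined(__ARM_NEON)
    /* URHADD .16B on ARM64, VRHADD.U8 on ARMv7 */
    vst1q_u8(y, vrhaddq_u8(vld1q_u8(a), vld1q_u8(b)));
#else
    for (int i = 0; i < 16; i++) {
        y[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
#endif
}

/* Hypothetical helper: rounding average over 8 unsigned 16-bit values. */
static inline void avgr_u16x8_neon_or_scalar(const uint16_t a[8],
                                             const uint16_t b[8],
                                             uint16_t y[8]) {
#if defined(__ARM_NEON)
    /* URHADD .8H on ARM64, VRHADD.U16 on ARMv7 */
    vst1q_u16(y, vrhaddq_u16(vld1q_u16(a), vld1q_u16(b)));
#else
    for (int i = 0; i < 8; i++) {
        y[i] = (uint16_t)(((uint32_t)a[i] + b[i] + 1u) >> 1);
    }
#endif
}
```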

POWER processors with VMX (Altivec) instruction set

  • i8x16.avgr_u
    • y = i8x16.avgr_u(a, b) is lowered to VAVGUB VRy, VRa, VRb
  • i16x8.avgr_u
    • y = i16x8.avgr_u(a, b) is lowered to VAVGUH VRy, VRa, VRb

MIPS processors with MSA instruction set

  • i8x16.avgr_u
    • y = i8x16.avgr_u(a, b) is lowered to AVER_U.B Wy, Wa, Wb
  • i16x8.avgr_u
    • y = i16x8.avgr_u(a, b) is lowered to AVER_U.H Wy, Wa, Wb

@Maratyszcza
Contributor Author

Added examples of applications. @arunetm, @AndrewScheidecker, @sunfishcode, @dtig, @penzn, @tlively, @abrown PTAL

@abrown
Contributor

abrown commented Nov 1, 2019

I don't have much experience with codecs but the example applications make a good case for including this instruction. Thanks for making the x86 encodings explicit--they make sense to me.

@dtig
Member

dtig commented Nov 18, 2019

I'm in favor of adding these operations to the proposal. These are well supported on multiple relevant platforms, and emulating them is expensive. The examples listed by @Maratyszcza are compelling use cases. Are there any objections to adding these to the proposal?

Adding a poll here for responses.

In favor of including Rounding Average operations in the current proposal, please respond with 👍
Against including Rounding Average operations in the current proposal, please respond with 👎

@dtig dtig merged commit d254eea into WebAssembly:master Dec 5, 2019
tlively added a commit to llvm/llvm-project that referenced this pull request Dec 17, 2019
Summary:
These instructions were added to the spec proposal in
WebAssembly/simd#126. Their semantics are
equivalent to `(a + b + 1) / 2`. The opcode for the experimental
i32x4.dot_i16x8_s is also bumped due to a collision with the
i8x16.avgr_u opcode.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71628
tlively added a commit to tlively/binaryen that referenced this pull request Dec 18, 2019
tlively added a commit to WebAssembly/binaryen that referenced this pull request Dec 18, 2019