This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Shuffle with immediate indices specification #30

Closed
wants to merge 11 commits into from
73 changes: 71 additions & 2 deletions proposals/simd/SIMD.md
@@ -285,14 +285,40 @@ The input lane value, `x`, is interpreted the same way as for the splat
instructions. For the `i8` and `i16` lanes, the high bits of `x` are ignored.

### Shuffle lanes

#### Immediate permutation rule
* `v8x16.shuffle(a: v128, b: v128, imm: ImmLaneIdx32[16]) -> v128`

Returns a new vector with lanes selected from the lanes of the two input vectors
`a` and `b` specified in the 10 byte wide immediate mode operand `imm`. This
instruction is encoded with 10 bytes providing the indices of the elements to
return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
indices in range `[16, 31]` select the `i - 16`-th element of `b`.

* `v16x8.shuffle(a: v128, b: v128, imm: ImmLaneIdx16[8]) -> v128`

Returns a new vector with lanes selected from the lanes of the two input vectors
`a` and `b` specified in the 4 byte wide immediate mode operand `imm`. This
instruction is encoded with 4 bytes providing the indices of the elements to
return. The indices `i` in range `[0, 7]` select the `i`-th element of `a`. The
indices in range `[8, 15]` select the `i - 8`-th element of `b`.

* `v32x4.shuffle(a: v128, b: v128, imm: ImmLaneIdx8[4]) -> v128`

Returns a new vector with lanes selected from the lanes of the two input vectors
`a` and `b` specified in the 2 byte wide immediate mode operand `imm`. This
instruction is encoded with 2 bytes providing the indices of the elements to
return. The indices `i` in range `[0, 3]` select the `i`-th element of `a`. The
indices in range `[4, 7]` select the `i - 4`-th element of `b`.

* `v64x2.shuffle(a: v128, b: v128, imm: ImmLaneIdx4[2]) -> v128`

Returns a new vector with lanes selected from the lanes of the two input vectors
`a` and `b` specified in the 1 byte wide immediate mode operand `imm`. This
instruction is encoded with 1 byte providing the indices of the elements to
return. The indices `i` in range `[0, 1]` select the `i`-th element of `a`. The
indices in range `[2, 3]` select the `i - 2`-th element of `b`.

```python
def S.shuffle(a, b, s):
    result = S.New()
@@ -304,6 +330,21 @@ def S.shuffle(a, b, s):
    return result
```
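
The selection rule described in the prose above can be modelled in plain Python over lists. This is only a sketch for illustration, not the specification's pseudocode: the function name `shuffle_lanes`, the list-based vector representation, and the range check are all assumptions.

```python
def shuffle_lanes(a, b, imm):
    """Model an immediate shuffle over two equal-length lists of lanes.

    `imm` holds one index per output lane, each in [0, 2 * len(a)).
    (Hypothetical helper; names and error handling are assumptions.)
    """
    lanes = len(a)
    if len(b) != lanes or len(imm) != lanes:
        raise ValueError("a, b and imm must have the same number of lanes")
    result = []
    for idx in imm:
        if not 0 <= idx < 2 * lanes:
            raise ValueError("index out of range")
        # Indices below `lanes` select from a, the others select from b.
        result.append(a[idx] if idx < lanes else b[idx - lanes])
    return result


# Interleave the low halves of two 4-lane vectors (v32x4-style indices).
print(shuffle_lanes([0, 1, 2, 3], [10, 11, 12, 13], [0, 4, 1, 5]))
# prints [0, 10, 1, 11]
```

Passing the same vector as both `a` and `b` turns the shuffle into a permutation of a single vector, which is how the reduction examples later in this document use it.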

#### Variable permutation rule
* `v8x16.permute_dyn(a: v128, s: v128) -> v128`

Returns a new vector whose lanes are selected from the first input vector `a`
using the indices held in the lanes of the second input vector `s`. Each index
from `s` is first reduced into the range `[0, 15]` by taking it modulo 16.

```python
def S.permute_dyn(a, s):
    result = S.New()
    for i in range(S.Lanes):
        result[i] = a[s[i] % S.Lanes]
    return result
```
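
As a quick check of the modulo behaviour, the same rule can be run on ordinary Python lists. This is a sketch under stated assumptions: the free function `permute_dyn` and the list representation are illustrative and stand in for the `S.` interpretation used by the pseudocode.

```python
def permute_dyn(a, s):
    """Model a variable permutation: lane i of the result is a[s[i] mod lanes].

    (Hypothetical list-based helper, not part of the proposal.)
    """
    lanes = len(a)
    return [a[idx % lanes] for idx in s]


# With 4 lanes, the out-of-range indices 5 and 4 wrap around to 1 and 0.
print(permute_dyn([10, 11, 12, 13], [3, 2, 5, 4]))
# prints [13, 12, 11, 10]
```

Because every index is wrapped, no index value can select outside `a`, so the operation has no trapping or undefined case in this model.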

## Integer arithmetic

Wrapping integer arithmetic discards the high bits of the result.
@@ -756,3 +797,31 @@ Lane-wise saturating conversion from floating point to integer using the IEEE
resulting lane is 0. If the rounded integer value of a lane is outside the
range of the destination type, the result is saturated to the nearest
representable integer value.


## Reductions

There is no dedicated instruction for reductions.
Instead, a reduction can be built from permutations combined with a lane-wise operation such as `add`, `min`, `max`, `and`, or `or`.

Here is an example of an add reduction on f32x4:
```
get_local 0
get_local 0
get_local 0
v64x2.shuffle 1 0 ;; swap the lower half with the upper half of the vector
f32x4.add ;; add the original vector and the swapped one
tee_local 0 ;; keep the partial sums (overwrites local 0)
get_local 0
get_local 0
v32x4.shuffle 1 0 3 2 ;; swap the elements within each pair
f32x4.add ;; every lane now holds the full sum
f32x4.extract_lane 0 ;; extract the first element
```

Here is an example of an add reduction on f64x2:
```
get_local 0
get_local 0
get_local 0
v64x2.shuffle 1 0 ;; swap the lower half with the upper half of the vector
f64x2.add ;; both lanes now hold the full sum
f64x2.extract_lane 0 ;; extract the first element
```
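
The shuffle-then-add pattern can be checked with a small Python model. This is a sketch under stated assumptions: vectors are plain lists, the helpers `shuffle_lanes`, `add_lanes`, and `reduce_add_f32x4` are hypothetical names, and the 64-bit half swap is written with the equivalent 32-bit lane indices `[2, 3, 0, 1]`.

```python
def shuffle_lanes(a, b, imm):
    """Indices below len(a) select from a, the others select from b."""
    lanes = len(a)
    return [a[i] if i < lanes else b[i - lanes] for i in imm]


def add_lanes(x, y):
    """Lane-wise addition of two equal-length vectors."""
    return [p + q for p, q in zip(x, y)]


def reduce_add_f32x4(v):
    # Step 1: swap the two halves and add, giving [v0+v2, v1+v3, v0+v2, v1+v3].
    v = add_lanes(v, shuffle_lanes(v, v, [2, 3, 0, 1]))
    # Step 2: swap the elements within each pair and add, so every lane
    # holds the full sum.
    v = add_lanes(v, shuffle_lanes(v, v, [1, 0, 3, 2]))
    # Extract the first lane.
    return v[0]


print(reduce_add_f32x4([1.0, 2.0, 3.0, 4.0]))
# prints 10.0
```

Note that the second shuffle operates on the partial sums from the first step, not on the original vector. The same structure should carry over to `min`, `max`, `and`, and `or` by replacing `add_lanes` with the corresponding lane-wise operation.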