Description
This RFC proposes adding support to the array API specification for repeating each element of an array.
Overview
Based on array comparison data, the API is available in most array libraries. The main exception is PyTorch, which deviates in its naming convention (`repeat_interleave` vs NumPy et al.'s `repeat`).
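To illustrate the operation in question, a minimal NumPy sketch (for these inputs, `torch.repeat_interleave` would produce the same results):

```python
import numpy as np

x = np.array([1, 2, 3])

# A scalar count repeats each element that many times; with no axis
# given, the result is a flat 1-D array.
print(np.repeat(x, 2))          # [1 1 2 2 3 3]

# Per-element repetition counts are also supported.
print(np.repeat(x, [3, 0, 1]))  # [1 1 1 3]
```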
Prior art
- NumPy: https://numpy.org/doc/stable/reference/generated/numpy.repeat.html
- PyTorch: https://pytorch.org/docs/stable/generated/torch.repeat_interleave.html
- TensorFlow: https://www.tensorflow.org/api_docs/python/tf/repeat
- JAX: https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.repeat.html
- CuPy: https://docs.cupy.dev/en/stable/reference/generated/cupy.repeat.html
- Dask: https://docs.dask.org/en/stable/generated/dask.array.repeat.html#dask.array.repeat
Proposal
`def repeat(x: array, repeats: Union[int, Sequence[int], array], /, *, axis: Optional[int] = None)`

- `repeats`: the number of repetitions for each element.

  If `axis` is not `None`,
  - if `repeats` is an array, `repeats.shape` must broadcast to `x.shape[axis]`.
  - if `repeats` is a sequence of ints, `len(repeats)` must broadcast to `x.shape[axis]`.
  - if `repeats` is an integer, `repeats` must be broadcast to match the size of the specified `axis`.

  If `axis` is `None`,
  - if `repeats` is an array, `repeats.shape` must broadcast to `prod(x.shape)`.
  - if `repeats` is a sequence of ints, `len(repeats)` must broadcast to `prod(x.shape)`.
  - if `repeats` is an integer, `repeats` must be broadcast to match the size of the flattened array.

- `axis`: specifies the axis along which to repeat values. If `None`, use a flattened input array and return a flat output array.
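The two cases above (flattened vs per-axis repetition) can be demonstrated with NumPy, whose `repeat` already matches the proposed semantics:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# axis=None: the input is flattened (row-major in NumPy) and the
# output is a flat 1-D array.
print(np.repeat(x, 2))  # [1 1 2 2 3 3 4 4]

# axis given: repeats applies along that axis; here each entry of the
# sequence gives the repetition count for the corresponding row.
print(np.repeat(x, [1, 2], axis=0))
# [[1 2]
#  [3 4]
#  [3 4]]
```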
Questions
- Both PyTorch and JAX support a kwarg for specifying the output size in order to avoid stream synchronization (PyTorch) and to allow compilation (JAX). Without such kwarg support, is this API viable? And what are the reasons for needing this kwarg when other array libraries (e.g., TensorFlow) omit it?
- When flattening the input array, should we flatten in row-major order? (precedent: `nonzero`)
- Is PyTorch okay with adding a `repeat` function in its main namespace, given the divergence in behavior for `torch.Tensor.repeat`, which behaves similarly to `np.tile`?
- CuPy only allows `int`, `List`, and `Tuple` for `repeats`, not an array. PyTorch may prefer a list of `int`s (see "Unnecessary cuda synchronizations that we should remove in PyTorch" pytorch/pytorch#108968).
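To make the `torch.Tensor.repeat` divergence concrete: it tiles the whole array rather than repeating individual elements, which in NumPy terms is the difference between `repeat` and `tile`. A quick NumPy-only sketch:

```python
import numpy as np

x = np.array([1, 2, 3])

# Element-wise repetition (the semantics this RFC proposes):
print(np.repeat(x, 2))  # [1 1 2 2 3 3]

# Whole-array tiling (what torch.Tensor.repeat does):
print(np.tile(x, 2))    # [1 2 3 1 2 3]
```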
Related
- Adding tuple argument support to `numpy.repeat` to avoid repeated invocations: "ENH: Allow tuple arguments for `numpy.repeat`" numpy/numpy#21435 and "ENH: Introduce multiple pair parameters in the 'repeat' function" numpy/numpy#23937.
- Mention of xarray's need for `repeat`: Common APIs across array libraries (1 year later) #187 (comment)