Add Introduction and History sections #60

Merged
merged 2 commits into from
Nov 3, 2020
2 changes: 2 additions & 0 deletions spec/future_API_evolution.md
@@ -1,3 +1,5 @@
.. _future-API-evolution:

# Future API standard evolution

## Scope extensions
106 changes: 85 additions & 21 deletions spec/purpose_and_scope.md
@@ -2,12 +2,86 @@

## Introduction

Python users have a wealth of choice for libraries and frameworks for
numerical computing, data science, machine learning, and deep learning. New
frameworks pushing forward the state of the art in these fields are appearing
every year. One unintended consequence of all this activity and creativity
has been fragmentation in multidimensional array (a.k.a. tensor) libraries,
which provide the fundamental data structure for these fields. Choices include
NumPy, TensorFlow, PyTorch, Dask, JAX, CuPy, MXNet, Xarray, and others.

The APIs of these libraries are largely similar, but they differ enough that
it's quite difficult to write code that works with multiple (or all) of them.
This array API standard aims to address that issue by specifying an API for
the most common ways arrays are constructed and used.

Why not simply pick an existing API and bless that as the standard? In short,
because there are often good reasons for the current inconsistencies between
libraries. The most obvious candidate for that existing API is NumPy. However,
NumPy was not designed with non-CPU devices, graph-based libraries, or JIT
compilers in mind. Other libraries often deviate from NumPy for good
(necessary) reasons. Choices made in this API standard are often the same
ones NumPy makes, or close to them, but are different where necessary to make
sure all existing array libraries can adopt this API.


### This API standard

This document aims to standardize functionality that exists in most/all array
libraries and either is commonly used or is needed for
consistency/completeness. Usage is determined via analysis of downstream
libraries; see :ref:`usage-data`. An example of consistency: there are
functional equivalents for all Python operators (including the rarely used
ones).
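
The operator/function consistency described above can be sketched with a toy class. This is only an illustration: `ToyArray` is hypothetical, and the function names `add` and `bitwise_invert` are assumed here for the example rather than taken from this document.

```python
class ToyArray:
    """Hypothetical 1-D integer array, for illustration only."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # The operator simply delegates to its functional equivalent.
        return add(self, other)

    def __invert__(self):
        # A rarely used operator still has a functional equivalent.
        return bitwise_invert(self)


def add(x1, x2):
    """Functional equivalent of ``x1 + x2`` (element-wise)."""
    return ToyArray(a + b for a, b in zip(x1.data, x2.data))


def bitwise_invert(x):
    """Functional equivalent of ``~x`` (element-wise)."""
    return ToyArray(~a for a in x.data)


x = ToyArray([1, 2, 3])
y = ToyArray([10, 20, 30])

# Each operator and its function give the same result.
assert (x + y).data == add(x, y).data
assert (~x).data == bitwise_invert(x).data
```

Pairing every operator with a function means code can use whichever form is more convenient, for example passing `add` as a callable where an operator cannot be passed.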

Beyond usage and consistency, there's a set of use cases that inform the API
design to ensure it's fit for a wide range of users and situations - see
:ref:`use-cases`.

A question that may arise when reading this document is: _"what about
functionality that's not present in this document?"_ This:

- means that there is no guarantee the functionality is present in libraries
adhering to the standard
- does _not_ mean that that functionality is unimportant
- may indicate that that functionality, if present in a particular array
library, is unlikely to be present in all other libraries

## History
.. note::
Member

how is this doc built? does the tool support rst syntax in markdown?

Member Author

It was meant to, but this is the one thing that didn't work. The doc is built by installing the requirements in `requirements.txt` and then running `make html` in the `spec/` dir.

I bet on recommonmark back when I set this up, and should have gone all-reST or used MyST instead. My plan is to do a wholesale conversion to reST as the last step, after all PRs we want to include in the first release are merged. Doing a conversion with a bunch of open PRs is a pain, so I left it till last.

Then this note will look like https://bashtage.github.io/sphinx-material/additional_samples.html?highlight=admonition#admonitions


This document is ready for wider community review, but still contains a
number of TODOs, and is expected to change and evolve before a first
official release. See :ref:`future-API-evolution` for proposed
versioning.


### History

The first library for numerical and scientific computing in Python was
Numeric, developed in the mid-1990s. In the early 2000s a second, similar
library, Numarray, was created. In 2005 NumPy was written, superseding both
Numeric and Numarray and resolving the fragmentation at that time. For
roughly a decade, NumPy was the only widely used array library. Over the past
~5 years, mainly due to the emergence of new hardware and the rise of deep
learning, many other libraries have appeared, leading to more severe
fragmentation. Concepts and APIs in newer libraries were often inspired by
(or copied from) those in older ones - and then changed or improved upon to
fit new needs and use cases. Individual library authors discussed ideas, but
before this array API standard there was never a serious attempt
to coordinate between all libraries to avoid fragmentation and arrive at a
common API standard.

The idea for this array API standard grew gradually out of many conversations
between maintainers during 2019-2020. It quickly became clear that any
attempt to write a new "reference library" to fix the current fragmentation
was infeasible - unlike in 2005, there are now too many different use cases
and too many stakeholders, and the speed of innovation is too high. In May
2020 an initial group of maintainers was assembled in the [Consortium for
Python Data API Standards](https://data-apis.org/) to start drafting a
specification for an array API that could be adopted by each of the existing
array and tensor libraries. That resulted in this document, describing that
API.


## Scope (includes out-of-scope / non-goals)
@@ -306,44 +380,34 @@ For the purposes of this specification, the following terms and definitions appl

<!-- NOTE: please keep terms in alphabetical order -->

### array

**array**:
a (usually fixed-size) multidimensional container of items of the same type and size.

### axis

**axis**:
an array dimension.

### broadcast

**broadcast**:
automatic (implicit) expansion of array dimensions to be of equal sizes without copying array data for the purpose of making arrays with different shapes have compatible shapes for element-wise operations.

### compatible

**compatible**:
two arrays whose dimensions are compatible (i.e., where the size of each dimension in one array is either equal to one or to the size of the corresponding dimension in a second array).
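
The definitions of **broadcast** and **compatible** above can be sketched in plain Python operating on shape tuples. One assumption is not stated in the definitions: dimensions are aligned from the last axis backwards, and a missing leading dimension is treated as size one (the convention NumPy uses). The helper names `are_compatible` and `broadcast_shape` are hypothetical.

```python
from itertools import zip_longest


def are_compatible(shape1, shape2):
    """Return True if each pair of dimensions is equal or one of them is 1.

    Assumption: shapes are aligned on their trailing dimensions and
    missing leading dimensions count as size 1.
    """
    pairs = zip_longest(reversed(shape1), reversed(shape2), fillvalue=1)
    return all(a == b or a == 1 or b == 1 for a, b in pairs)


def broadcast_shape(shape1, shape2):
    """Return the shape both inputs expand to, or raise ValueError."""
    if not are_compatible(shape1, shape2):
        raise ValueError(f"shapes {shape1} and {shape2} are not compatible")
    pairs = zip_longest(reversed(shape1), reversed(shape2), fillvalue=1)
    # A singleton dimension expands to the size of its counterpart.
    return tuple(reversed([b if a == 1 else a for a, b in pairs]))


assert are_compatible((3, 1), (3, 4))
assert not are_compatible((2, 3), (4, 3))
assert broadcast_shape((2, 3), (3,)) == (2, 3)
assert broadcast_shape((8, 1, 6, 1), (7, 1, 5)) == (8, 7, 6, 5)
```

The sketch only computes shapes; it says nothing about how a library avoids copying data when it expands a singleton dimension.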

### element-wise

**element-wise**:
an operation performed element-by-element, in which individual array elements are considered in isolation and independently of other elements within the same array.

### matrix

**matrix**:
a two-dimensional array.

### rank

**rank**:
number of array dimensions (not to be confused with the number of linearly independent columns of a matrix).

### shape

**shape**:
a tuple of `N` non-negative integers that specify the size of each dimension and where `N` corresponds to the number of dimensions.
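
As a rough illustration of how **shape** and **rank** relate, the sketch below uses nested Python lists as a stand-in for arrays. The helper `shape_of` is hypothetical, and it assumes the nesting is regular (it inspects only the first item at each level and does not check for ragged input).

```python
def shape_of(nested):
    """Return the shape of a regular nested list; a scalar has shape ()."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break  # a zero-size dimension has nothing further to inspect
        nested = nested[0]
    return tuple(shape)


matrix = [[1, 2, 3], [4, 5, 6]]
vector = [1, 2, 3]

assert shape_of(matrix) == (2, 3)   # rank 2: a matrix
assert shape_of(vector) == (3,)     # rank 1: a vector
assert shape_of([[7]]) == (1, 1)    # two singleton dimensions
assert len(shape_of(matrix)) == 2   # rank is the length of the shape tuple
```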

### singleton dimension

**singleton dimension**:
a dimension whose size is one.

### vector

**vector**:
a one-dimensional array.

* * *
2 changes: 2 additions & 0 deletions spec/usage_data.md
@@ -1,3 +1,5 @@
.. _usage-data:

# Usage Data

> Summary of existing array API design and usage.
2 changes: 2 additions & 0 deletions spec/use_cases.md
@@ -1,3 +1,5 @@
.. _use-cases:

# Use cases

Use cases inform the requirements for, and design choices made in, this array