Commit bacf8d3

Merge pull request #60 from data-apis/intro-and-history
Add Introduction and History sections
2 parents b4fef78 + 3a64eb4 commit bacf8d3

4 files changed: +91 -21 lines changed

spec/future_API_evolution.md

+2
@@ -1,3 +1,5 @@
1+
.. _future-API-evolution:
2+
13
# Future API standard evolution
24

35
## Scope extensions

spec/purpose_and_scope.md

+85 -21
@@ -2,12 +2,86 @@
22

33
## Introduction
44

5+
Python users have a wealth of choice for libraries and frameworks for
6+
numerical computing, data science, machine learning, and deep learning. New
7+
frameworks pushing forward the state of the art in these fields are appearing
8+
every year. One unintended consequence of all this activity and creativity
9+
has been fragmentation in multidimensional array (a.k.a. tensor) libraries,
10+
arrays being the fundamental data structure for these fields. Choices include
11+
NumPy, TensorFlow, PyTorch, Dask, JAX, CuPy, MXNet, Xarray, and others.
12+
13+
The APIs of each of these libraries are largely similar, but with enough
14+
differences that it's quite difficult to write code that works with multiple
15+
(or all) of these libraries. This array API standard aims to address that
16+
issue, by specifying an API for the most common ways arrays are constructed
17+
and used.
18+
19+
Why not simply pick an existing API and bless that as the standard? In short,
20+
because there are often good reasons for the current inconsistencies between
21+
libraries. The most obvious candidate for that existing API is NumPy. However,
22+
NumPy was not designed with non-CPU devices, graph-based libraries, or JIT
23+
compilers in mind. Other libraries often deviate from NumPy for good
24+
(necessary) reasons. Choices made in this API standard are often the same
25+
ones NumPy makes, or close to it, but are different where necessary to make
26+
sure all existing array libraries can adopt this API.
27+
528

629
### This API standard
730

31+
This document aims to standardize functionality that exists in most/all array
32+
libraries and either is commonly used or is needed for
33+
consistency/completeness. Usage is determined via analysis of downstream
34+
libraries; see :ref:`usage-data`. An example of consistency is: there are
35+
functional equivalents for all Python operators (including the rarely used
36+
ones).
37+
38+
Beyond usage and consistency, there's a set of use cases that inform the API
39+
design to ensure it's fit for a wide range of users and situations - see
40+
:ref:`use-cases`.
41+
42+
A question that may arise when reading this document is: _"what about
43+
functionality that's not present in this document?"_ This:
44+
45+
- means that there is no guarantee the functionality is present in libraries
46+
adhering to the standard
47+
- does _not_ mean that that functionality is unimportant
48+
- may indicate that that functionality, if present in a particular array
49+
library, is unlikely to be present in all other libraries
850

9-
## History
51+
.. note::
1052

53+
This document is ready for wider community review, but still contains a
54+
number of TODOs, and is expected to change and evolve before a first
55+
official release. See :ref:`future-API-evolution` for proposed
56+
versioning.
57+
58+
59+
### History
60+
61+
The first library for numerical and scientific computing in Python was
62+
Numeric, developed in the mid-1990s. In the early 2000s a second, similar
63+
library, Numarray, was created. In 2005 NumPy was written, superseding both
64+
Numeric and Numarray and resolving the fragmentation at that time. For
65+
roughly a decade, NumPy was the only widely used array library. Over the past
66+
~5 years, mainly due to the emergence of new hardware and the rise of deep
67+
learning, many other libraries have appeared, leading to more severe
68+
fragmentation. Concepts and APIs in newer libraries were often inspired by
69+
(or copied from) those in older ones - and then changed or improved upon to
70+
fit new needs and use cases. Individual library authors discussed ideas;
71+
however, there was never (before this array API standard) a serious attempt
72+
to coordinate between all libraries to avoid fragmentation and arrive at a
73+
common API standard.
74+
75+
The idea for this array API standard grew gradually out of many conversations
76+
between maintainers during 2019-2020. It quickly became clear that any
77+
attempt to write a new "reference library" to fix the current fragmentation
78+
was infeasible - unlike in 2005, there are now too many different use cases
79+
and too many stakeholders, and the speed of innovation is too high. In May
80+
2020 an initial group of maintainers was assembled in the [Consortium for
81+
Python Data API Standards](https://data-apis.org/) to start drafting a
82+
specification for an array API that could be adopted by each of the existing
83+
array and tensor libraries. That resulted in this document, describing that
84+
API.
1185

1286

1387
## Scope (includes out-of-scope / non-goals)
@@ -306,44 +380,34 @@ For the purposes of this specification, the following terms and definitions appl
306380

307381
<!-- NOTE: please keep terms in alphabetical order -->
308382

309-
### array
310-
383+
**array**:
311384
a (usually fixed-size) multidimensional container of items of the same type and size.
312385

313-
### axis
314-
386+
**axis**:
315387
an array dimension.
316388

317-
### broadcast
318-
389+
**broadcast**:
319390
automatic (implicit) expansion of array dimensions to be of equal sizes without copying array data for the purpose of making arrays with different shapes have compatible shapes for element-wise operations.
320391

321-
### compatible
322-
392+
**compatible**:
323393
two arrays whose dimensions are compatible (i.e., where the size of each dimension in one array is either equal to one or to the size of the corresponding dimension in a second array).
324394

325-
### element-wise
326-
395+
**element-wise**:
327396
an operation performed element-by-element, in which individual array elements are considered in isolation and independently of other elements within the same array.
328397

329-
### matrix
330-
398+
**matrix**:
331399
a two-dimensional array.
332400

333-
### rank
334-
401+
**rank**:
335402
number of array dimensions (not to be confused with the number of linearly independent columns of a matrix).
336403

337-
### shape
338-
404+
**shape**:
339405
a tuple of `N` non-negative integers that specify the sizes of each dimension and where `N` corresponds to the number of dimensions.
340406

341-
### singleton dimension
342-
407+
**singleton dimension**:
343408
a dimension whose size is one.
344409

345-
### vector
346-
410+
**vector**:
347411
a one-dimensional array.
348412

349413
* * *
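
The glossary terms shape, rank, compatible, and broadcast can be illustrated with a short plain-Python sketch. It is not part of this commit or of the specification text. The helper names `rank` and `broadcast_shape` are hypothetical, and aligning dimensions starting from the trailing axis is an assumption borrowed from NumPy's convention, which the glossary itself does not state.

```python
from itertools import zip_longest


def rank(shape):
    """Number of array dimensions (the glossary's 'rank')."""
    return len(shape)


def broadcast_shape(shape_a, shape_b):
    """Return the shape two compatible shapes broadcast to.

    Raises ValueError when the shapes are not compatible.
    Assumption: dimensions are aligned from the trailing axis, and a
    missing leading dimension is treated as a singleton (size 1).
    """
    result = []
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for a, b in pairs:
        if a == b or b == 1:
            result.append(a)   # equal sizes, or b is a singleton dimension
        elif a == 1:
            result.append(b)   # a is a singleton dimension and is expanded
        else:
            raise ValueError(f"incompatible dimensions: {a} and {b}")
    return tuple(reversed(result))


print(rank((2, 3)))                     # 2
print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)
print(broadcast_shape((2, 3), (3,)))    # (2, 3)
```

Under these assumptions, shapes such as `(2, 3)` and `(4,)` are not compatible, and the sketch raises `ValueError` for them.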

spec/usage_data.md

+2
@@ -1,3 +1,5 @@
1+
.. _usage-data:
2+
13
# Usage Data
24

35
> Summary of existing array API design and usage.

spec/use_cases.md

+2
@@ -1,3 +1,5 @@
1+
.. _use-cases:
2+
13
# Use cases
24

35
Use cases inform the requirements for, and design choices made in, this array
