|
2 | 2 |
|
3 | 3 | ## Introduction
|
4 | 4 |
|
| 5 | +Python users have a wealth of choice for libraries and frameworks for |
| 6 | +numerical computing, data science, machine learning, and deep learning. New |
| 7 | +frameworks pushing forward the state of the art in these fields are appearing |
| 8 | +every year. One unintended consequence of all this activity and creativity |
| 9 | +has been fragmentation in multidimensional array (a.k.a. tensor) libraries, |
| 10 | +which provide the fundamental data structure for these fields. Choices include |
| 11 | +NumPy, TensorFlow, PyTorch, Dask, JAX, CuPy, MXNet, Xarray, and others. |
| 12 | + |
| 13 | +The APIs of each of these libraries are largely similar, but with enough |
| 14 | +differences that it's quite difficult to write code that works with multiple |
| 15 | +(or all) of these libraries. This array API standard aims to address that |
| 16 | +issue, by specifying an API for the most common ways arrays are constructed |
| 17 | +and used. |
| 18 | + |
| 19 | +Why not simply pick an existing API and bless that as the standard? In short, |
| 20 | +because there are often good reasons for the current inconsistencies between |
| 21 | +libraries. The most obvious candidate for that existing API is NumPy. However, |
| 22 | +NumPy was not designed with non-CPU devices, graph-based libraries, or JIT |
| 23 | +compilers in mind. Other libraries often deviate from NumPy for good |
| 24 | +(necessary) reasons. Choices made in this API standard are often the same |
| 25 | +ones NumPy makes, or close to them, but are different where necessary to make |
| 26 | +sure all existing array libraries can adopt this API. |
| 27 | + |
5 | 28 |
|
6 | 29 | ### This API standard
|
7 | 30 |
|
| 31 | +This document aims to standardize functionality that exists in most/all array |
| 32 | +libraries and either is commonly used or is needed for |
| 33 | +consistency/completeness. Usage is determined via analysis of downstream |
| 34 | +libraries; see :ref:`usage-data`. An example of consistency is: there are |
| 35 | +functional equivalents for all Python operators (including the rarely used |
| 36 | +ones). |
| 37 | + |
| 38 | +Beyond usage and consistency, there's a set of use cases that inform the API |
| 39 | +design to ensure it's fit for a wide range of users and situations - see |
| 40 | +:ref:`use-cases`. |
| 41 | + |
| 42 | +A question that may arise when reading this document is: _"what about |
| 43 | +functionality that's not present in this document?"_ This: |
| 44 | + |
| 45 | +- means that there is no guarantee the functionality is present in libraries |
| 46 | + adhering to the standard |
| 47 | +- does _not_ mean that the functionality is unimportant |
| 48 | +- may indicate that the functionality, if present in a particular array |
| 49 | + library, is unlikely to be present in all other libraries |
8 | 50 |
|
9 |
| -## History |
| 51 | +.. note:: |
10 | 52 |
|
| 53 | + This document is ready for wider community review, but still contains a |
| 54 | + number of TODOs, and is expected to change and evolve before a first |
| 55 | + official release. See :ref:`future-API-evolution` for proposed |
| 56 | + versioning. |
| 57 | + |
| 58 | + |
| 59 | +### History |
| 60 | + |
| 61 | +The first library for numerical and scientific computing in Python was |
| 62 | +Numeric, developed in the mid-1990s. In the early 2000s a second, similar |
| 63 | +library, Numarray, was created. In 2005 NumPy was written, superseding both |
| 64 | +Numeric and Numarray and resolving the fragmentation at that time. For |
| 65 | +roughly a decade, NumPy was the only widely used array library. Over the past |
| 66 | +~5 years, mainly due to the emergence of new hardware and the rise of deep |
| 67 | +learning, many other libraries have appeared, leading to more severe |
| 68 | +fragmentation. Concepts and APIs in newer libraries were often inspired by |
| 69 | +(or copied from) those in older ones - and then changed or improved upon to |
| 70 | +fit new needs and use cases. Individual library authors discussed ideas; |
| 71 | +however, there was never (before this array API standard) a serious attempt |
| 72 | +to coordinate between all libraries to avoid fragmentation and arrive at a |
| 73 | +common API standard. |
| 74 | + |
| 75 | +The idea for this array API standard grew gradually out of many conversations |
| 76 | +between maintainers during 2019-2020. It quickly became clear that any |
| 77 | +attempt to write a new "reference library" to fix the current fragmentation |
| 78 | +was infeasible - unlike in 2005, there are now too many different use cases |
| 79 | +and too many stakeholders, and the speed of innovation is too high. In May |
| 80 | +2020 an initial group of maintainers was assembled in the [Consortium for |
| 81 | +Python Data API Standards](https://data-apis.org/) to start drafting a |
| 82 | +specification for an array API that could be adopted by each of the existing |
| 83 | +array and tensor libraries. That resulted in this document, describing that |
| 84 | +API. |
11 | 85 |
|
12 | 86 |
|
13 | 87 | ## Scope (includes out-of-scope / non-goals)
|
@@ -306,44 +380,34 @@ For the purposes of this specification, the following terms and definitions appl
|
306 | 380 |
|
307 | 381 | <!-- NOTE: please keep terms in alphabetical order -->
|
308 | 382 |
|
309 |
| -### array |
310 |
| - |
| 383 | +**array**: |
311 | 384 | a (usually fixed-size) multidimensional container of items of the same type and size.
|
312 | 385 |
|
313 |
| -### axis |
314 |
| - |
| 386 | +**axis**: |
315 | 387 | an array dimension.
|
316 | 388 |
|
317 |
| -### broadcast |
318 |
| - |
| 389 | +**broadcast**: |
319 | 390 | automatic (implicit) expansion of array dimensions to be of equal sizes without copying array data for the purpose of making arrays with different shapes have compatible shapes for element-wise operations.
|
320 | 391 |
|
321 |
| -### compatible |
322 |
| - |
| 392 | +**compatible**: |
323 | 393 | two arrays whose dimensions are compatible (i.e., where the size of each dimension in one array is either equal to one or to the size of the corresponding dimension in a second array).
|
324 | 394 |
|
325 |
| -### element-wise |
326 |
| - |
| 395 | +**element-wise**: |
327 | 396 | an operation performed element-by-element, in which individual array elements are considered in isolation and independently of other elements within the same array.
|
328 | 397 |
|
329 |
| -### matrix |
330 |
| - |
| 398 | +**matrix**: |
331 | 399 | a two-dimensional array.
|
332 | 400 |
|
333 |
| -### rank |
334 |
| - |
| 401 | +**rank**: |
335 | 402 | number of array dimensions (not to be confused with the number of linearly independent columns of a matrix).
|
336 | 403 |
|
337 |
| -### shape |
338 |
| - |
| 404 | +**shape**: |
339 | 405 | a tuple of `N` non-negative integers that specify the sizes of each dimension and where `N` corresponds to the number of dimensions.
|
340 | 406 |
|
341 |
| -### singleton dimension |
342 |
| - |
| 407 | +**singleton dimension**: |
343 | 408 | a dimension whose size is one.
|
344 | 409 |
|
345 |
| -### vector |
346 |
| - |
| 410 | +**vector**: |
347 | 411 | a one-dimensional array.
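
The terms above can be illustrated with a small pure-Python sketch that works on shape tuples only (no array library is involved). The function names `describe_shape` and `broadcast_shape` are hypothetical helpers, not part of this specification. The sketch also assumes that dimensions are paired starting from the last axis and that a missing leading dimension counts as size one, as NumPy does; the definitions above do not state this alignment rule.

```python
from itertools import zip_longest
from math import prod


def describe_shape(shape):
    """Summarize a shape tuple using the terms defined above."""
    if any(not isinstance(n, int) or n < 0 for n in shape):
        raise ValueError(f"invalid shape: {shape!r}")
    return {
        "rank": len(shape),            # number of array dimensions
        "is_vector": len(shape) == 1,  # one-dimensional array
        "is_matrix": len(shape) == 2,  # two-dimensional array
        # axes whose size is one (singleton dimensions)
        "singleton_axes": [i for i, n in enumerate(shape) if n == 1],
        # total element count; "size" is our own label, not a defined term
        "size": prod(shape),
    }


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two compatible shapes.

    Assumption: axes are aligned from the right, and missing leading
    axes are treated as size one. Raises ValueError if incompatible.
    """
    result = []
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for m, n in pairs:
        # compatible: sizes are equal, or at least one of them is one
        if m != n and m != 1 and n != 1:
            raise ValueError(f"incompatible shapes: {shape_a} and {shape_b}")
        result.append(n if m == 1 else m)
    return tuple(reversed(result))
```

For example, `broadcast_shape((3, 1), (1, 4))` yields `(3, 4)`, while `broadcast_shape((3,), (4,))` raises `ValueError` because neither size is one.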
|
348 | 412 |
|
349 | 413 | * * *
|
|