diff --git a/docs/api/storage.rst b/docs/api/storage.rst
Lines changed: 4 additions & 5 deletions
diff --git a/docs/index.rst b/docs/index.rst
Lines changed: 12 additions & 21 deletions
diff --git a/docs/spec/v1.rst b/docs/spec/v1.rst
Lines changed: 102 additions & 89 deletions
@@ -2,11 +2,10 @@ Storage (``zarr.storage``)
 ==========================
 .. module:: zarr.storage
 
-This module contains a single :class:`DirectoryStore` class providing a
-``MutableMapping`` interface to a directory on the file system.
-
-Note that any object implementing the ``MutableMapping`` interface can be used
-as a Zarr array store.
+This module contains a single :class:`DirectoryStore` class providing
+a ``MutableMapping`` interface to a directory on the file
+system. However, note that any object implementing the
+``MutableMapping`` interface can be used as a Zarr array store.
 
 .. autofunction:: init_store
 
 
@@ -1,7 +1,5 @@
 .. zarr documentation master file, created by
    sphinx-quickstart on Mon May  2 21:40:09 2016.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
 
 Zarr
 ====
@@ -14,26 +12,16 @@ chunked, compressed, N-dimensional arrays.
 * Download: https://pypi.python.org/pypi/zarr
 * Release notes: https://github.com/alimanfoo/zarr/releases
 
-Motivation
+Highlights
 ----------
 
-Zarr is motivated by the desire to work interactively with
-multi-dimensional scientific datasets too large to fit into memory on
-commodity desktop or laptop computers. Interactive data analysis
-requires fast array storage, because an interactive session may
-involve creation and manipulation of many intermediate data
-structures. Faster storage provides more freedom to explore a rich and
-complex dataset in a variety of different ways. The Blosc compression
-library provides extremely fast multi-threaded compression and
-decompression, and so a primary motivation for Zarr was to bring
-together Blosc with multi-dimensional arrays in a convenient way.
-
-A second motivation is to provide array storage that is convenient and
-well-suited to use in parallel computations. This means supporting
-concurrent data access from multiple threads or processes, without
-unnecessary locking or exclusion, to maximise the possibility for work
-to be carried out in parallel.
-
+* Create N-dimensional arrays with any NumPy dtype.
+* Chunk arrays along any dimension.
+* Compress chunks using the fast Blosc_ meta-compressor or alternatively using zlib, BZ2 or LZMA.
+* Store arrays in memory, on disk, inside a Zip file, on S3, ... pretty much anywhere you like.
+* Read an array concurrently from multiple threads or processes.
+* Write to an array concurrently from multiple threads or processes.
+
 Status
 ------
 
@@ -79,7 +67,8 @@ Acknowledgments
 Zarr bundles the `c-blosc <https://github.com/Blosc/c-blosc>`_
 library and uses it as the default compressor.
 
-Zarr is inspired by and borrows code from `bcolz <http://bcolz.blosc.org/>`_.
+Zarr is inspired by `HDF5 <https://www.hdfgroup.org/HDF5/>`_, `h5py
+<http://www.h5py.org/>`_ and `bcolz <http://bcolz.blosc.org/>`_.
 
 Development of this package is supported by the
 `MRC Centre for Genomics and Global Health <http://www.cggh.org>`_.
@@ -90,3 +79,5 @@ Indices and tables
 * :ref:`genindex`
 * :ref:`modindex`
 * :ref:`search`
+
+.. _Blosc: http://www.blosc.org/
@@ -1,38 +1,42 @@
 Zarr storage specification version 1
 ====================================
 
-This document provides a technical specification of the format used for
-storing a Zarr array. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
-"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
-this document are to be interpreted as described in
-`RFC 2119 <https://www.ietf.org/rfc/rfc2119.txt>`_.
+This document provides a technical specification of the format used
+for storing a Zarr array. The key words "MUST", "MUST NOT",
+"REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
+"RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in `RFC 2119
+<https://www.ietf.org/rfc/rfc2119.txt>`_.
 
 Storage
 -------
 
-A Zarr array can be stored in any storage system that provides a key/value
-interface, where a key is an ASCII string and a value is an arbitrary
-sequence of bytes, and the supported operations are read (get the sequence
-of bytes associated with a given key), write (set the sequence of bytes
-associated with a given key) and delete (remove a key/value pair).
+A Zarr array can be stored in any storage system that provides a
+key/value interface, where a key is an ASCII string and a value is an
+arbitrary sequence of bytes, and the supported operations are read
+(get the sequence of bytes associated with a given key), write (set
+the sequence of bytes associated with a given key) and delete (remove
+a key/value pair).
 
-For example, a directory in a file system can provide this interface, where
-keys are file names, values are file contents, and files can be read, written
-or deleted. Similarly, an S3 bucket can provide this interface, where
-keys are resource names, values are resource contents, and resources can be
-read, written or deleted via HTTP.
+For example, a directory in a file system can provide this interface,
+where keys are file names, values are file contents, and files can be
+read, written or deleted. Equally, an S3 bucket can provide this
+interface, where keys are resource names, values are resource
+contents, and resources can be read, written or deleted via HTTP.
 
-Below an "array store" refers to any system implementing this interface.
+Below an "array store" refers to any system implementing this
+interface.
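
As a rough illustration of the interface described above, the sketch below implements an in-memory array store on top of Python's ``MutableMapping``. The class name ``DictStore`` is hypothetical and not part of Zarr, and the values written are placeholders; the sketch only shows the read, write and delete operations with ASCII string keys and byte values.

```python
from collections.abc import MutableMapping


class DictStore(MutableMapping):
    """Hypothetical in-memory array store: ASCII string keys, bytes values."""

    def __init__(self):
        self._items = {}

    def __getitem__(self, key):
        # read: get the sequence of bytes associated with a key
        return self._items[key]

    def __setitem__(self, key, value):
        # write: set the sequence of bytes associated with a key
        key.encode("ascii")  # raises UnicodeEncodeError for non-ASCII keys
        self._items[key] = bytes(value)

    def __delitem__(self, key):
        # delete: remove a key/value pair
        del self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


store = DictStore()
store["meta"] = b"{}"        # placeholder bytes, not real array metadata
store["0.0"] = b"\x00\x01"   # placeholder bytes, not real chunk data
del store["0.0"]
```

Any object offering these three operations, such as a plain ``dict``, would serve equally well for this purpose.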
 
 Metadata
 --------
 
-Each array requires essential configuration metadata to be stored, enabling
-correct interpretation of the stored data. This metadata is encoded using
-JSON and stored as the value of the 'meta' key within an array store.
+Each array requires essential configuration metadata to be stored,
+enabling correct interpretation of the stored data. This metadata is
+encoded using JSON and stored as the value of the 'meta' key within an
+array store.
 
-The metadata resource is a JSON object. The following keys MUST be present
-within the object:
+The metadata resource is a JSON object. The following keys MUST be
+present within the object:
 
 zarr_format
     An integer defining the version of the storage specification to which the
@@ -59,15 +63,15 @@ order
     array. 'C' means row-major order, i.e., the last dimension varies fastest;
     'F' means column-major order, i.e., the first dimension varies fastest.
 
-Other keys MAY be present within the metadata object however they MUST NOT
-alter the interpretation of the required fields defined above.
+Other keys MAY be present within the metadata object; however, they
+MUST NOT alter the interpretation of the required fields defined above.
 
-For example, the JSON object below defines a 2-dimensional array of 64-bit
-little-endian floating point numbers with 10000 rows and 10000 columns,
-divided into chunks of 1000 rows and 1000 columns (so there will be 100
-chunks in total arranged in a 10 by 10 grid). Within each chunk the data
-are laid out in C contiguous order, and each chunk is compressed using the
-Blosc compression library::
+For example, the JSON object below defines a 2-dimensional array of
+64-bit little-endian floating point numbers with 10000 rows and 10000
+columns, divided into chunks of 1000 rows and 1000 columns (so there
+will be 100 chunks in total arranged in a 10 by 10 grid). Within each
+chunk the data are laid out in C contiguous order, and each chunk is
+compressed using the Blosc compression library::
 
     {
         "chunks": [
@@ -94,33 +98,36 @@ Data type encoding
 ~~~~~~~~~~~~~~~~~~
 
 Simple data types are encoded within the array metadata resource as a
-string, following the `NumPy array protocol type string (typestr) format
+string, following the `NumPy array protocol type string (typestr)
+format
 <http://docs.scipy.org/doc/numpy/reference/arrays.interface.html>`_. The
-format consists of 3 parts: a character describing the byteorder of the
-data (``<``: little-endian, ``>``: big-endian, ``|``: not-relevant), a
-character code giving the basic type of the array, and an integer providing
-the number of bytes the type uses. The byte order MUST be specified. E.g.,
-``"<f8"``, ``">i4"``, ``"|b1"`` and ``"|S12"`` are valid data types.
-
-Structure data types (i.e., with multiple named fields) are encoded as a
-list of two-element lists, following `NumPy array protocol type descriptions
-(descr) <http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#>`_.
-For example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]``
-defines a data type composed of three single-byte unsigned integers labelled
-'r', 'g' and 'b'.
+format consists of 3 parts: a character describing the byteorder of
+the data (``<``: little-endian, ``>``: big-endian, ``|``:
+not-relevant), a character code giving the basic type of the array,
+and an integer providing the number of bytes the type uses. The byte
+order MUST be specified. E.g., ``"<f8"``, ``">i4"``, ``"|b1"`` and
+``"|S12"`` are valid data types.
+
+Structured data types (i.e., with multiple named fields) are encoded as
+a list of two-element lists, following `NumPy array protocol type
+descriptions (descr)
+<http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#>`_.
+For example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b",
+"|u1"]]`` defines a data type composed of three single-byte unsigned
+integers labelled 'r', 'g' and 'b'.
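
To make the two encodings concrete, here is a minimal sketch of a parser for them using only the Python standard library. The function names ``parse_typestr`` and ``parse_dtype`` are hypothetical, the sketch handles only flat (non-nested) structured types, and it does not check that the type character is one NumPy recognises.

```python
import json
import re

# byte order character, one type character, then the size in bytes
_TYPESTR = re.compile(r"^([<>|])([A-Za-z])(\d+)$")


def parse_typestr(typestr):
    """Split a typestr such as '<f8' into (byteorder, kind, size)."""
    match = _TYPESTR.match(typestr)
    if match is None:
        # also rejects strings such as 'f8' that omit the byte order
        raise ValueError("invalid typestr: %r" % (typestr,))
    byteorder, kind, size = match.groups()
    return byteorder, kind, int(size)


def parse_dtype(encoded):
    """Parse a simple typestr or a list of [name, typestr] pairs."""
    if isinstance(encoded, str):
        return parse_typestr(encoded)
    return [(name, parse_typestr(typestr)) for name, typestr in encoded]


simple = parse_dtype("<f8")
rgb = parse_dtype(json.loads('[["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]'))
```

Here ``simple`` is ``('<', 'f', 8)`` and ``rgb`` is a list of three fields, each one unsigned byte wide.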
 
 Chunks
 ------
 
-Each chunk of the array is compressed by passing the raw bytes for the chunk
-through the primary compression library to obtain a new sequence of bytes
-comprising the compressed chunk data. No header is added to the compressed
-bytes or any other modification made. The internal structure of the
-compressed bytes will depend on which primary compressor was used. For
-example, the
-`Blosc compressor <https://github.com/Blosc/c-blosc/blob/master/README_HEADER.rst>`_
-produces a sequence of bytes that begins with a 16-byte header followed by
-compressed data.
+Each chunk of the array is compressed by passing the raw bytes for the
+chunk through the primary compression library to obtain a new sequence
+of bytes comprising the compressed chunk data. No header is added to
+the compressed bytes or any other modification made. The internal
+structure of the compressed bytes will depend on which primary
+compressor was used. For example, the `Blosc compressor
+<https://github.com/Blosc/c-blosc/blob/master/README_HEADER.rst>`_
+produces a sequence of bytes that begins with a 16-byte header
+followed by compressed data.
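
The following sketch illustrates the idea for a single 10 by 10 chunk of ``<i4`` values, using ``zlib`` from the Python standard library as the primary compressor. Level 1 is chosen to mirror the ``compression_opts=1`` setting used in the example later in this document; that this option corresponds to the zlib level is an assumption. The compressed bytes are exactly what ``zlib.compress`` returns, with no extra header.

```python
import struct
import zlib

# Raw bytes for a 10 x 10 chunk of little-endian 32-bit integers, all set to 1.
# With a uniform value, C and F layouts give the same byte sequence.
raw = struct.pack("<100i", *([1] * 100))

# The value stored for this chunk is the compressor output, unmodified.
compressed = zlib.compress(raw, 1)

# Reading the chunk reverses the step.
restored = zlib.decompress(compressed)
values = struct.unpack("<100i", restored)
```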
 
 The compressed sequence of bytes for each chunk is stored under a key
 formed from the index of the chunk within the grid of chunks
@@ -133,28 +140,30 @@ data for rows 0-1000 and columns 0-1000 and is stored under the key
 '0.0'; the chunk with indices (2, 4) provides data for rows 2000-3000
 and columns 4000-5000 and is stored under the key '2.4'; etc.
 
-There is no need for all chunks to be present within an array store. If a
-chunk is not present then it is considered to be in an uninitialized state.
-An unitialized chunk MUST be treated as if it was uniformly filled with the
-value of the 'fill_value' field in the array metadata. If the 'fill_value'
-field is ``null`` then the contents of the chunk are undefined.
+There is no need for all chunks to be present within an array
+store. If a chunk is not present then it is considered to be in an
+uninitialized state. An uninitialized chunk MUST be treated as if it
+was uniformly filled with the value of the 'fill_value' field in the
+array metadata. If the 'fill_value' field is ``null`` then the
+contents of the chunk are undefined.
 
-Note that all chunks in array have the same shape. If the length of any
-array dimension is not exactly divisible by the length of the corresponding
-chunk dimension then some chunks will overhang the edge of the array. The
-contents of any chunk region falling outside the array are undefined.
+Note that all chunks in an array have the same shape. If the length of
+any array dimension is not exactly divisible by the length of the
+corresponding chunk dimension then some chunks will overhang the edge
+of the array. The contents of any chunk region falling outside the
+array are undefined.
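
As a small worked example of this arithmetic, the sketch below computes the shape of the chunk grid by ceiling division and the size of the overhang in each dimension. The function names are hypothetical.

```python
def grid_shape(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


def overhang(shape, chunks):
    """Number of positions by which the chunk grid extends past the array edge."""
    grid = grid_shape(shape, chunks)
    return tuple(g * c - s for g, c, s in zip(grid, chunks, shape))


exact = grid_shape((10000, 10000), (1000, 1000))
uneven = grid_shape((25, 20), (10, 10))
extra = overhang((25, 20), (10, 10))
```

An array of 10000 by 10000 elements with 1000 by 1000 chunks gives a 10 by 10 grid with no overhang. An array of 25 by 20 elements with 10 by 10 chunks gives a 3 by 2 grid, where the last row of chunks extends 5 rows past the edge of the array.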
 
 Attributes
 ----------
 
-Each array can also be associated with custom attributes, which are simple
-key/value items with application-specific meaning. Custom attributes are
-encoded as a JSON object and stored under the 'attrs' key within an array
-store. Even if the attributes are empty, the 'attrs' key MUST be present
-within an array store.
+Each array can also be associated with custom attributes, which are
+simple key/value items with application-specific meaning. Custom
+attributes are encoded as a JSON object and stored under the 'attrs'
+key within an array store. Even if the attributes are empty, the
+'attrs' key MUST be present within an array store.
 
-For example, the JSON object below encodes three attributes named 'foo', 'bar'
-and 'baz'::
+For example, the JSON object below encodes three attributes named
+'foo', 'bar' and 'baz'::
 
     {
         "foo": 42,
@@ -165,13 +174,16 @@ and 'baz'::
 Example
 -------
 
-Below is an example of storing a Zarr array within a directory called
-'example.zarr' on the local file system::
+Below is an example of storing a Zarr array, using a directory on the
+local file system as storage.
+
+Initialize the store::
 
     >>> import zarr
-    >>> z = zarr.open('example.zarr', mode='w', shape=(20, 20),
-    ...               chunks=(10, 10), dtype='i4', fill_value=42,
-    ...               compression='zlib', compression_opts=1)
+    >>> store = zarr.DirectoryStore('example.zarr')
+    >>> zarr.init_store(store, shape=(20, 20), chunks=(10, 10),
+    ...                 dtype='i4', fill_value=42, compression='zlib',
+    ...                 compression_opts=1, overwrite=True)
 
 No chunks are initialized yet, so only the 'meta' and 'attrs' keys are
 present::
@@ -205,25 +217,9 @@ Inspect the array attributes::
     >>> print(open('example.zarr/attrs').read())
     {}
 
-Modify the array attributes::
-
-    >>> z.attrs['foo'] = 42
-    >>> z.attrs['bar'] = 'apples'
-    >>> z.attrs['baz'] = [1, 2, 3, 4]
-    >>> print(open('example.zarr/attrs').read())
-    {
-        "bar": "apples",
-        "baz": [
-            1,
-            2,
-            3,
-            4
-        ],
-        "foo": 42
-    }
-
 Set some data::
 
+    >>> z = zarr.Array(store)
     >>> z[0:10, 0:10] = 1
     >>> sorted(os.listdir('example.zarr'))
     ['0.0', 'attrs', 'meta']
@@ -247,3 +243,20 @@ Manually decompress a single chunk for illustration::
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
+
+Modify the array attributes::
+
+    >>> z.attrs['foo'] = 42
+    >>> z.attrs['bar'] = 'apples'
+    >>> z.attrs['baz'] = [1, 2, 3, 4]
+    >>> print(open('example.zarr/attrs').read())
+    {
+        "bar": "apples",
+        "baz": [
+            1,
+            2,
+            3,
+            4
+        ],
+        "foo": 42
+    }