Zarr storage specification version 1
====================================

- This document provides a technical specification of the format used for
- storing a Zarr array. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
- "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
- this document are to be interpreted as described in
- `RFC 2119 <https://www.ietf.org/rfc/rfc2119.txt>`_.
+ This document provides a technical specification of the format used
+ for storing a Zarr array. The key words "MUST", "MUST NOT",
+ "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
+ "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+ interpreted as described in `RFC 2119
+ <https://www.ietf.org/rfc/rfc2119.txt>`_.

Storage
-------

- A Zarr array can be stored in any storage system that provides a key/value
- interface, where a key is an ASCII string and a value is an arbitrary
- sequence of bytes, and the supported operations are read (get the sequence
- of bytes associated with a given key), write (set the sequence of bytes
- associated with a given key) and delete (remove a key/value pair).
+ A Zarr array can be stored in any storage system that provides a
+ key/value interface, where a key is an ASCII string and a value is an
+ arbitrary sequence of bytes, and the supported operations are read
+ (get the sequence of bytes associated with a given key), write (set
+ the sequence of bytes associated with a given key) and delete (remove
+ a key/value pair).

- For example, a directory in a file system can provide this interface, where
- keys are file names, values are file contents, and files can be read, written
- or deleted. Similarly, an S3 bucket can provide this interface, where
- keys are resource names, values are resource contents, and resources can be
- read, written or deleted via HTTP.
+ For example, a directory in a file system can provide this interface,
+ where keys are file names, values are file contents, and files can be
+ read, written or deleted. Equally, an S3 bucket can provide this
+ interface, where keys are resource names, values are resource
+ contents, and resources can be read, written or deleted via HTTP.

- Below an "array store" refers to any system implementing this interface.
+ Below an "array store" refers to any system implementing this
+ interface.
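
The read, write and delete operations described above can be illustrated with a minimal in-memory sketch. The class name ``MemoryStore`` and its method names are hypothetical and are not part of this specification or of the zarr package; any system offering equivalent operations would qualify as an array store.

```python
class MemoryStore:
    """Hypothetical in-memory array store: ASCII string keys, bytes values."""

    def __init__(self):
        self._items = {}

    def read(self, key):
        # Get the sequence of bytes associated with a given key.
        return self._items[key]

    def write(self, key, value):
        # Keys are ASCII strings; encode() raises UnicodeEncodeError otherwise.
        key.encode("ascii")
        # Set the sequence of bytes associated with a given key.
        self._items[key] = bytes(value)

    def delete(self, key):
        # Remove a key/value pair.
        del self._items[key]


store = MemoryStore()
store.write("meta", b"{}")
meta_bytes = store.read("meta")
store.delete("meta")
```

A directory or an S3 bucket could be wrapped behind the same three methods, with file names or resource names as keys.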

Metadata
--------

- Each array requires essential configuration metadata to be stored, enabling
- correct interpretation of the stored data. This metadata is encoded using
- JSON and stored as the value of the 'meta' key within an array store.
+ Each array requires essential configuration metadata to be stored,
+ enabling correct interpretation of the stored data. This metadata is
+ encoded using JSON and stored as the value of the 'meta' key within an
+ array store.

- The metadata resource is a JSON object. The following keys MUST be present
- within the object:
+ The metadata resource is a JSON object. The following keys MUST be
+ present within the object:

zarr_format
    An integer defining the version of the storage specification to which the
@@ -59,15 +63,15 @@ order
    array. 'C' means row-major order, i.e., the last dimension varies fastest;
    'F' means column-major order, i.e., the first dimension varies fastest.

- Other keys MAY be present within the metadata object however they MUST NOT
- alter the interpretation of the required fields defined above.
+ Other keys MAY be present within the metadata object however they MUST
+ NOT alter the interpretation of the required fields defined above.

- For example, the JSON object below defines a 2-dimensional array of 64-bit
- little-endian floating point numbers with 10000 rows and 10000 columns,
- divided into chunks of 1000 rows and 1000 columns (so there will be 100
- chunks in total arranged in a 10 by 10 grid). Within each chunk the data
- are laid out in C contiguous order, and each chunk is compressed using the
- Blosc compression library::
+ For example, the JSON object below defines a 2-dimensional array of
+ 64-bit little-endian floating point numbers with 10000 rows and 10000
+ columns, divided into chunks of 1000 rows and 1000 columns (so there
+ will be 100 chunks in total arranged in a 10 by 10 grid). Within each
+ chunk the data are laid out in C contiguous order, and each chunk is
+ compressed using the Blosc compression library::

    {
        "chunks": [
@@ -94,33 +98,36 @@ Data type encoding
~~~~~~~~~~~~~~~~~~

Simple data types are encoded within the array metadata resource as a
- string, following the `NumPy array protocol type string (typestr) format
+ string, following the `NumPy array protocol type string (typestr)
+ format
<http://docs.scipy.org/doc/numpy/reference/arrays.interface.html>`_. The
- format consists of 3 parts: a character describing the byteorder of the
- data (``<``: little-endian, ``>``: big-endian, ``|``: not-relevant), a
- character code giving the basic type of the array, and an integer providing
- the number of bytes the type uses. The byte order MUST be specified. E.g.,
- ``"<f8"``, ``">i4"``, ``"|b1"`` and ``"|S12"`` are valid data types.
-
- Structure data types (i.e., with multiple named fields) are encoded as a
- list of two-element lists, following `NumPy array protocol type descriptions
- (descr) <http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#>`_.
- For example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]``
- defines a data type composed of three single-byte unsigned integers labelled
- 'r', 'g' and 'b'.
+ format consists of 3 parts: a character describing the byteorder of
+ the data (``<``: little-endian, ``>``: big-endian, ``|``:
+ not-relevant), a character code giving the basic type of the array,
+ and an integer providing the number of bytes the type uses. The byte
+ order MUST be specified. E.g., ``"<f8"``, ``">i4"``, ``"|b1"`` and
+ ``"|S12"`` are valid data types.
+
+ Structure data types (i.e., with multiple named fields) are encoded as
+ a list of two-element lists, following `NumPy array protocol type
+ descriptions (descr)
+ <http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#>`_.
+ For example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b",
+ "|u1"]]`` defines a data type composed of three single-byte unsigned
+ integers labelled 'r', 'g' and 'b'.
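
As an illustration of the three-part encoding described above, the following sketch splits a type string into its byte order, type character and size, and applies the same parsing to each field of a structured type. The function names ``parse_typestr`` and ``parse_dtype`` are hypothetical, and the sketch only covers the simple three-part form; a real implementation would more likely hand the value to NumPy.

```python
import re

# Byte order characters as listed in the text above.
BYTE_ORDERS = {"<": "little-endian", ">": "big-endian", "|": "not-relevant"}


def parse_typestr(typestr):
    """Split a simple type string such as '<f8' into (byte order, kind, size)."""
    match = re.fullmatch(r"([<>|])([A-Za-z])(\d+)", typestr)
    if match is None:
        # The byte order MUST be specified, so 'f8' alone is rejected here.
        raise ValueError("not a valid simple type string: %r" % (typestr,))
    order, kind, size = match.groups()
    return BYTE_ORDERS[order], kind, int(size)


def parse_dtype(encoded):
    """Parse either a simple type string or a list of [name, typestr] pairs."""
    if isinstance(encoded, str):
        return parse_typestr(encoded)
    return [(name, parse_typestr(typestr)) for name, typestr in encoded]


simple = parse_dtype("<f8")
rgb = parse_dtype([["r", "|u1"], ["g", "|u1"], ["b", "|u1"]])
```

Here ``simple`` describes an 8-byte little-endian float, and ``rgb`` holds one entry per named field.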

Chunks
------

- Each chunk of the array is compressed by passing the raw bytes for the chunk
- through the primary compression library to obtain a new sequence of bytes
- comprising the compressed chunk data. No header is added to the compressed
- bytes or any other modification made. The internal structure of the
- compressed bytes will depend on which primary compressor was used. For
- example, the
- `Blosc compressor <https://github.com/Blosc/c-blosc/blob/master/README_HEADER.rst>`_
- produces a sequence of bytes that begins with a 16-byte header followed by
- compressed data.
+ Each chunk of the array is compressed by passing the raw bytes for the
+ chunk through the primary compression library to obtain a new sequence
+ of bytes comprising the compressed chunk data. No header is added to
+ the compressed bytes or any other modification made. The internal
+ structure of the compressed bytes will depend on which primary
+ compressor was used. For example, the `Blosc compressor
+ <https://github.com/Blosc/c-blosc/blob/master/README_HEADER.rst>`_
+ produces a sequence of bytes that begins with a 16-byte header
+ followed by compressed data.

The compressed sequence of bytes for each chunk is stored under a key
formed from the index of the chunk within the grid of chunks
@@ -133,28 +140,30 @@ data for rows 0-1000 and columns 0-1000 and is stored under the key
'0.0'; the chunk with indices (2, 4) provides data for rows 2000-3000
and columns 4000-5000 and is stored under the key '2.4'; etc.
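
The mapping from chunk indices to keys and to array regions can be sketched as follows. The helper names ``chunk_key`` and ``chunk_extent`` are hypothetical, and the extents are returned as half-open (start, stop) pairs, which is an assumption about how the ranges quoted above are meant to be read.

```python
def chunk_key(chunk_index):
    """Join the chunk indices with periods, e.g. (2, 4) -> '2.4'."""
    return ".".join(str(i) for i in chunk_index)


def chunk_extent(chunk_index, chunks):
    """Return one (start, stop) pair per dimension for the given chunk."""
    return [(i * c, (i + 1) * c) for i, c in zip(chunk_index, chunks)]


key = chunk_key((2, 4))
extent = chunk_extent((2, 4), (1000, 1000))
```

With chunks of 1000 by 1000 items, ``key`` is ``'2.4'`` and ``extent`` covers rows 2000-3000 and columns 4000-5000, matching the example above.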

- There is no need for all chunks to be present within an array store. If a
- chunk is not present then it is considered to be in an uninitialized state.
- An uninitialized chunk MUST be treated as if it was uniformly filled with the
- value of the 'fill_value' field in the array metadata. If the 'fill_value'
- field is ``null`` then the contents of the chunk are undefined.
+ There is no need for all chunks to be present within an array
+ store. If a chunk is not present then it is considered to be in an
+ uninitialized state. An uninitialized chunk MUST be treated as if it
+ was uniformly filled with the value of the 'fill_value' field in the
+ array metadata. If the 'fill_value' field is ``null`` then the
+ contents of the chunk are undefined.
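
A reader might handle absent chunks along the following lines. This is a minimal sketch: ``load_chunk`` is a hypothetical helper, the store is modelled as a plain dict, and ``decode`` stands in for whatever decompression and decoding the array metadata calls for.

```python
def load_chunk(store, key, n_items, fill_value, decode):
    """Return the items of one chunk, using fill_value if the key is absent.

    store: any mapping from ASCII keys to bytes (a dict in this sketch).
    decode: hypothetical callable turning stored bytes into a list of items.
    """
    if key in store:
        return decode(store[key])
    # Absent key: the chunk is uninitialized and is treated as uniformly
    # filled. If fill_value is None (JSON null) the contents are undefined;
    # this sketch then simply returns a list of None.
    return [fill_value] * n_items


store = {"0.0": bytes([1, 1, 1, 1])}
present = load_chunk(store, "0.0", 4, 42, decode=list)
missing = load_chunk(store, "0.1", 4, 42, decode=list)
```

Here ``decode=list`` is only a placeholder that turns the stored bytes into integers; no real compressor is involved.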

- Note that all chunks in an array have the same shape. If the length of any
- array dimension is not exactly divisible by the length of the corresponding
- chunk dimension then some chunks will overhang the edge of the array. The
- contents of any chunk region falling outside the array are undefined.
+ Note that all chunks in an array have the same shape. If the length of
+ any array dimension is not exactly divisible by the length of the
+ corresponding chunk dimension then some chunks will overhang the edge
+ of the array. The contents of any chunk region falling outside the
+ array are undefined.
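
The number of chunks along each dimension, and the amount by which the last chunk overhangs, follow from a ceiling division. The sketch below uses hypothetical helper names and is only meant to make the arithmetic concrete.

```python
def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


def overhang(shape, chunks):
    """Items per dimension by which the last chunk extends past the array."""
    grid = chunk_grid_shape(shape, chunks)
    return tuple(g * c - s for g, c, s in zip(grid, chunks, shape))


grid_exact = chunk_grid_shape((10000, 10000), (1000, 1000))
grid_ragged = chunk_grid_shape((25, 20), (10, 10))
extra = overhang((25, 20), (10, 10))
```

For a hypothetical 25 by 20 array with 10 by 10 chunks, three chunks are needed along the first dimension and the last of them extends 5 rows past the edge of the array.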

Attributes
----------

- Each array can also be associated with custom attributes, which are simple
- key/value items with application-specific meaning. Custom attributes are
- encoded as a JSON object and stored under the 'attrs' key within an array
- store. Even if the attributes are empty, the 'attrs' key MUST be present
- within an array store.
+ Each array can also be associated with custom attributes, which are
+ simple key/value items with application-specific meaning. Custom
+ attributes are encoded as a JSON object and stored under the 'attrs'
+ key within an array store. Even if the attributes are empty, the
+ 'attrs' key MUST be present within an array store.
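
Writing and reading attributes can be sketched with the standard ``json`` module. The store is modelled as a plain dict, and the ``indent`` and ``sort_keys`` settings are formatting assumptions rather than requirements of this specification.

```python
import json

# Model the array store as a dict from ASCII keys to bytes (an assumption).
store = {}

# Even when there are no attributes, the 'attrs' key is still written.
store["attrs"] = json.dumps({}).encode("ascii")
empty = json.loads(store["attrs"].decode("ascii"))

# Encode custom attributes as a JSON object and store them under 'attrs'.
attrs = {"foo": 42, "bar": "apples", "baz": [1, 2, 3, 4]}
store["attrs"] = json.dumps(attrs, indent=4, sort_keys=True).encode("ascii")

# Reading the attributes back decodes the same JSON object.
decoded = json.loads(store["attrs"].decode("ascii"))
```

Because ``json.dumps`` escapes non-ASCII characters by default, the encoded value is always safe to encode as ASCII.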

- For example, the JSON object below encodes three attributes named 'foo', 'bar'
- and 'baz'::
+ For example, the JSON object below encodes three attributes named
+ 'foo', 'bar' and 'baz'::

    {
        "foo": 42,
@@ -165,13 +174,16 @@ and 'baz'::
Example
-------

- Below is an example of storing a Zarr array within a directory called
- 'example.zarr' on the local file system::
+ Below is an example of storing a Zarr array, using a directory on the
+ local file system as storage.
+
+ Initialize the store::

    >>> import zarr
-     >>> z = zarr.open('example.zarr', mode='w', shape=(20, 20),
-     ...               chunks=(10, 10), dtype='i4', fill_value=42,
-     ...               compression='zlib', compression_opts=1)
+     >>> store = zarr.DirectoryStore('example.zarr')
+     >>> zarr.init_store(store, shape=(20, 20), chunks=(10, 10),
+     ...                 dtype='i4', fill_value=42, compression='zlib',
+     ...                 compression_opts=1, overwrite=True)

No chunks are initialized yet, so only the 'meta' and 'attrs' keys are
present::
@@ -205,25 +217,9 @@ Inspect the array attributes::
    >>> print(open('example.zarr/attrs').read())
    {}

- Modify the array attributes::
-
-     >>> z.attrs['foo'] = 42
-     >>> z.attrs['bar'] = 'apples'
-     >>> z.attrs['baz'] = [1, 2, 3, 4]
-     >>> print(open('example.zarr/attrs').read())
-     {
-         "bar": "apples",
-         "baz": [
-             1,
-             2,
-             3,
-             4
-         ],
-         "foo": 42
-     }
-
Set some data::

+     >>> z = zarr.Array(store)
    >>> z[0:10, 0:10] = 1
    >>> sorted(os.listdir('example.zarr'))
    ['0.0', 'attrs', 'meta']
@@ -247,3 +243,20 @@ Manually decompress a single chunk for illustration::
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
+
+ Modify the array attributes::
+
+     >>> z.attrs['foo'] = 42
+     >>> z.attrs['bar'] = 'apples'
+     >>> z.attrs['baz'] = [1, 2, 3, 4]
+     >>> print(open('example.zarr/attrs').read())
+     {
+         "bar": "apples",
+         "baz": [
+             1,
+             2,
+             3,
+             4
+         ],
+         "foo": 42
+     }