Zarr
====

Zarr is a Python package providing an implementation of compressed,
chunked, N-dimensional arrays, designed for use in parallel
computing. See the `documentation <http://zarr.readthedocs.io/>`_ for
more information.

* Source code: https://github.com/alimanfoo/zarr
* Download: https://pypi.python.org/pypi/zarr
* Release notes: https://github.com/alimanfoo/zarr/releases

.. image:: https://travis-ci.org/alimanfoo/zarr.svg?branch=master
    :target: https://travis-ci.org/alimanfoo/zarr

Installation
------------

Installation requires NumPy and Cython to be pre-installed. Zarr can
currently only be installed on Linux.

Install from PyPI::

    $ pip install -U zarr

Install from GitHub::

    $ pip install -U git+https://github.com/alimanfoo/zarr.git@master

Status
------

Experimental, proof-of-concept. This is alpha-quality software. Things
may break, change or disappear without warning.

Bug reports and suggestions welcome.

Design goals
------------

* Chunking in multiple dimensions
* Resize any dimension
* Concurrent reads
* Concurrent writes
* Release the GIL during compression and decompression
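
Chunking in multiple dimensions means each element index maps to a
chunk index plus an offset within that chunk. A minimal sketch of that
mapping (an illustrative, hypothetical helper, not part of zarr's API):

```python
# Illustrative sketch (not zarr's implementation): map an N-dimensional
# element index to (chunk index, offset within chunk) by integer
# division and remainder against the chunk lengths per dimension.
def chunk_coords(index, chunks):
    chunk_idx = tuple(i // c for i, c in zip(index, chunks))
    offset = tuple(i % c for i, c in zip(index, chunks))
    return chunk_idx, offset

# Element (1500, 250) of an array chunked as (1000, 100) lives in
# chunk (1, 2), at offset (500, 50) within that chunk.
print(chunk_coords((1500, 250), (1000, 100)))  # ((1, 2), (500, 50))
```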

Usage
-----

Create an array:

.. code-block:: python

    >>> import numpy as np
    >>> import zarr
    >>> z = zarr.empty(shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
    >>> z
    zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 38.1M; cbytes: 0; initialized: 0/100

Fill it with some data:

.. code-block:: python

    >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
    >>> z
    zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100

Obtain a NumPy array by slicing:

.. code-block:: python

    >>> z[:]
    array([[      0,       1,       2, ...,     997,     998,     999],
           [   1000,    1001,    1002, ...,    1997,    1998,    1999],
           [   2000,    2001,    2002, ...,    2997,    2998,    2999],
           ...,
           [9997000, 9997001, 9997002, ..., 9997997, 9997998, 9997999],
           [9998000, 9998001, 9998002, ..., 9998997, 9998998, 9998999],
           [9999000, 9999001, 9999002, ..., 9999997, 9999998, 9999999]], dtype=int32)
    >>> z[:100]
    array([[    0,     1,     2, ...,   997,   998,   999],
           [ 1000,  1001,  1002, ...,  1997,  1998,  1999],
           [ 2000,  2001,  2002, ...,  2997,  2998,  2999],
           ...,
           [97000, 97001, 97002, ..., 97997, 97998, 97999],
           [98000, 98001, 98002, ..., 98997, 98998, 98999],
           [99000, 99001, 99002, ..., 99997, 99998, 99999]], dtype=int32)
    >>> z[:, :100]
    array([[      0,       1,       2, ...,      97,      98,      99],
           [   1000,    1001,    1002, ...,    1097,    1098,    1099],
           [   2000,    2001,    2002, ...,    2097,    2098,    2099],
           ...,
           [9997000, 9997001, 9997002, ..., 9997097, 9997098, 9997099],
           [9998000, 9998001, 9998002, ..., 9998097, 9998098, 9998099],
           [9999000, 9999001, 9999002, ..., 9999097, 9999098, 9999099]], dtype=int32)

Resize the array and add more data:

.. code-block:: python

    >>> z.resize(20000, 1000)
    >>> z
    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 76.3M; cbytes: 2.0M; ratio: 38.5; initialized: 100/200
    >>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
    >>> z
    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 76.3M; cbytes: 4.0M; ratio: 19.3; initialized: 200/200

For convenience, an ``append()`` method is also available, which can be
used to append data along any axis:

.. code-block:: python

    >>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
    >>> z = zarr.array(a, chunks=(1000, 100))
    >>> z.append(a+a)
    >>> z
    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 76.3M; cbytes: 3.6M; ratio: 21.2; initialized: 200/200
    >>> z.append(np.vstack([a, a]), axis=1)
    >>> z
    zarr.ext.SynchronizedArray((20000, 2000), int32, chunks=(1000, 100))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 152.6M; cbytes: 7.6M; ratio: 20.2; initialized: 400/400
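
Conceptually, appending is equivalent to resizing the array along the
given axis and then assigning the new data into the extended region. A
rough NumPy-only sketch of that semantics (illustrative only; the
``append_along`` helper is hypothetical, not zarr's implementation):

```python
import numpy as np

# Illustrative sketch: append along an axis as "grow, then assign",
# mirroring how a resizable chunked array can absorb new data without
# rewriting existing chunks. Not zarr's actual code.
def append_along(arr, data, axis=0):
    new_shape = list(arr.shape)
    new_shape[axis] += data.shape[axis]
    grown = np.empty(new_shape, dtype=arr.dtype)
    # Copy the old data into the leading region...
    old_region = tuple(slice(0, n) for n in arr.shape)
    grown[old_region] = arr
    # ...then assign the appended data into the extension.
    new_region = tuple(
        slice(arr.shape[ax], None) if ax == axis else slice(None)
        for ax in range(arr.ndim)
    )
    grown[new_region] = data
    return grown

a = np.arange(6, dtype='i4').reshape(2, 3)
b = append_along(a, a + a, axis=0)   # shape (4, 3)
c = append_along(b, b, axis=1)       # shape (4, 6)
print(b.shape, c.shape)  # (4, 3) (4, 6)
```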

Persistence
-----------

Create a persistent array (data stored on disk):

.. code-block:: python

    >>> path = 'example.zarr'
    >>> z = zarr.open(path, mode='w', shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
    >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
    >>> z
    zarr.ext.SynchronizedPersistentArray((10000, 1000), int32, chunks=(1000, 100))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
      mode: w; path: example.zarr

There is no need to close a persistent array. Data are automatically
flushed to disk.

If you're working with really big arrays, try the 'lazy' option:

.. code-block:: python

    >>> path = 'big.zarr'
    >>> z = zarr.open(path, mode='w', shape=(1e8, 1e7), dtype='i4', chunks=(1000, 1000), lazy=True)
    >>> z
    zarr.ext.SynchronizedLazyPersistentArray((100000000, 10000000), int32, chunks=(1000, 1000))
      cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
      nbytes: 3.6P; cbytes: 0; initialized: 0/1000000000
      mode: w; path: big.zarr
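
A lazy array with a nominal size of 3.6P costs nothing up front because
chunks are only materialised when first written. A minimal sketch of
that idea, using a plain dict as a stand-in chunk store (illustrative
only; ``LazyChunks`` is hypothetical and says nothing about zarr's
on-disk format):

```python
import numpy as np

# Illustrative sketch: a dict-backed chunk store where chunks are only
# allocated on first write. Uninitialized chunks cost no memory, which
# is why a huge "lazy" array can be created instantly. Not zarr's code.
class LazyChunks:
    def __init__(self, shape, chunks, dtype='i4'):
        self.shape, self.chunks, self.dtype = shape, chunks, dtype
        self.store = {}  # chunk index -> ndarray, populated lazily

    def write_chunk(self, chunk_idx, data):
        self.store[chunk_idx] = np.asarray(data, dtype=self.dtype)

    def read_chunk(self, chunk_idx):
        # Unwritten chunks read back as implicit zeros.
        return self.store.get(
            chunk_idx, np.zeros(self.chunks, dtype=self.dtype))

z = LazyChunks(shape=(10**8, 10**7), chunks=(1000, 1000))
print(len(z.store))                    # 0 chunks allocated so far
z.write_chunk((0, 0), np.ones((1000, 1000)))
print(len(z.store))                    # 1
print(z.read_chunk((5, 5)).sum())      # 0 (implicit zeros)
```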

See the `persistence documentation <PERSISTENCE.rst>`_ for more
details of the file format.

Tuning
------

``zarr`` is optimised for accessing and storing data in contiguous
slices of the same size as, or larger than, chunks. It is not, and
probably never will be, optimised for single-item access.
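
To see why slice shape matters, count how many chunks a read has to
decompress: every touched chunk is decompressed in full, so a request
much smaller than a chunk still pays for the whole chunk. A quick
sketch (``chunks_touched`` is a hypothetical helper, not a zarr API):

```python
# Illustrative sketch (not part of zarr): number of chunks overlapped
# by a contiguous slice [start, stop) in each dimension. The count per
# dimension is last overlapped chunk index - first + 1.
def chunks_touched(starts, stops, chunks):
    n = 1
    for start, stop, c in zip(starts, stops, chunks):
        n *= (stop - 1) // c - start // c + 1
    return n

chunks = (1000, 100)
# A chunk-aligned row slice like z[:1000] touches one row of chunks:
print(chunks_touched((0, 0), (1000, 1000), chunks))   # 10
# Reading a single element still decompresses an entire chunk:
print(chunks_touched((0, 0), (1, 1), chunks))         # 1
```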

Chunk sizes >= 1M are generally good. Optimal chunk shape will depend
on the correlation structure in your data.
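
A chunk's uncompressed size is simply the product of the chunk shape
times the item size, so it is easy to check whether a candidate chunk
shape clears the ~1M guideline (``chunk_nbytes`` is an illustrative
helper, not a zarr API):

```python
import numpy as np

# Illustrative helper (not part of zarr): uncompressed bytes per chunk.
def chunk_nbytes(chunks, dtype):
    n = 1
    for c in chunks:
        n *= c
    return n * np.dtype(dtype).itemsize

# The (1000, 100) int32 chunks used above are ~400K per chunk;
# (1000, 1000) chunks clear the ~1M guideline comfortably.
print(chunk_nbytes((1000, 100), 'i4'))   # 400000
print(chunk_nbytes((1000, 1000), 'i4'))  # 4000000
```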

``zarr`` is designed for use in parallel computations working
chunk-wise over data. Try it with `dask.array
<http://dask.pydata.org/en/latest/array.html>`_. If using zarr in a
multi-threaded program, set zarr to use blosc in contextual mode::

    >>> zarr.set_blosc_options(use_context=True)

Acknowledgments
---------------

``zarr`` uses `c-blosc <https://github.com/Blosc/c-blosc>`_ internally
for compression and decompression, and borrows code heavily from
`bcolz <http://bcolz.blosc.org/>`_.