
Commit 1b3e20a

Merge pull request #22 from alimanfoo/refactor
Refactoring for v1.0. Resolves #27, #25, #21, #7.
2 parents 08fdcd8 + 5adbe30 commit 1b3e20a

Some content is hidden

73 files changed, +15605 -14023 lines changed

.gitignore

+2-4
@@ -59,12 +59,10 @@ target/
 # PyCharm
 .idea
 
-# don't include cython generated C code (for now)
-zarr/*.c
-
 # setuptools-scm
 zarr/version.py
 
 # test data
 *.zarr
-*~
+*~
+*.zip

.travis.yml

+2-1
@@ -6,9 +6,10 @@ python:
 - 3.5
 
 install:
+- pip install -U tox-travis
 - pip install -U pip setuptools setuptools_scm wheel
 - pip install -U nose -rrequirements.txt
 - python setup.py build_ext --inplace
 
 script:
-- nosetests -v
+- tox

MANIFEST.in

+3
@@ -1,3 +1,6 @@
 recursive-include c-blosc *
+recursive-include zarr *.pxd
 recursive-include zarr *.pyx
+recursive-include zarr *.h
+recursive-include zarr *.c
 include cpuinfo.py

PERSISTENCE.rst

-124
This file was deleted.

README.rst

+5-184
@@ -1,189 +1,10 @@
-zarr
+Zarr
 ====
 
-A minimal implementation of chunked, compressed, N-dimensional arrays
-for Python.
-
-* Source code: https://github.com/alimanfoo/zarr
-* Download: https://pypi.python.org/pypi/zarr
-* Release notes: https://github.com/alimanfoo/zarr/releases
+Zarr is a Python package providing an implementation of compressed,
+chunked, N-dimensional arrays, designed for use in parallel
+computing. See the `documentation <http://zarr.readthedocs.io/>`_ for
+more information.
 
 .. image:: https://travis-ci.org/alimanfoo/zarr.svg?branch=master
     :target: https://travis-ci.org/alimanfoo/zarr
-
-Installation
-------------
-
-Installation requires Numpy and Cython pre-installed. Can only be
-installed on Linux currently.
-
-Install from PyPI::
-
-    $ pip install -U zarr
-
-Install from GitHub::
-
-    $ pip install -U git+https://github.com/alimanfoo/zarr.git@master
-
-Status
-------
-
-Experimental, proof-of-concept. This is alpha-quality software. Things
-may break, change or disappear without warning.
-
-Bug reports and suggestions welcome.
-
-Design goals
-------------
-
-* Chunking in multiple dimensions
-* Resize any dimension
-* Concurrent reads
-* Concurrent writes
-* Release the GIL during compression and decompression
-
-Usage
------
-
-Create an array:
-
-.. code-block::
-
-    >>> import numpy as np
-    >>> import zarr
-    >>> z = zarr.empty(shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
-    >>> z
-    zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 38.1M; cbytes: 0; initialized: 0/100
-
-Fill it with some data:
-
-.. code-block::
-
-    >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
-    >>> z
-    zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
-
-Obtain a NumPy array by slicing:
-
-.. code-block::
-
-    >>> z[:]
-    array([[ 0, 1, 2, ..., 997, 998, 999],
-           [ 1000, 1001, 1002, ..., 1997, 1998, 1999],
-           [ 2000, 2001, 2002, ..., 2997, 2998, 2999],
-           ...,
-           [9997000, 9997001, 9997002, ..., 9997997, 9997998, 9997999],
-           [9998000, 9998001, 9998002, ..., 9998997, 9998998, 9998999],
-           [9999000, 9999001, 9999002, ..., 9999997, 9999998, 9999999]], dtype=int32)
-    >>> z[:100]
-    array([[ 0, 1, 2, ..., 997, 998, 999],
-           [ 1000, 1001, 1002, ..., 1997, 1998, 1999],
-           [ 2000, 2001, 2002, ..., 2997, 2998, 2999],
-           ...,
-           [97000, 97001, 97002, ..., 97997, 97998, 97999],
-           [98000, 98001, 98002, ..., 98997, 98998, 98999],
-           [99000, 99001, 99002, ..., 99997, 99998, 99999]], dtype=int32)
-    >>> z[:, :100]
-    array([[ 0, 1, 2, ..., 97, 98, 99],
-           [ 1000, 1001, 1002, ..., 1097, 1098, 1099],
-           [ 2000, 2001, 2002, ..., 2097, 2098, 2099],
-           ...,
-           [9997000, 9997001, 9997002, ..., 9997097, 9997098, 9997099],
-           [9998000, 9998001, 9998002, ..., 9998097, 9998098, 9998099],
-           [9999000, 9999001, 9999002, ..., 9999097, 9999098, 9999099]], dtype=int32)
-
-Resize the array and add more data:
-
-.. code-block::
-
-    >>> z.resize(20000, 1000)
-    >>> z
-    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 76.3M; cbytes: 2.0M; ratio: 38.5; initialized: 100/200
-    >>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
-    >>> z
-    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 76.3M; cbytes: 4.0M; ratio: 19.3; initialized: 200/200
-
-For convenience, an ``append()`` method is also available, which can be used to
-append data to any axis:
-
-.. code-block::
-
-    >>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
-    >>> z = zarr.array(a, chunks=(1000, 100))
-    >>> z.append(a+a)
-    >>> z
-    zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 76.3M; cbytes: 3.6M; ratio: 21.2; initialized: 200/200
-    >>> z.append(np.vstack([a, a]), axis=1)
-    >>> z
-    zarr.ext.SynchronizedArray((20000, 2000), int32, chunks=(1000, 100))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 152.6M; cbytes: 7.6M; ratio: 20.2; initialized: 400/400
-
-Persistence
------------
-
-Create a persistent array (data stored on disk):
-
-.. code-block::
-
-    >>> path = 'example.zarr'
-    >>> z = zarr.open(path, mode='w', shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
-    >>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
-    >>> z
-    zarr.ext.SynchronizedPersistentArray((10000, 1000), int32, chunks=(1000, 100))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
-    mode: w; path: example.zarr
-
-There is no need to close a persistent array. Data are automatically flushed
-to disk.
-
-If you're working with really big arrays, try the 'lazy' option:
-
-.. code-block::
-
-    >>> path = 'big.zarr'
-    >>> z = zarr.open(path, mode='w', shape=(1e8, 1e7), dtype='i4', chunks=(1000, 1000), lazy=True)
-    >>> z
-    zarr.ext.SynchronizedLazyPersistentArray((100000000, 10000000), int32, chunks=(1000, 1000))
-    cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
-    nbytes: 3.6P; cbytes: 0; initialized: 0/1000000000
-    mode: w; path: big.zarr
-
-See the `persistence documentation <PERSISTENCE.rst>`_ for more
-details of the file format.
-
-Tuning
-------
-
-``zarr`` is optimised for accessing and storing data in contiguous
-slices, of the same size or larger than chunks. It is not and probably
-never will be optimised for single item access.
-
-Chunks sizes >= 1M are generally good. Optimal chunk shape will depend
-on the correlation structure in your data.
-
-``zarr`` is designed for use in parallel computations working
-chunk-wise over data. Try it with `dask.array
-<http://dask.pydata.org/en/latest/array.html>`_. If using in a
-multi-threaded, set zarr to use blosc in contextual mode::
-
-    >>> zarr.set_blosc_options(use_context=True)
-
-Acknowledgments
----------------
-
-``zarr`` uses `c-blosc <https://github.com/Blosc/c-blosc>`_ internally for
-compression and decompression and borrows code heavily from
-`bcolz <http://bcolz.blosc.org/>`_.
-

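The `nbytes` and `initialized` totals in the examples removed from README.rst follow from shape, chunk shape and dtype alone. A minimal sketch, assuming `nbytes` is the uncompressed size printed with binary prefixes (M = 2**20 bytes, P = 2**50 bytes) and that the chunk count per dimension is a ceiling division; `ceil_div`, `describe` and `human` are hypothetical helpers written for illustration, not part of zarr's API:

```python
import math

import numpy as np


def ceil_div(a, b):
    """Integer ceiling division, exact even for very large ints."""
    return -(-a // b)


def describe(shape, chunks, dtype):
    """Return (uncompressed size in bytes, total number of chunks).

    Hypothetical helper: assumes a regular chunk grid in which the last
    chunk along each dimension may be only partly filled.
    """
    nbytes = math.prod(shape) * np.dtype(dtype).itemsize
    nchunks = math.prod(ceil_div(s, c) for s, c in zip(shape, chunks))
    return nbytes, nchunks


def human(nbytes):
    """Format a byte count with binary prefixes and one decimal place."""
    size = float(nbytes)
    for unit in ["", "K", "M", "G", "T", "P"]:
        if size < 1024 or unit == "P":
            return f"{size:.1f}{unit}"
        size /= 1024


# Shapes and chunk shapes copied from the removed README examples (dtype 'i4').
examples = [
    ("zarr.empty", (10000, 1000), (1000, 100)),
    ("after resize", (20000, 1000), (1000, 100)),
    ("after append(axis=1)", (20000, 2000), (1000, 100)),
    ("lazy", (10**8, 10**7), (1000, 1000)),
]
for label, shape, chunks in examples:
    nbytes, nchunks = describe(shape, chunks, "i4")
    print(f"{label}: nbytes: {human(nbytes)}; chunks: {nchunks}")
```

Under these assumptions the loop prints 38.1M/100, 76.3M/200, 152.6M/400 and 3.6P/1000000000, matching the README output. A shape that is not a multiple of the chunk shape, such as (10001, 1000), would need 11 × 10 = 110 chunks.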

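The removed Tuning section says zarr suits contiguous slices at least as large as a chunk, not single-item access. Assuming each chunk is compressed as one unit and has to be decompressed whole before any element in it can be read, the cost of a read can be estimated by counting the chunks a selection overlaps. A sketch for step-1 slices; `chunks_touched` and `chunk_nbytes` are hypothetical helpers, not zarr functions:

```python
import math
from itertools import product


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def chunks_touched(selection, shape, chunks):
    """Grid indices of every chunk overlapped by a tuple of step-1 slices."""
    ranges = []
    for sel, size, c in zip(selection, shape, chunks):
        start, stop, step = sel.indices(size)
        if step != 1:
            raise ValueError("this sketch only handles contiguous (step-1) slices")
        if stop <= start:
            return []  # an empty selection touches no chunks
        ranges.append(range(start // c, ceil_div(stop, c)))
    return list(product(*ranges))


def chunk_nbytes(chunks, itemsize):
    """Uncompressed size of one full chunk in bytes."""
    return math.prod(chunks) * itemsize


shape, chunks, itemsize = (10000, 1000), (1000, 100), 4  # int32, as in the README

row_block = chunks_touched((slice(None, 100), slice(None)), shape, chunks)  # z[:100]
col_block = chunks_touched((slice(None), slice(None, 100)), shape, chunks)  # z[:, :100]
one_item = chunks_touched((slice(0, 1), slice(0, 1)), shape, chunks)        # z[0, 0]

print(len(row_block), len(col_block), len(one_item))  # 10 10 1
print(chunk_nbytes(chunks, itemsize))                 # 400000
```

Both slices from the README touch 10 chunks, but `z[:, :100]` uses every element of each chunk while `z[:100]` uses only a tenth of each; reading the single item `z[0, 0]` (4 bytes) still costs one full 400,000-byte chunk.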