Skip to content

Commit a99ced2

Browse files
committed
hello world
0 parents  commit a99ced2

37 files changed

+6879
-0
lines changed

.flake8

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
[flake8]
2+
max-line-length = 120
3+
exclude =
4+
.pyi
5+
typings

.gitignore

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
__pycache__
2+
*.pyc
3+
4+
dask-worker-space
5+
.ipynb_checkpoints
6+
.vscode
7+
/dist
8+
9+
*.DS_Store

.readthedocs.yml

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
version: 2
2+
3+
sphinx:
4+
configuration: docs/conf.py
5+
6+
# https://github.com/readthedocs/readthedocs.org/issues/4912#issuecomment-664002569
7+
python:
8+
version: 3.8
9+
install:
10+
- method: pip
11+
path: .
12+
extra_requirements:
13+
- docs

LICENSE

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) [year] [fullname]
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
# StackSTAC
2+
3+
[![Documentation Status](https://readthedocs.org/projects/stackstac/badge/?version=latest)](https://stackstac.readthedocs.io/en/latest/?badge=latest)
4+
5+
Turn a list of [STAC](http://stacspec.org) items into a 4D [xarray](http://xarray.pydata.org/en/stable/) DataArray (dims: `time, band, y, x`), including reprojection to a common grid. The array is a lazy [Dask array](https://docs.dask.org/en/latest/array.html), so loading and processing the data in parallel—locally or [on a cluster](https://coiled.io/)—is just a `compute()` call away.
6+
7+
For more information and examples, please [see the documentation](https://stackstac.readthedocs.io).
8+
9+
```python
10+
import stackstac
11+
import satsearch
12+
13+
stac_items = satsearch.Search(
14+
url="https://earth-search.aws.element84.com/v0",
15+
intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
16+
collections=["sentinel-s2-l2a-cogs"],
17+
datetime="2020-04-01/2020-05-01"
18+
).items()
19+
20+
stack = stackstac.stack(stac_items)
21+
print(stack)
22+
```
23+
```
24+
<xarray.DataArray 'stackstac-f350f6bfc3213d7eee2e6cb159246d88' (time: 13, band: 17, y: 10980, x: 10980)>
25+
dask.array<fetch_raster_window, shape=(13, 17, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
26+
Coordinates: (12/23)
27+
* time (time) datetime64[ns] 2020-04-01T18:04:04 ......
28+
id (time) <U24 'S2B_13SDV_20200401_0_L2A' ... 'S...
29+
* band (band) <U8 'overview' 'visual' ... 'WVP' 'SCL'
30+
* x (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
31+
* y (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
32+
eo:cloud_cover (time) float64 29.24 1.16 27.26 ... 87.33 5.41
33+
... ...
34+
data_coverage (time) object 33.85 100 33.9 ... 32.84 100 34.29
35+
platform (time) <U11 'sentinel-2b' ... 'sentinel-2b'
36+
sentinel:sequence <U1 '0'
37+
proj:epsg int64 32613
38+
sentinel:data_coverage (time) float64 33.85 100.0 33.9 ... 100.0 34.29
39+
title (band) object None ... 'Scene Classification ...
40+
Attributes:
41+
spec: RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
42+
crs: epsg:32613
43+
transform: | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
44+
resolution: 10.0
45+
```
46+
47+
Once in xarray form, many operations become easy. For example, we can compute a low-cloud weekly mean-NDVI timeseries:
48+
49+
```python
50+
lowcloud = stack[stack["eo:cloud_cover"] < 40]
51+
nir, red = lowcloud.sel(band="B08"), lowcloud.sel(band="B04")
52+
ndvi = (nir - red) / (nir + red)
53+
weekly_ndvi = ndvi.resample(time="1w").mean(dim=("time", "x", "y")).rename("NDVI")
54+
# Call `weekly_ndvi.compute()` to process ~25GiB of raster data in parallel. Might want a dask cluster for that!
55+
```
56+
57+
## Installation
58+
59+
```
60+
pip install stackstac
61+
```
62+
63+
## Things `stackstac` does for you:
64+
65+
* Figure out the geospatial parameters from the STAC metadata (if possible): a coordinate reference system, resolution, and bounding box.
66+
* Transfer the STAC metadata into [xarray coordinates](http://xarray.pydata.org/en/stable/data-structures.html#coordinates) for easy indexing, filtering, and provenance of metadata.
67+
* Efficiently generate a Dask graph for loading the data in parallel.
68+
* Mediate between Dask's parallelism and GDAL's aversion to it, allowing for fast, multi-threaded reads when possible, and at least preventing segfaults when not.
69+
* Mask nodata and rescale by dataset-level scales/offsets.
70+
71+
## Limitations:
72+
73+
* **Raster data only!** We are currently ignoring other types of assets you might find in a STAC (XML/JSON metadata, point clouds, video, etc.).
74+
* **Single-band raster data only!** Each band has to be a separate STAC asset—a separate `red`, `green`, and `blue` asset on each Item is great, but a single `RGB` asset containing a 3-band GeoTIFF is not supported yet.
75+
* [COG](https://www.cogeo.org)s work best. "Normal" GeoTIFFs that aren't internally tiled, or don't have overviews, will see much worse performance. Sidecar files (like `.msk` files) are ignored for performace. JPEG2000 will probably work, but probably be slow unless you buy kakadu. [Formats make a big difference](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f).
76+
* BYOBlocksize. STAC doesn't offer any metadata about the internal tiling scheme of the data. Knowing it can make IO more efficient, but actually reading the data to figure it out is slow. So it's on you to set this parameter. (But if you don't, things should be fine for any reasonable COG.)
77+
* Doesn't make geospatial data any easier to work with in xarray. Common operations (picking bands, clipping to bounds, etc.) are tedious to type out. Real geospatial operations (shapestats on a GeoDataFrame, reprojection, etc.) aren't supported at all. [rioxarray](https://corteva.github.io/rioxarray/stable/readme.html) might help with some of these, but it has limited support for Dask, so be careful you don't kick off a huge computation accidentally.
78+
* I haven't even written tests yet! Don't use this in production. Or do, I guess. Up to you.
79+
80+
## Roadmap:
81+
82+
Short-term:
83+
84+
- Write tests and add CI (including typechecking)
85+
- Support multi-band assets
86+
- Easier access to `s3://`-style URIs (right now, you'll need to pass in `gdal_env=stackstac.DEFAULT_GDAL_ENV.updated(always=dict(session=rio.session.AWSSession(...)))`)
87+
- Utility to guess blocksize (open a few assets)
88+
- Support [item assets](https://github.com/radiantearth/stac-spec/tree/master/extensions/item-assets) to provide more useful metadata with collections that use it (like S2 on AWS)
89+
- Rewrite dask graph generation once the [Blockwise IO API](https://github.com/dask/dask/pull/7281) settles
90+
91+
Long term (if anyone uses this thing):
92+
- Support other readers ([aiocogeo](https://github.com/geospatial-jeff/aiocogeo)?) that may perform better than GDAL for specific formats
93+
- Interactive mapping with [xarray_leaflet](https://github.com/davidbrochart/xarray_leaflet), made performant with some Dask graph-rewriting tricks to do the initial IO at coarser resolution for lower zoom levels (otherwize zooming out could process terabytes of data)
94+
- Improve ergonomics of xarray for raster data (in collaboration with [rioxarray](https://corteva.github.io/rioxarray/stable/readme.html))
95+
- Implement core geospatial routines (warp, vectorize, vector stats, [GeoPandas](https://geopandas.org)/[spatialpandas](https://github.com/holoviz/spatialpandas) interop) in Dask

docs/.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
_build/*
2+
api/internal/*.rst

docs/Makefile

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
# Minimal makefile for Sphinx documentation
2+
#
3+
4+
# You can set these variables from the command line, and also
5+
# from the environment for the first two.
6+
SPHINXOPTS ?=
7+
SPHINXBUILD ?= sphinx-build
8+
SOURCEDIR = .
9+
BUILDDIR = _build
10+
11+
# Put it first so that "make" without argument is like "make help".
12+
help:
13+
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
14+
15+
.PHONY: help Makefile
16+
17+
# Catch-all target: route all unknown targets to Sphinx using the new
18+
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
19+
%: Makefile
20+
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
21+
22+
livehtml:
23+
sphinx-autobuild "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/api/internal.rst

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
Internal API reference
2+
----------------------
3+
4+
You shouldn't need to look here unless you're curious, but while this library is still so not-production-ready, you might need to understand some of these internal types for now.
5+
6+
.. currentmodule:: stackstac
7+
8+
.. autosummary::
9+
:toctree: internal
10+
11+
rio_env.LayeredEnv
12+
raster_spec.RasterSpec
13+
reader_protocol.Reader
14+
rio_reader.AutoParallelRioReader
15+
rio_reader.ThreadLocalRioDataset
16+
rio_reader.SingleThreadedRioDataset
17+
reader_protocol.FakeReader

docs/api/main.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
API reference
2+
-------------
3+
4+
There's only one function:
5+
6+
.. autofunction:: stackstac.stack

docs/basic.ipynb

Lines changed: 347 additions & 0 deletions
Large diffs are not rendered by default.

0 commit comments

Comments
 (0)