If you want to measure anything that has to do with the spatial relationship of features, you eventually run into the issue of needing to go across chunk boundaries (assuming a spatially chunked ddf). A typical example is a spatial lag, i.e. the mean of values on neighbouring (touching) features.
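To illustrate what I mean by a spatial lag, here is a minimal single-machine sketch with geopandas and libpysal (the input file and the `value` column are just placeholders):

```python
# Minimal sketch of a spatial lag on an in-memory GeoDataFrame.
# "buildings.gpkg" and the "value" column are placeholder names.
import geopandas
from libpysal import weights

gdf = geopandas.read_file("buildings.gpkg")

# contiguity ("touching") neighbours
w = weights.Queen.from_dataframe(gdf)
w.transform = "r"  # row-standardise, so the lag is the mean over neighbours

gdf["value_lag"] = weights.lag_spatial(w, gdf["value"])
```

This works only because the whole dataset is in memory; with a spatially chunked ddf, features on a chunk edge are missing some of their neighbours.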
The way raster analysis deals with this is by using dask.array.map_overlap. That essentially copies a bit of neighbouring data to each chunk to create overlaps, so you can do your spatial lag fully within a chunk. See https://docs.dask.org/en/latest/array-overlap.html
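For comparison, a small example of how map_overlap is used on a dask array (the focal-mean filter is just an illustrative choice):

```python
# Each chunk is extended by `depth` cells copied from neighbouring chunks,
# the function runs on the extended chunk, and the extra cells are trimmed.
import dask.array as da
from scipy.ndimage import uniform_filter

x = da.random.random((1000, 1000), chunks=(250, 250))

# 3x3 focal mean; depth=1 copies one row/column of cells from each neighbour
smoothed = x.map_overlap(uniform_filter, size=3, depth=1, boundary="reflect")
result = smoothed.compute()
```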
I believe that we need an analogue of map_overlap for vector data. It is naturally a significantly more complex issue since we do not know what the data look like (they are not on a grid). But I believe it is doable and could be a massive game-changer. For example, I would be able to base 90% of momepy on dask-geopandas.
The trick is to define which features should be overlapping. For that, you need to know how far you have to go for each particular operation, but we can specify:
- a distance threshold (everything within n meters from the chunk boundary) - see the sketch after this list
- a topological threshold (everything within n steps of contiguity)
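As a rough sketch of the distance-threshold idea (not an existing dask-geopandas API), each spatial partition could be extended with a "halo" of features from the rest of the dataset that fall within the threshold of the partition's bounds; `partition`, `full_gdf` and `distance` are hypothetical names used only for illustration:

```python
# Illustration only: extend a spatial partition with features that
# intersect its bounding box grown by the distance threshold.
import pandas
import shapely

def expand_partition(partition, full_gdf, distance):
    # bounding box of the partition, grown by the distance threshold
    halo = shapely.box(*partition.total_bounds).buffer(distance)
    # spatial-index query for features intersecting the grown box
    hits = full_gdf.iloc[full_gdf.sindex.query(halo, predicate="intersects")]
    # append the halo features, keeping each feature only once
    combined = pandas.concat([partition, hits])
    return combined[~combined.index.duplicated()]
```

After the per-chunk computation, the halo features would be trimmed again, mirroring what map_overlap does for arrays.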
I have actually already tested this approach with a topological threshold, using custom single-core functions, and it works well.
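Purely to illustrate the topological variant (this is not the code referred to above), the halo could be built by walking n steps of contiguity outwards from the features touching the chunk boundary; `gdf`, `boundary_ids` and `steps` are hypothetical names:

```python
# Illustration only: collect features within `steps` rings of Queen
# contiguity around a set of seed features (e.g. those on a chunk edge).
from libpysal import weights

def contiguity_halo(gdf, boundary_ids, steps):
    w = weights.Queen.from_dataframe(gdf)
    selected = set(boundary_ids)
    frontier = set(boundary_ids)
    for _ in range(steps):
        # neighbours of the current frontier that are not yet selected
        frontier = {n for i in frontier for n in w.neighbors[i]} - selected
        selected |= frontier
    return gdf.loc[list(selected)]
```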
We obviously first need spatial re-chunking and spatial indexing, but this is something I'd like to put on a roadmap (maybe for GSoC?).
- dask.array implementation - https://docs.dask.org/en/latest/array-overlap.html?highlight=map_overlap#dask.array.map_overlap
- dask.dataframe implementation - https://docs.dask.org/en/latest/dataframe-api.html?highlight=map_overlap#dask.dataframe.DataFrame.map_overlap