Skip to content

Overlapping computations #40

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
martinfleis opened this issue Mar 12, 2021 · 0 comments
Open

Overlapping computations #40

martinfleis opened this issue Mar 12, 2021 · 0 comments

Comments

@martinfleis
Copy link
Member

martinfleis commented Mar 12, 2021

If you want to measure anything which has to do with the spatial relationship of features, you eventually find an issue of needing to go across chunk boundaries (assuming spatially chunked ddf). A typical example is a spatial lag, i.e. the mean of values on neighbouring (touching) features.

The way raster analysis deals with it is using dask.array.map_overlap. That essentially copies a bit of neighbouring data to each chunk to create overlaps so you can do your spatial lag fully within a chunk. See https://docs.dask.org/en/latest/array-overlap.html

I believe that we need the analogy of map_overlap for vector data. It is naturally a significantly more complex issue since we do not know how the data look like (it is not a grid). But I believe it is doable and could be a massive game-changer. For example, I would be able to base 90% of momepy on dask-geopandas.

The trick is to define which features should be overlapping. For that, you need to know how far you have to go for each particular operation but we can specify:

  • a distance threshold (everything within n meters from the chunk boundary)
  • topological threshold (everything within n steps of contiguity)

I have actually already tested this approach with topological threshold, with custom single-core functions and it works well.

We obviously first need spatial re-chunking and spatial indexing, but this is something I'd like to put on a roadmap (maybe for GSoC?).

dask.array implementation - https://docs.dask.org/en/latest/array-overlap.html?highlight=map_overlap#dask.array.map_overlap
dask.dataframe implementation - https://docs.dask.org/en/latest/dataframe-api.html?highlight=map_overlap#dask.dataframe.DataFrame.map_overlap

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant