Merge output variables with input dataset #217
6b8d74a to fdd7b62
Thanks @tomwhite, that squares well with what I took away from the issue discussion. One problem here, though, is that I can see this being a frustrating experience for users:

```python
import xarray as xr
import dask.array as da

# I load a dataset from somewhere
ds = xr.Dataset(dict(x=xr.DataArray(da.random.random(100))))

# This is what sgkit functions do (adds a `y` variable in this case)
def fn(ds):
    new_ds = xr.Dataset(dict(y=xr.DataArray(da.random.random(100))))
    return ds.merge(new_ds)

# First run
ds = fn(ds)  # No eager evaluation happens here

# Second run
ds = fn(ds)  # Because `y` already exists, Xarray will force a compute and compare the values
# MergeError: conflicting values for variable 'y' on objects to be combined.
# You can skip this check by specifying compat='override'.
```

I think we should make every function either do
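One possible way to avoid the `MergeError` in the example above, sketched here under assumptions rather than as what sgkit actually adopted, is to drop any existing copies of a function's output variables before merging. The newly computed values then replace the old ones, and Xarray has no overlapping variables to compare, so nothing is computed eagerly. `merge_overwrite` is a hypothetical helper name, not an existing sgkit or Xarray function.

```python
import dask.array as da
import xarray as xr


def merge_overwrite(input_ds: xr.Dataset, output_ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: merge `output_ds` into `input_ds`, letting the
    newly computed variables replace any same-named ones already present."""
    # Names of output variables that already exist in the input dataset
    stale = [name for name in output_ds.data_vars if name in input_ds]
    # Drop the stale copies so merge() has nothing to compare (and so
    # nothing to compute), then add the fresh variables.
    return input_ds.drop_vars(stale).merge(output_ds)


def fn(ds: xr.Dataset) -> xr.Dataset:
    # Stand-in for a function that adds a `y` variable, as in the example above
    new_ds = xr.Dataset(dict(y=xr.DataArray(da.random.random(100))))
    return merge_overwrite(ds, new_ds)


ds = xr.Dataset(dict(x=xr.DataArray(da.random.random(100))))
ds = fn(ds)  # first run adds `y`
ds = fn(ds)  # second run replaces `y` instead of raising MergeError
```

The trade-off is that a rerun silently discards the previous values. Passing `compat="override"` to `ds.merge(new_ds)` would also skip the comparison, but it keeps the variable from the first dataset, i.e. the stale `y`, which is why the sketch drops it explicitly.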
LGTM also. I think there'll be refinements we need to make as we get more experience, but this is the right basic "shape" of how things are done. As such, I'd say we merge ASAP (modulo addressing @eric-czech's points) and start building on it.
Thanks @eric-czech, that's a useful case to consider. I have extracted a
@jeromekelleher I agree we want to get the general API approach established sooner rather than later. This change will impact #100 and #102, for example.
Perfect, thank you.
This is an initial attempt to implement #103 for count allele functions. Does this look like the right direction?