Add dtype keyword to to_device #647

asmeurer · 2023-07-06T18:48:23Z

This is somewhat tangentially related to the discussions at #645. Right now to_device() doesn't have a dtype paramter, but it could be useful for it to have one. The torch.to function does have one. I'm not completely clear about cupy so @leofang would have to comment.

One reason is that certain devices might not support the existing array dtype. We also need to specify what happens in this case (likely should be an error).

This code in scikit-learn is also an example of where this would be useful https://github.com/scikit-learn/scikit-learn/pull/26315/files/42524bd42900d8ea5f4a334780387a72c6f9580d#diff-86c94a3ca33490c6190f488f5d40b01bf0fd29be36da0b4497ef0da1fda4148a. That code simultaneously converts an array to float32. It would presumably be more efficient to do this in one step instead of two.

leofang · 2023-07-06T19:07:58Z

My 2c. CuPy has ndarray.get() and asnumpy(), and currently none of them accepts a dtype argument. I think it'd be a bit confusing: to_device() should be a straightforward copy. Can't users just use asarray() if they also want to change the dtype along the way? Moreover, if a device does not support the source dtype, it can still raise when to_device() is called.

asmeurer · 2023-07-06T20:59:00Z

One of the things to come out of the astype discussions is that asarray isn't guaranteed to perform dtype casts that aren't permitted by the type promotion rules https://data-apis.org/array-api/latest/API_specification/generated/array_api.asarray.html

Other than that, I do agree that it seems that asarray can function as both to_device and astype.

leofang · 2023-07-07T13:12:50Z

to_device() shouldn't do illegal type casts either, right? 🙂 So, yes, I am fine with to_device() being a strict subset of asarray(). Most importantly, IMHO, to_device() should not be used as an entry point of the array library.

asmeurer · 2023-07-07T21:25:57Z

to_device() shouldn't do illegal type casts either, right?

I guess it depends on your perspective. PyTorch's to is both to_device and astype in one function, so it does do type casts.

Most importantly, IMHO, to_device() should not be used as an entry point of the array library.

Yes, I agree that the input to to_device should only be an xp array.

The above scikit-learn code basically does

def func(x):
	xp = array_namespace(x)
	y = np.random(...) # because random is not part of the array API
	y = xp.asarray(y) # convert y to an xp array
    if x.dtype == xp.float32:
		y = xp.astype(y, xp.float32) # np.random only generates float64 arrays
	y = xp.to_device(x.device)
	...

Maybe it's just me, but it feels like more function calls than should be necessary to do this. You can't combine the astype into asarray because asarray(float64_array, dtype=float32) isn't guaranteed to work (by the note here). The astype call does an unnecessary copy of y on host when it could in principle be downcast during the device transfer (but maybe this isn't a big deal?).

It feels like you'd like to be able to just do y = asarray(y, device=x.device, dtype=x.dtype), but that won't work for the same reason.

leofang · 2023-07-08T03:56:41Z

I guess it depends on your perspective. PyTorch's to is both to_device and astype in one function, so it does do type casts.

I understand, what I am saying is I'd think nowhere in the API sets should we allow illegal type casts, be it to_device() or asarray() or others, otherwise we're opening loopholes for people to escape from the type casting rules. If we want an escape hatch (I think we already have one -- non-standardized type casts are by definition implementation defined and so not portable), we can discuss it, but preferably under the dtype utility functions.

because asarray(float64_array, dtype=float32) isn't guaranteed to work

Why? We allow downcast along the type ladder, right? Or are you referring to device-specific type support?

Maybe it's just me, but it feels like more function calls than should be necessary to do this.

I think the biggest usability problem is on NumPy. Both CuPy and PyTorch allow generating random numbers with the specified dtype, but NumPy does not. It's been very annoying to me too, but unfortunately since we are not standardizing RNGs, we can't force NumPy to change via the array API standard 😢 (FYI, for generating random complex arrays all libraries are bad.) This is the root cause for us to considering escape hatches and API combos like asarray().astype().to_device(), which could have been avoided if NumPy is user friendlier IMHO.

asmeurer · 2023-07-08T07:33:40Z

My interpretation of

If dtype is not None, then array conversions should obey Type Promotion Rules rules. Conversions not specified according to Type Promotion Rules rules may or may not be permitted by a conforming array library. To perform an explicit cast, use array_api.astype().

is that downcasts are not allowed. Because a down cast is not a type promotion.

leofang · 2023-07-08T07:44:26Z

Aaron, you're right. Downcast would be problematic for values outside the representable range. I lost my mind today... OK so no downcast is allowed in principle. Are we considering exceptions?

ogrisel · 2023-07-10T12:59:35Z

Alternatively, astype could accept an optional device parameter. This would make it explicit that we want the astype semantics w.r.t. dtype conversion irrespective of Type Promotion Rules.

rgommers · 2023-07-13T13:27:02Z

I think I kinda like that astype is the only function which is allowed to downcast - hence I like @ogrisel's suggestion to give it a device keyword.

rgommers · 2023-07-28T13:13:28Z

gh-665 implements the device keyword for astype, please comment if you have other preferences

asmeurer mentioned this issue Jul 10, 2023

ENH Array API support for PCA scikit-learn/scikit-learn#26315

Merged

7 tasks

kgryte added this to the v2023 milestone Jul 13, 2023

steff456 mentioned this issue Jul 26, 2023

add device kwarg to astype API #665

Merged

1 task

kgryte closed this as completed in #665 Sep 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add dtype keyword to to_device #647

Add dtype keyword to to_device #647

asmeurer commented Jul 6, 2023

leofang commented Jul 6, 2023

asmeurer commented Jul 6, 2023

leofang commented Jul 7, 2023 •

edited

Loading

asmeurer commented Jul 7, 2023

leofang commented Jul 8, 2023

asmeurer commented Jul 8, 2023

leofang commented Jul 8, 2023

ogrisel commented Jul 10, 2023

rgommers commented Jul 13, 2023

rgommers commented Jul 28, 2023

Add dtype keyword to to_device #647

Add dtype keyword to to_device #647

Comments

asmeurer commented Jul 6, 2023

leofang commented Jul 6, 2023

asmeurer commented Jul 6, 2023

leofang commented Jul 7, 2023 • edited Loading

asmeurer commented Jul 7, 2023

leofang commented Jul 8, 2023

asmeurer commented Jul 8, 2023

leofang commented Jul 8, 2023

ogrisel commented Jul 10, 2023

rgommers commented Jul 13, 2023

rgommers commented Jul 28, 2023

leofang commented Jul 7, 2023 •

edited

Loading